-
Model Predictive Simulation Using Structured Graphical Models and Transformers
Authors:
Xinghua Lou,
Meet Dave,
Shrinu Kushagra,
Miguel Lazaro-Gredilla,
Kevin Murphy
Abstract:
We propose an approach to simulating trajectories of multiple interacting agents (road users) based on transformers and probabilistic graphical models (PGMs), and apply it to the Waymo SimAgents challenge. The transformer baseline is based on the MTR model, which predicts multiple future trajectories conditioned on the past trajectories and static road layout features. We then improve upon these g…
▽ More
We propose an approach to simulating trajectories of multiple interacting agents (road users) based on transformers and probabilistic graphical models (PGMs), and apply it to the Waymo SimAgents challenge. The transformer baseline is based on the MTR model, which predicts multiple future trajectories conditioned on the past trajectories and static road layout features. We then improve upon these generated trajectories using a PGM, which contains factors which encode prior knowledge, such as a preference for smooth trajectories, and avoidance of collisions with static obstacles and other moving agents. We perform (approximate) MAP inference in this PGM using the Gauss-Newton method. Finally we sample $K=32$ trajectories for each of the $N \sim 100$ agents for the next $T=8 Δ$ time steps, where $Δ=10$ is the sampling rate per second. Following the Model Predictive Control (MPC) paradigm, we only return the first element of our forecasted trajectories at each step, and then we replan, so that the simulation can constantly adapt to its changing environment. We therefore call our approach "Model Predictive Simulation" or MPS. We show that MPS improves upon the MTR baseline, especially in safety critical metrics such as collision rate. Furthermore, our approach is compatible with any underlying forecasting model, and does not require extra training, so we believe it is a valuable contribution to the community.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
Second-Order Algorithms for Finding Local Nash Equilibria in Zero-Sum Games
Authors:
Kushagra Gupta,
Xinjie Liu,
Ufuk Topcu,
David Fridovich-Keil
Abstract:
Zero-sum games arise in a wide variety of problems, including robust optimization and adversarial learning. However, algorithms deployed for finding a local Nash equilibrium in these games often converge to non-Nash stationary points. This highlights a key challenge: for any algorithm, the stability properties of its underlying dynamical system can cause non-Nash points to be potential attractors.…
▽ More
Zero-sum games arise in a wide variety of problems, including robust optimization and adversarial learning. However, algorithms deployed for finding a local Nash equilibrium in these games often converge to non-Nash stationary points. This highlights a key challenge: for any algorithm, the stability properties of its underlying dynamical system can cause non-Nash points to be potential attractors. To overcome this challenge, algorithms must account for subtleties involving the curvatures of players' costs. To this end, we leverage dynamical system theory and develop a second-order algorithm for finding a local Nash equilibrium in the smooth, possibly nonconvex-nonconcave, zero-sum game setting. First, we prove that this novel method guarantees convergence to only local Nash equilibria with a local linear convergence rate. We then interpret a version of this method as a modified Gauss-Newton algorithm with local superlinear convergence to the neighborhood of a point that satisfies first-order local Nash equilibrium conditions. In comparison, current related state-of-the-art methods do not offer convergence rate guarantees. Furthermore, we show that this approach naturally generalizes to settings with convex and potentially coupled constraints while retaining earlier guarantees of convergence to only local (generalized) Nash equilibria.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Fast Samplers for Inverse Problems in Iterative Refinement Models
Authors:
Kushagra Pandey,
Ruihan Yang,
Stephan Mandt
Abstract:
Constructing fast samplers for unconditional diffusion and flow-matching models has received much attention recently; however, existing methods for solving inverse problems, such as super-resolution, inpainting, or deblurring, still require hundreds to thousands of iterative steps to obtain high-quality results. We propose a plug-and-play framework for constructing efficient samplers for inverse p…
▽ More
Constructing fast samplers for unconditional diffusion and flow-matching models has received much attention recently; however, existing methods for solving inverse problems, such as super-resolution, inpainting, or deblurring, still require hundreds to thousands of iterative steps to obtain high-quality results. We propose a plug-and-play framework for constructing efficient samplers for inverse problems, requiring only pre-trained diffusion or flow-matching models. We present Conditional Conjugate Integrators, which leverage the specific form of the inverse problem to project the respective conditional diffusion/flow dynamics into a more amenable space for sampling. Our method complements popular posterior approximation methods for solving inverse problems using diffusion/flow models. We evaluate the proposed method's performance on various linear image restoration tasks across multiple datasets, employing diffusion and flow-matching models. Notably, on challenging inverse problems like 4$\times$ super-resolution on the ImageNet dataset, our method can generate high-quality samples in as few as 5 conditional sampling steps and outperforms competing baselines requiring 20-1000 steps. Our code and models will be publicly available at https://github.com/mandt-lab/CI2RM.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images
Authors:
Zaid Tasneem,
Akshat Dave,
Abhishek Singh,
Kushagra Tiwary,
Praneeth Vepakomma,
Ashok Veeraraghavan,
Ramesh Raskar
Abstract:
Neural radiance fields (NeRFs) show potential for transforming images captured worldwide into immersive 3D visual experiences. However, most of this captured visual data remains siloed in our camera rolls as these images contain personal details. Even if made public, the problem of learning 3D representations of billions of scenes captured daily in a centralized manner is computationally intractab…
▽ More
Neural radiance fields (NeRFs) show potential for transforming images captured worldwide into immersive 3D visual experiences. However, most of this captured visual data remains siloed in our camera rolls as these images contain personal details. Even if made public, the problem of learning 3D representations of billions of scenes captured daily in a centralized manner is computationally intractable. Our approach, DecentNeRF, is the first attempt at decentralized, crowd-sourced NeRFs that require $\sim 10^4\times$ less server computing for a scene than a centralized approach. Instead of sending the raw data, our approach requires users to send a 3D representation, distributing the high computation cost of training centralized NeRFs between the users. It learns photorealistic scene representations by decomposing users' 3D views into personal and global NeRFs and a novel optimally weighted aggregation of only the latter. We validate the advantage of our approach to learn NeRFs with photorealism and minimal server computation cost on structured synthetic and real-world photo tourism datasets. We further analyze how secure aggregation of global NeRFs in DecentNeRF minimizes the undesired reconstruction of personal content by the server.
△ Less
Submitted 28 March, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Regret Minimization via Saddle Point Optimization
Authors:
Johannes Kirschner,
Seyed Alireza Bakhtiari,
Kushagra Chandak,
Volodymyr Tkachuk,
Csaba Szepesvári
Abstract:
A long line of works characterizes the sample complexity of regret minimization in sequential decision-making by min-max programs. In the corresponding saddle-point game, the min-player optimizes the sampling distribution against an adversarial max-player that chooses confusing models leading to large regret. The most recent instantiation of this idea is the decision-estimation coefficient (DEC),…
▽ More
A long line of works characterizes the sample complexity of regret minimization in sequential decision-making by min-max programs. In the corresponding saddle-point game, the min-player optimizes the sampling distribution against an adversarial max-player that chooses confusing models leading to large regret. The most recent instantiation of this idea is the decision-estimation coefficient (DEC), which was shown to provide nearly tight lower and upper bounds on the worst-case expected regret in structured bandits and reinforcement learning. By re-parametrizing the offset DEC with the confidence radius and solving the corresponding min-max program, we derive an anytime variant of the Estimation-To-Decisions (E2D) algorithm. Importantly, the algorithm optimizes the exploration-exploitation trade-off online instead of via the analysis. Our formulation leads to a practical algorithm for finite model classes and linear feedback models. We further point out connections to the information ratio, decoupling coefficient and PAC-DEC, and numerically evaluate the performance of E2D on simple examples.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
Link Prediction for Social Networks using Representation Learning and Heuristic-based Features
Authors:
Samarth Khanna,
Sree Bhattacharyya,
Sudipto Ghosh,
Kushagra Agarwal,
Asit Kumar Das
Abstract:
The exponential growth in scale and relevance of social networks enable them to provide expansive insights. Predicting missing links in social networks efficiently can help in various modern-day business applications ranging from generating recommendations to influence analysis. Several categories of solutions exist for the same. Here, we explore various feature extraction techniques to generate r…
▽ More
The exponential growth in scale and relevance of social networks enable them to provide expansive insights. Predicting missing links in social networks efficiently can help in various modern-day business applications ranging from generating recommendations to influence analysis. Several categories of solutions exist for the same. Here, we explore various feature extraction techniques to generate representations of nodes and edges in a social network that allow us to predict missing links. We compare the results of using ten feature extraction techniques categorized across Structural embeddings, Neighborhood-based embeddings, Graph Neural Networks, and Graph Heuristics, followed by modeling with ensemble classifiers and custom Neural Networks. Further, we propose combining heuristic-based features and learned representations that demonstrate improved performance for the link prediction task on social network datasets. Using this method to generate accurate recommendations for many applications is a matter of further study that appears very promising. The code for all the experiments has been made public.
△ Less
Submitted 13 March, 2024;
originally announced March 2024.
-
On the Challenges and Opportunities in Generative AI
Authors:
Laura Manduchi,
Kushagra Pandey,
Robert Bamler,
Ryan Cotterell,
Sina Däubener,
Sophie Fellenz,
Asja Fischer,
Thomas Gärtner,
Matthias Kirchler,
Marius Kloft,
Yingzhen Li,
Christoph Lippert,
Gerard de Melo,
Eric Nalisnick,
Björn Ommer,
Rajesh Ranganath,
Maja Rudolph,
Karen Ullrich,
Guy Van den Broeck,
Julia E Vogt,
Yixin Wang,
Florian Wenzel,
Frank Wood,
Stephan Mandt,
Vincent Fortuin
Abstract:
The field of deep generative modeling has grown rapidly and consistently over the years. With the availability of massive amounts of training data coupled with advances in scalable unsupervised learning paradigms, recent large-scale generative models show tremendous promise in synthesizing high-resolution images and text, as well as structured data such as videos and molecules. However, we argue t…
▽ More
The field of deep generative modeling has grown rapidly and consistently over the years. With the availability of massive amounts of training data coupled with advances in scalable unsupervised learning paradigms, recent large-scale generative models show tremendous promise in synthesizing high-resolution images and text, as well as structured data such as videos and molecules. However, we argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains. In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability. By identifying these challenges, we aim to provide researchers with valuable insights for exploring fruitful research directions, thereby fostering the development of more robust and accessible generative AI solutions.
△ Less
Submitted 28 February, 2024;
originally announced March 2024.
-
Towards Fast Stochastic Sampling in Diffusion Generative Models
Authors:
Kushagra Pandey,
Maja Rudolph,
Stephan Mandt
Abstract:
Diffusion models suffer from slow sample generation at inference time. Despite recent efforts, improving the sampling efficiency of stochastic samplers for diffusion models remains a promising direction. We propose Splitting Integrators for fast stochastic sampling in pre-trained diffusion models in augmented spaces. Commonly used in molecular dynamics, splitting-based integrators attempt to impro…
▽ More
Diffusion models suffer from slow sample generation at inference time. Despite recent efforts, improving the sampling efficiency of stochastic samplers for diffusion models remains a promising direction. We propose Splitting Integrators for fast stochastic sampling in pre-trained diffusion models in augmented spaces. Commonly used in molecular dynamics, splitting-based integrators attempt to improve sampling efficiency by cleverly alternating between numerical updates involving the data, auxiliary, or noise variables. However, we show that a naive application of splitting integrators is sub-optimal for fast sampling. Consequently, we propose several principled modifications to naive splitting samplers for improving sampling efficiency and denote the resulting samplers as Reduced Splitting Integrators. In the context of Phase Space Langevin Diffusion (PSLD) [Pandey \& Mandt, 2023] on CIFAR-10, our stochastic sampler achieves an FID score of 2.36 in only 100 network function evaluations (NFE) as compared to 2.63 for the best baselines.
△ Less
Submitted 13 February, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
SUNDIAL: 3D Satellite Understanding through Direct, Ambient, and Complex Lighting Decomposition
Authors:
Nikhil Behari,
Akshat Dave,
Kushagra Tiwary,
William Yang,
Ramesh Raskar
Abstract:
3D modeling from satellite imagery is essential in areas of environmental science, urban planning, agriculture, and disaster response. However, traditional 3D modeling techniques face unique challenges in the remote sensing context, including limited multi-view baselines over extensive regions, varying direct, ambient, and complex illumination conditions, and time-varying scene changes across capt…
▽ More
3D modeling from satellite imagery is essential in areas of environmental science, urban planning, agriculture, and disaster response. However, traditional 3D modeling techniques face unique challenges in the remote sensing context, including limited multi-view baselines over extensive regions, varying direct, ambient, and complex illumination conditions, and time-varying scene changes across captures. In this work, we introduce SUNDIAL, a comprehensive approach to 3D reconstruction of satellite imagery using neural radiance fields. We jointly learn satellite scene geometry, illumination components, and sun direction in this single-model approach, and propose a secondary shadow ray casting technique to 1) improve scene geometry using oblique sun angles to render shadows, 2) enable physically-based disentanglement of scene albedo and illumination, and 3) determine the components of illumination from direct, ambient (sky), and complex sources. To achieve this, we incorporate lighting cues and geometric priors from remote sensing literature in a neural rendering approach, modeling physical properties of satellite scenes such as shadows, scattered sky illumination, and complex illumination and shading of vegetation and water. We evaluate the performance of SUNDIAL against existing NeRF-based techniques for satellite scene modeling and demonstrate improved scene and lighting disentanglement, novel view and lighting rendering, and geometry and sun direction estimation on challenging scenes with small baselines, sparse inputs, and variable illumination.
△ Less
Submitted 23 December, 2023;
originally announced December 2023.
-
Efficient Integrators for Diffusion Generative Models
Authors:
Kushagra Pandey,
Maja Rudolph,
Stephan Mandt
Abstract:
Diffusion models suffer from slow sample generation at inference time. Therefore, develo** a principled framework for fast deterministic/stochastic sampling for a broader class of diffusion models is a promising direction. We propose two complementary frameworks for accelerating sample generation in pre-trained models: Conjugate Integrators and Splitting Integrators. Conjugate integrators genera…
▽ More
Diffusion models suffer from slow sample generation at inference time. Therefore, develo** a principled framework for fast deterministic/stochastic sampling for a broader class of diffusion models is a promising direction. We propose two complementary frameworks for accelerating sample generation in pre-trained models: Conjugate Integrators and Splitting Integrators. Conjugate integrators generalize DDIM, map** the reverse diffusion dynamics to a more amenable space for sampling. In contrast, splitting-based integrators, commonly used in molecular dynamics, reduce the numerical simulation error by cleverly alternating between numerical updates involving the data and auxiliary variables. After extensively studying these methods empirically and theoretically, we present a hybrid method that leads to the best-reported performance for diffusion models in augmented spaces. Applied to Phase Space Langevin Diffusion [Pandey & Mandt, 2023] on CIFAR-10, our deterministic and stochastic samplers achieve FID scores of 2.11 and 2.36 in only 100 network function evaluations (NFE) as compared to 2.57 and 2.63 for the best-performing baselines, respectively. Our code and model checkpoints will be made publicly available at \url{https://github.com/mandt-lab/PSLD}.
△ Less
Submitted 11 October, 2023;
originally announced October 2023.
-
DISeR: Designing Imaging Systems with Reinforcement Learning
Authors:
Tzofi Klinghoffer,
Kushagra Tiwary,
Nikhil Behari,
Bhavya Agrawalla,
Ramesh Raskar
Abstract:
Imaging systems consist of cameras to encode visual information about the world and perception models to interpret this encoding. Cameras contain (1) illumination sources, (2) optical elements, and (3) sensors, while perception models use (4) algorithms. Directly searching over all combinations of these four building blocks to design an imaging system is challenging due to the size of the search s…
▽ More
Imaging systems consist of cameras to encode visual information about the world and perception models to interpret this encoding. Cameras contain (1) illumination sources, (2) optical elements, and (3) sensors, while perception models use (4) algorithms. Directly searching over all combinations of these four building blocks to design an imaging system is challenging due to the size of the search space. Moreover, cameras and perception models are often designed independently, leading to sub-optimal task performance. In this paper, we formulate these four building blocks of imaging systems as a context-free grammar (CFG), which can be automatically searched over with a learned camera designer to jointly optimize the imaging system with task-specific perception models. By transforming the CFG to a state-action space, we then show how the camera designer can be implemented with reinforcement learning to intelligently search over the combinatorial space of possible imaging system configurations. We demonstrate our approach on two tasks, depth estimation and camera rig design for autonomous vehicles, showing that our method yields rigs that outperform industry-wide standards. We believe that our proposed approach is an important step towards automating imaging system design.
△ Less
Submitted 24 September, 2023;
originally announced September 2023.
-
Game-theoretic Occlusion-Aware Motion Planning: an Efficient Hybrid-Information Approach
Authors:
Kushagra Gupta,
David Fridovich-Keil
Abstract:
We present a novel algorithm for game-theoretic trajectory planning, tailored for settings in which agents can only observe one another in specific regions of the state space. Such problems arise naturally in the context of multi-robot navigation, where occlusions due to environment geometry naturally mask agents' view of one another. In this paper, we formalize these settings as dynamic games wit…
▽ More
We present a novel algorithm for game-theoretic trajectory planning, tailored for settings in which agents can only observe one another in specific regions of the state space. Such problems arise naturally in the context of multi-robot navigation, where occlusions due to environment geometry naturally mask agents' view of one another. In this paper, we formalize these settings as dynamic games with a hybrid information structure, which interleaves so-called "open-loop" periods (in which agents cannot observe one another) with "feedback" periods (with full state observability). We present two main contributions. First, we study a canonical variant of these hybrid information games in which agents' dynamics are linear, and objectives are convex and quadratic. Here, we build upon classical solution methods for the open-loop and feedback variants of these games to derive an algorithm for the hybrid information case that matches the cubic runtime of the classical settings. Second, we consider a far broader class of problems in which agents' dynamics are nonlinear, and objectives are nonquadratic; we reduce these problems to sequences of hybrid information linear-quadratic games and empirically demonstrate that iteratively solving these simpler problems with the proposed algorithm yields reliable convergence to approximate Nash equilibria through simulation studies of overtaking and intersection traffic scenarios.
△ Less
Submitted 16 June, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
-
Practical First-Order Bayesian Optimization Algorithms
Authors:
Utkarsh Prakash,
Aryan Chollera,
Kushagra Khatwani,
Prabuchandran K. J.,
Tejas Bodas
Abstract:
First Order Bayesian Optimization (FOBO) is a sample efficient sequential approach to find the global maxima of an expensive-to-evaluate black-box objective function by suitably querying for the function and its gradient evaluations. Such methods assume Gaussian process (GP) models for both, the function and its gradient, and use them to construct an acquisition function that identifies the next q…
▽ More
First Order Bayesian Optimization (FOBO) is a sample efficient sequential approach to find the global maxima of an expensive-to-evaluate black-box objective function by suitably querying for the function and its gradient evaluations. Such methods assume Gaussian process (GP) models for both, the function and its gradient, and use them to construct an acquisition function that identifies the next query point. In this paper, we propose a class of practical FOBO algorithms that efficiently utilizes the information from the gradient GP to identify potential query points with zero gradients. We construct a multi-level acquisition function where in the first step, we optimize a lower level acquisition function with multiple restarts to identify potential query points with zero gradient value. We then use the upper level acquisition function to rank these query points based on their function values to potentially identify the global maxima. As a final step, the potential point of maxima is chosen as the actual query point. We validate the performance of our proposed algorithms on several test functions and show that our algorithms outperform state-of-the-art FOBO algorithms. We also illustrate the application of our algorithms in finding optimal set of hyper-parameters in machine learning and in learning the optimal policy in reinforcement learning tasks.
△ Less
Submitted 19 June, 2023;
originally announced June 2023.
-
A Comparative Analysis of Portfolio Optimization Using Mean-Variance, Hierarchical Risk Parity, and Reinforcement Learning Approaches on the Indian Stock Market
Authors:
Jaydip Sen,
Aditya Jaiswal,
Anshuman Pathak,
Atish Kumar Majee,
Kushagra Kumar,
Manas Kumar Sarkar,
Soubhik Maji
Abstract:
This paper presents a comparative analysis of the performances of three portfolio optimization approaches. Three approaches of portfolio optimization that are considered in this work are the mean-variance portfolio (MVP), hierarchical risk parity (HRP) portfolio, and reinforcement learning-based portfolio. The portfolios are trained and tested over several stock data and their performances are com…
▽ More
This paper presents a comparative analysis of the performances of three portfolio optimization approaches. Three approaches of portfolio optimization that are considered in this work are the mean-variance portfolio (MVP), hierarchical risk parity (HRP) portfolio, and reinforcement learning-based portfolio. The portfolios are trained and tested over several stock data and their performances are compared on their annual returns, annual risks, and Sharpe ratios. In the reinforcement learning-based portfolio design approach, the deep Q learning technique has been utilized. Due to the large number of possible states, the construction of the Q-table is done using a deep neural network. The historical prices of the 50 premier stocks from the Indian stock market, known as the NIFTY50 stocks, and several stocks from 10 important sectors of the Indian stock market are used to create the environment for training the agent.
△ Less
Submitted 27 May, 2023;
originally announced May 2023.
-
Online fair division with arbitrary entitlements
Authors:
Kushagra Chatterjee,
Biswadeep Sen,
Yuhao Wang
Abstract:
The division of goods in the online realm poses opportunities and challenges. While innovative mechanisms can be developed, uncertainty about the future may hinder effective solutions. This project aims to explore fair distribution models for goods among agents with arbitrary entitlements, specifically addressing food charity challenges in the real world. Building upon prior work in [AAGW15], whic…
▽ More
The division of goods in the online realm poses opportunities and challenges. While innovative mechanisms can be developed, uncertainty about the future may hinder effective solutions. This project aims to explore fair distribution models for goods among agents with arbitrary entitlements, specifically addressing food charity challenges in the real world. Building upon prior work in [AAGW15], which focuses on equal entitlements, our project seeks to better understand the proofs of the theorems mentioned in that paper, which currently only provide proof sketches. Our approach employs different proof techniques from those presented in [AAGW15]
△ Less
Submitted 18 April, 2023;
originally announced April 2023.
-
A Complete Recipe for Diffusion Generative Models
Authors:
Kushagra Pandey,
Stephan Mandt
Abstract:
Score-based Generative Models (SGMs) have demonstrated exceptional synthesis outcomes across various tasks. However, the current design landscape of the forward diffusion process remains largely untapped and often relies on physical heuristics or simplifying assumptions. Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating fo…
▽ More
Score-based Generative Models (SGMs) have demonstrated exceptional synthesis outcomes across various tasks. However, the current design landscape of the forward diffusion process remains largely untapped and often relies on physical heuristics or simplifying assumptions. Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. Our approach reveals that several existing SGMs can be seen as specific manifestations of our framework. Building upon this method, we introduce Phase Space Langevin Diffusion (PSLD), which relies on score-based modeling within an augmented space enriched by auxiliary variables akin to physical phase space. Empirical results exhibit the superior sample quality and improved speed-quality trade-off of PSLD compared to various competing approaches on established image synthesis benchmarks. Remarkably, PSLD achieves sample quality akin to state-of-the-art SGMs (FID: 2.10 for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of PSLD in conditional synthesis using pre-trained score networks, offering an appealing alternative as an SGM backbone for future advancements. Code and model checkpoints can be accessed at \url{https://github.com/mandt-lab/PSLD}.
△ Less
Submitted 11 October, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
-
Graph schemas as abstractions for transfer learning, inference, and planning
Authors:
J. Swaroop Guntupalli,
Rajkumar Vasudeva Raju,
Shrinu Kushagra,
Carter Wendelken,
Danny Sawyer,
Ishan Deshpande,
Guangyao Zhou,
Miguel Lázaro-Gredilla,
Dileep George
Abstract:
Transferring latent structure from one environment or problem to another is a mechanism by which humans and animals generalize with very little data. Inspired by cognitive and neurobiological insights, we propose graph schemas as a mechanism of abstraction for transfer learning. Graph schemas start with latent graph learning where perceptually aliased observations are disambiguated in the latent s…
▽ More
Transferring latent structure from one environment or problem to another is a mechanism by which humans and animals generalize with very little data. Inspired by cognitive and neurobiological insights, we propose graph schemas as a mechanism of abstraction for transfer learning. Graph schemas start with latent graph learning where perceptually aliased observations are disambiguated in the latent space using contextual information. Latent graph learning is also emerging as a new computational model of the hippocampus to explain map learning and transitive inference. Our insight is that a latent graph can be treated as a flexible template -- a schema -- that models concepts and behaviors, with slots that bind groups of latent nodes to the specific observations or groundings. By treating learned latent graphs (schemas) as prior knowledge, new environments can be quickly learned as compositions of schemas and their newly learned bindings. We evaluate graph schemas on two previously published challenging tasks: the memory & planning game and one-shot StreetLearn, which are designed to test rapid task solving in novel environments. Graph schemas can be learned in far fewer episodes than previous baselines, and can model and plan in a few steps in novel variations of these tasks. We also demonstrate learning, matching, and reusing graph schemas in more challenging 2D and 3D environments with extensive perceptual aliasing and size variations, and show how different schemas can be composed to model larger and more complex environments. To summarize, our main contribution is a unified system, inspired and grounded in cognitive science, that facilitates rapid transfer learning of new environments using schemas via map-induction and composition that handles perceptual aliasing.
△ Less
Submitted 12 December, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
ORCa: Glossy Objects as Radiance Field Cameras
Authors:
Kushagra Tiwary,
Akshat Dave,
Nikhil Behari,
Tzofi Klinghoffer,
Ashok Veeraraghavan,
Ramesh Raskar
Abstract:
Reflections on glossy objects contain valuable and hidden information about the surrounding environment. By converting these objects into cameras, we can unlock exciting applications, including imaging beyond the camera's field-of-view and from seemingly impossible vantage points, e.g. from reflections on the human eye. However, this task is challenging because reflections depend jointly on object…
▽ More
Reflections on glossy objects contain valuable and hidden information about the surrounding environment. By converting these objects into cameras, we can unlock exciting applications, including imaging beyond the camera's field-of-view and from seemingly impossible vantage points, e.g. from reflections on the human eye. However, this task is challenging because reflections depend jointly on object geometry, material properties, the 3D environment, and the observer viewing direction. Our approach converts glossy objects with unknown geometry into radiance-field cameras to image the world from the object's perspective. Our key insight is to convert the object surface into a virtual sensor that captures cast reflections as a 2D projection of the 5D environment radiance field visible to the object. We show that recovering the environment radiance fields enables depth and radiance estimation from the object to its surroundings in addition to beyond field-of-view novel-view synthesis, i.e. rendering of novel views that are only directly-visible to the glossy object present in the scene, but not the observer. Moreover, using the radiance field we can image around occluders caused by close-by objects in the scene. Our method is trained end-to-end on multi-view images of the object and jointly estimates object geometry, diffuse radiance, and the 5D environment radiance field.
△ Less
Submitted 12 December, 2022; v1 submitted 8 December, 2022;
originally announced December 2022.
-
A Machine Learning, Natural Language Processing Analysis of Youth Perspectives: Key Trends and Focus Areas for Sustainable Youth Development Policies
Authors:
Kushaagra Gupta
Abstract:
Investing in children and youth is a critical step towards inclusive, equitable, and sustainable development for current and future generations. Several international agendas for accomplishing common global goals emphasize the need for active youth participation and engagement for sustainable development. The 2030 Agenda for Sustainable Development emphasizes the need for youth engagement and the…
▽ More
Investing in children and youth is a critical step towards inclusive, equitable, and sustainable development for current and future generations. Several international agendas for accomplishing common global goals emphasize the need for active youth participation and engagement for sustainable development. The 2030 Agenda for Sustainable Development emphasizes the need for youth engagement and the inclusion of youth perspectives as an important step toward addressing each of the 17 Sustainable Development Goals. The aim of this study is to analyze youth perspectives, values, and sentiments towards issues addressed by the 17 Sustainable Development Goals through social network analysis using machine learning. Social network data collected during 7 major sustainability conferences aimed at engaging children and youth is analyzed using natural language processing techniques for sentiment analysis. This data categorized using a natural language processing text classifier trained on a sample dataset of social network data during the 7 youth sustainability conferences for deeper understanding of youth perspectives in relation to the SDGs. Machine learning identified demographic and location attributes and features are utilized in order to identify bias and demographic differences between ages, gender, and race among youth. Using natural language processing, the qualitative data collected from over 7 different countries in 3 languages are systematically translated, categorized, and analyzed, revealing key trends and focus areas for sustainable youth development policies. The obtained results reveal the general youth's depth of knowledge on sustainable development and their attitudes towards each of the 17 SDGs. The findings of this study serve as a guide toward better understanding the interests, roles, and perspectives of children and youth in achieving the goals of Agenda 2030.
△ Less
Submitted 24 November, 2022;
originally announced November 2022.
-
UAV-based Visual Remote Sensing for Automated Building Inspection
Authors:
Kushagra Srivastava,
Dhruv Patel,
Aditya Kumar Jha,
Mohhit Kumar Jha,
Jaskirat Singh,
Ravi Kiran Sarvadevabhatla,
Pradeep Kumar Ramancharla,
Harikumar Kandath,
K. Madhava Krishna
Abstract:
Unmanned Aerial Vehicle (UAV) based remote sensing system incorporated with computer vision has demonstrated potential for assisting building construction and in disaster management like damage assessment during earthquakes. The vulnerability of a building to earthquake can be assessed through inspection that takes into account the expected damage progression of the associated component and the co…
▽ More
Unmanned Aerial Vehicle (UAV) based remote sensing system incorporated with computer vision has demonstrated potential for assisting building construction and in disaster management like damage assessment during earthquakes. The vulnerability of a building to earthquake can be assessed through inspection that takes into account the expected damage progression of the associated component and the component's contribution to structural system performance. Most of these inspections are done manually, leading to high utilization of manpower, time, and cost. This paper proposes a methodology to automate these inspections through UAV-based image data collection and a software library for post-processing that helps in estimating the seismic structural parameters. The key parameters considered here are the distances between adjacent buildings, building plan-shape, building plan area, objects on the rooftop and rooftop layout. The accuracy of the proposed methodology in estimating the above-mentioned parameters is verified through field measurements taken using a distance measuring sensor and also from the data obtained through Google Earth. Additional details and code can be accessed from https://uvrsabi.github.io/ .
△ Less
Submitted 27 September, 2022;
originally announced September 2022.
-
Popular Edges with Critical Nodes
Authors:
Kushagra Chatterjee,
Prajakta Nimbhorkar
Abstract:
In the popular edge problem, the input is a bipartite graph $G = (A \cup B,E)$ where $A$ and $B$ denote a set of men and a set of women respectively, and each vertex in $A\cup B$ has a strict preference ordering over its neighbours. A matching $M$ in $G$ is said to be {\em popular} if there is no other matching $M'$ such that the number of vertices that prefer $M'$ to $M$ is more than the number o…
▽ More
In the popular edge problem, the input is a bipartite graph $G = (A \cup B,E)$ where $A$ and $B$ denote a set of men and a set of women respectively, and each vertex in $A\cup B$ has a strict preference ordering over its neighbours. A matching $M$ in $G$ is said to be {\em popular} if there is no other matching $M'$ such that the number of vertices that prefer $M'$ to $M$ is more than the number of vertices that prefer $M$ to $M'$. The goal is to determine, whether a given edge $e$ belongs to some popular matching in $G$. A polynomial-time algorithm for this problem appears in \cite{CK18}. We consider the popular edge problem when some men or women are prioritized or critical. A matching that matches all the critical nodes is termed as a feasible matching. It follows from \cite{Kavitha14,Kavitha21,NNRS21,NN17} that, when $G$ admits a feasible matching, there always exists a matching that is popular among all feasible matchings. We give a polynomial-time algorithm for the popular edge problem in the presence of critical men or women. We also show that an analogous result does not hold in the many-to-one setting, which is known as the Hospital-Residents Problem in literature, even when there are no critical nodes.
△ Less
Submitted 22 September, 2022;
originally announced September 2022.
-
Deep Learning-based ECG Classification on Raspberry PI using a Tensorflow Lite Model based on PTB-XL Dataset
Authors:
Kushagra Sharma,
Rasit Eskicioglu
Abstract:
The number of IoT devices in healthcare is expected to rise sharply due to increased demand since the COVID-19 pandemic. Deep learning and IoT devices are being employed to monitor body vitals and automate anomaly detection in clinical and non-clinical settings. Most of the current technology requires the transmission of raw data to a remote server, which is not efficient for resource-constrained…
▽ More
The number of IoT devices in healthcare is expected to rise sharply due to increased demand since the COVID-19 pandemic. Deep learning and IoT devices are being employed to monitor body vitals and automate anomaly detection in clinical and non-clinical settings. Most of the current technology requires the transmission of raw data to a remote server, which is not efficient for resource-constrained IoT devices and embedded systems. Additionally, it is challenging to develop a machine learning model for ECG classification due to the lack of an extensive open public database. To an extent, to overcome this challenge PTB-XL dataset has been used. In this work, we have developed machine learning models to be deployed on Raspberry Pi. We present an evaluation of our TensorFlow Model with two classification classes. We also present the evaluation of the corresponding TensorFlow Lite FlatBuffers to demonstrate their minimal run-time requirements while maintaining acceptable accuracy.
△ Less
Submitted 25 August, 2022;
originally announced September 2022.
-
Physics vs. Learned Priors: Rethinking Camera and Algorithm Design for Task-Specific Imaging
Authors:
Tzofi Klinghoffer,
Siddharth Somasundaram,
Kushagra Tiwary,
Ramesh Raskar
Abstract:
Cameras were originally designed using physics-based heuristics to capture aesthetic images. In recent years, there has been a transformation in camera design from being purely physics-driven to increasingly data-driven and task-specific. In this paper, we present a framework to understand the building blocks of this nascent field of end-to-end design of camera hardware and algorithms. As part of…
▽ More
Cameras were originally designed using physics-based heuristics to capture aesthetic images. In recent years, there has been a transformation in camera design from being purely physics-driven to increasingly data-driven and task-specific. In this paper, we present a framework to understand the building blocks of this nascent field of end-to-end design of camera hardware and algorithms. As part of this framework, we show how methods that exploit both physics and data have become prevalent in imaging and computer vision, underscoring a key trend that will continue to dominate the future of task-specific camera design. Finally, we share current barriers to progress in end-to-end design, and hypothesize how these barriers can be overcome.
△ Less
Submitted 11 January, 2023; v1 submitted 21 April, 2022;
originally announced April 2022.
-
Physically Disentangled Representations
Authors:
Tzofi Klinghoffer,
Kushagra Tiwary,
Arkadiusz Balata,
Vivek Sharma,
Ramesh Raskar
Abstract:
State-of-the-art methods in generative representation learning yield semantic disentanglement, but typically do not consider physical scene parameters, such as geometry, albedo, lighting, or camera. We posit that inverse rendering, a way to reverse the rendering process to recover scene parameters from an image, can also be used to learn physically disentangled representations of scenes without su…
▽ More
State-of-the-art methods in generative representation learning yield semantic disentanglement, but typically do not consider physical scene parameters, such as geometry, albedo, lighting, or camera. We posit that inverse rendering, a way to reverse the rendering process to recover scene parameters from an image, can also be used to learn physically disentangled representations of scenes without supervision. In this paper, we show the utility of inverse rendering in learning representations that yield improved accuracy on downstream clustering, linear classification, and segmentation tasks with the help of our novel Leave-One-Out, Cycle Contrastive loss (LOOCC), which improves disentanglement of scene parameters and robustness to out-of-distribution lighting and viewpoints. We perform a comparison of our method with other generative representation learning methods across a variety of downstream tasks, including face attribute classification, emotion recognition, identification, face segmentation, and car classification. Our physically disentangled representations yield higher accuracy than semantically disentangled alternatives across all tasks and by as much as 18%. We hope that this work will motivate future research in applying advances in inverse rendering and 3D understanding to representation learning.
△ Less
Submitted 11 April, 2022;
originally announced April 2022.
-
Towards Learning Neural Representations from Shadows
Authors:
Kushagra Tiwary,
Tzofi Klinghoffer,
Ramesh Raskar
Abstract:
We present a method that learns neural shadow fields which are neural scene representations that are only learnt from the shadows present in the scene. While traditional shape-from-shadow (SfS) algorithms reconstruct geometry from shadows, they assume a fixed scanning setup and fail to generalize to complex scenes. Neural rendering algorithms, on the other hand, rely on photometric consistency bet…
▽ More
We present a method that learns neural shadow fields which are neural scene representations that are only learnt from the shadows present in the scene. While traditional shape-from-shadow (SfS) algorithms reconstruct geometry from shadows, they assume a fixed scanning setup and fail to generalize to complex scenes. Neural rendering algorithms, on the other hand, rely on photometric consistency between RGB images, but largely ignore physical cues such as shadows, which have been shown to provide valuable information about the scene. We observe that shadows are a powerful cue that can constrain neural scene representations to learn SfS, and even outperform NeRF to reconstruct otherwise hidden geometry. We propose a graphics-inspired differentiable approach to render accurate shadows with volumetric rendering, predicting a shadow map that can be compared to the ground truth shadow. Even with just binary shadow maps, we show that neural rendering can localize the object and estimate coarse geometry. Our approach reveals that sparse cues in images can be used to estimate geometry using differentiable volumetric rendering. Moreover, our framework is highly generalizable and can work alongside existing 3D reconstruction techniques that otherwise only use photometric consistency.
△ Less
Submitted 19 July, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
PGMax: Factor Graphs for Discrete Probabilistic Graphical Models and Loopy Belief Propagation in JAX
Authors:
Guangyao Zhou,
Antoine Dedieu,
Nishanth Kumar,
Wolfgang Lehrach,
Miguel Lázaro-Gredilla,
Shrinu Kushagra,
Dileep George
Abstract:
PGMax is an open-source Python package for (a) easily specifying discrete Probabilistic Graphical Models (PGMs) as factor graphs; and (b) automatically running efficient and scalable loopy belief propagation (LBP) in JAX. PGMax supports general factor graphs with tractable factors, and leverages modern accelerators like GPUs for inference. Compared with existing alternatives, PGMax obtains higher-…
▽ More
PGMax is an open-source Python package for (a) easily specifying discrete Probabilistic Graphical Models (PGMs) as factor graphs; and (b) automatically running efficient and scalable loopy belief propagation (LBP) in JAX. PGMax supports general factor graphs with tractable factors, and leverages modern accelerators like GPUs for inference. Compared with existing alternatives, PGMax obtains higher-quality inference results with up to three orders-of-magnitude inference time speedups. PGMax additionally interacts seamlessly with the rapidly growing JAX ecosystem, opening up new research possibilities. Our source code, examples and documentation are available at https://github.com/deepmind/PGMax.
△ Less
Submitted 24 March, 2023; v1 submitted 8 February, 2022;
originally announced February 2022.
-
DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents
Authors:
Kushagra Pandey,
Avideep Mukherjee,
Piyush Rai,
Abhishek Kumar
Abstract:
Diffusion probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand, standard Variational Autoencoders (VAEs) typically have access to a low-dimensional latent space but exhibit poor sample quality. We present DiffuseVAE, a novel ge…
▽ More
Diffusion probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand, standard Variational Autoencoders (VAEs) typically have access to a low-dimensional latent space but exhibit poor sample quality. We present DiffuseVAE, a novel generative framework that integrates VAE within a diffusion model framework, and leverage this to design novel conditional parameterizations for diffusion models. We show that the resulting model equips diffusion models with a low-dimensional VAE inferred latent code which can be used for downstream tasks like controllable synthesis. The proposed method also improves upon the speed vs quality tradeoff exhibited in standard unconditional DDPM/DDIM models (for instance, FID of 16.47 vs 34.36 using a standard DDIM on the CelebA-HQ-128 benchmark using T=10 reverse process steps) without having explicitly trained for such an objective. Furthermore, the proposed model exhibits synthesis quality comparable to state-of-the-art models on standard image synthesis benchmarks like CIFAR-10 and CelebA-64 while outperforming most existing VAE-based methods. Lastly, we show that the proposed method exhibits inherent generalization to different types of noise in the conditioning signal. For reproducibility, our source code is publicly available at https://github.com/kpandey008/DiffuseVAE.
△ Less
Submitted 29 November, 2022; v1 submitted 2 January, 2022;
originally announced January 2022.
-
Pairwise Reachability Oracles and Preservers under Failures
Authors:
Diptarka Chakraborty,
Kushagra Chatterjee,
Keerti Choudhary
Abstract:
In this paper, we consider reachability oracles and reachability preservers for directed graphs/networks prone to edge/node failures. Let $G = (V, E)$ be a directed graph on $n$-nodes, and $P\subseteq V\times V$ be a set of vertex pairs in $G$. We present the first non-trivial constructions of single and dual fault-tolerant pairwise reachability oracle with constant query time. Furthermore, we pro…
▽ More
In this paper, we consider reachability oracles and reachability preservers for directed graphs/networks prone to edge/node failures. Let $G = (V, E)$ be a directed graph on $n$-nodes, and $P\subseteq V\times V$ be a set of vertex pairs in $G$. We present the first non-trivial constructions of single and dual fault-tolerant pairwise reachability oracle with constant query time. Furthermore, we provide extremal bounds for sparse fault-tolerant reachability preservers, resilient to two or more failures. Prior to this work, such oracles and reachability preservers were widely studied for the special scenario of single-source and all-pairs settings. However, for the scenario of arbitrary pairs, no prior (non-trivial) results were known for dual (or more) failures, except those implied from the single-source setting. One of the main questions is whether it is possible to beat the $O(n |P|)$ size bound (derived from the single-source setting) for reachability oracle and preserver for dual failures (or $O(2^k n|P|)$ bound for $k$ failures). We answer this question affirmatively.
△ Less
Submitted 22 October, 2021;
originally announced October 2021.
-
Packing Squares into a Disk with Optimal Worst-Case Density
Authors:
Sándor P. Fekete,
Vijaykrishna Gurunathan,
Kushagra Juneja,
Phillip Keldenich,
Linda Kleist,
Christian Scheffer
Abstract:
We provide a tight result for a fundamental problem arising from packing squares into a circular container: The critical density of packing squares into a disk is $δ=\frac{8}{5π}\approx 0.509$. This implies that any set of (not necessarily equal) squares of total area $A \leq \frac{8}{5}$ can always be packed into a disk with radius 1; in contrast, for any $\varepsilon>0$ there are sets of squares…
▽ More
We provide a tight result for a fundamental problem arising from packing squares into a circular container: The critical density of packing squares into a disk is $δ=\frac{8}{5π}\approx 0.509$. This implies that any set of (not necessarily equal) squares of total area $A \leq \frac{8}{5}$ can always be packed into a disk with radius 1; in contrast, for any $\varepsilon>0$ there are sets of squares of total area $\frac{8}{5}+\varepsilon$ that cannot be packed, even if squares may be rotated. This settles the last (and arguably, most elusive) case of packing circular or square objects into a circular or square container: The critical densities for squares in a square $\left(\frac{1}{2}\right)$, circles in a square $\left(\fracπ{(3+2\sqrt{2})}\approx 0.539\right)$ and circles in a circle $\left(\frac{1}{2}\right)$ have already been established, making use of recursive subdivisions of a square container into pieces bounded by straight lines, or the ability to use recursive arguments based on similarity of objects and container; neither of these approaches can be applied when packing squares into a circular container. Our proof uses a careful manual analysis, complemented by a computer-assisted part that is based on interval arithmetic. Beyond the basic mathematical importance, our result is also useful as a blackbox lemma for the analysis of recursive packing algorithms. At the same time, our approach showcases the power of a general framework for computer-assisted proofs, based on interval arithmetic.
△ Less
Submitted 29 March, 2022; v1 submitted 12 March, 2021;
originally announced March 2021.
-
Classifying CELESTE as NP Complete
Authors:
Zeeshan Ahmed,
Alapan Chaudhuri,
Kunwar Shaanjeet Singh Grover,
Ashwin Rao,
Kushagra Garg,
Pulak Malhotra
Abstract:
We analyze the computational complexity of the video game "CELESTE" and prove that solving a generalized level in it is NP-Complete. Further, we also show how, upon introducing a small change in the game mechanics (adding a new game entity), we can make it PSPACE-complete.
We analyze the computational complexity of the video game "CELESTE" and prove that solving a generalized level in it is NP-Complete. Further, we also show how, upon introducing a small change in the game mechanics (adding a new game entity), we can make it PSPACE-complete.
△ Less
Submitted 1 December, 2022; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Combining Propositional Logic Based Decision Diagrams with Decision Making in Urban Systems
Authors:
Jia**g Ling,
Kushagra Chandak,
Akshat Kumar
Abstract:
Solving multiagent problems can be an uphill task due to uncertainty in the environment, partial observability, and scalability of the problem at hand. Especially in an urban setting, there are more challenges since we also need to maintain safety for all users while minimizing congestion of the agents as well as their travel times. To this end, we tackle the problem of multiagent pathfinding unde…
▽ More
Solving multiagent problems can be an uphill task due to uncertainty in the environment, partial observability, and scalability of the problem at hand. Especially in an urban setting, there are more challenges since we also need to maintain safety for all users while minimizing congestion of the agents as well as their travel times. To this end, we tackle the problem of multiagent pathfinding under uncertainty and partial observability where the agents are tasked to move from their starting points to ending points while also satisfying some constraints, e.g., low congestion, and model it as a multiagent reinforcement learning problem. We compile the domain constraints using propositional logic and integrate them with the RL algorithms to enable fast simulation for RL.
△ Less
Submitted 10 November, 2020; v1 submitted 9 November, 2020;
originally announced November 2020.
-
Is Q-Learning Provably Efficient? An Extended Analysis
Authors:
Kushagra Rastogi,
Jonathan Lee,
Fabrice Harel-Canada,
Aditya Joglekar
Abstract:
This work extends the analysis of the theoretical results presented within the paper Is Q-Learning Provably Efficient? by ** et al. We include a survey of related research to contextualize the need for strengthening the theoretical guarantees related to perhaps the most important threads of model-free reinforcement learning. We also expound upon the reasoning used in the proofs to highlight the c…
▽ More
This work extends the analysis of the theoretical results presented within the paper Is Q-Learning Provably Efficient? by ** et al. We include a survey of related research to contextualize the need for strengthening the theoretical guarantees related to perhaps the most important threads of model-free reinforcement learning. We also expound upon the reasoning used in the proofs to highlight the critical steps leading to the main result showing that Q-learning with UCB exploration achieves a sample efficiency that matches the optimal regret that can be achieved by any model-based approach.
△ Less
Submitted 22 September, 2020;
originally announced September 2020.
-
On sampling from data with duplicate records
Authors:
Alireza Heidari,
Shrinu Kushagra,
Ihab F. Ilyas
Abstract:
Data deduplication is the task of detecting records in a database that correspond to the same real-world entity. Our goal is to develop a procedure that samples uniformly from the set of entities present in the database in the presence of duplicates. We accomplish this by a two-stage process. In the first step, we estimate the frequencies of all the entities in the database. In the second step, we…
▽ More
Data deduplication is the task of detecting records in a database that correspond to the same real-world entity. Our goal is to develop a procedure that samples uniformly from the set of entities present in the database in the presence of duplicates. We accomplish this by a two-stage process. In the first step, we estimate the frequencies of all the entities in the database. In the second step, we use rejection sampling to obtain a (approximately) uniform sample from the set of entities. However, efficiently estimating the frequency of all the entities is a non-trivial task and not attainable in the general case. Hence, we consider various natural properties of the data under which such frequency estimation (and consequently uniform sampling) is possible. Under each of those assumptions, we provide sampling algorithms and give proofs of the complexity (both statistical and computational) of our approach. We complement our study by conducting extensive experiments on both real and synthetic datasets.
△ Less
Submitted 24 August, 2020;
originally announced August 2020.
-
Explainable AI based Interventions for Pre-season Decision Making in Fashion Retail
Authors:
Shravan Sajja,
Nupur Aggarwal,
Sumanta Mukherjee,
Kushagra Manglik,
Satyam Dwivedi,
Vikas Raykar
Abstract:
Future of sustainable fashion lies in adoption of AI for a better understanding of consumer shop** behaviour and using this understanding to further optimize product design, development and sourcing to finally reduce the probability of overproducing inventory. Explainability and interpretability are highly effective in increasing the adoption of AI based tools in creative domains like fashion. I…
▽ More
Future of sustainable fashion lies in adoption of AI for a better understanding of consumer shop** behaviour and using this understanding to further optimize product design, development and sourcing to finally reduce the probability of overproducing inventory. Explainability and interpretability are highly effective in increasing the adoption of AI based tools in creative domains like fashion. In a fashion house, stakeholders like buyers, merchandisers and financial planners have a more quantitative approach towards decision making with primary goals of high sales and reduced dead inventory. Whereas, designers have a more intuitive approach based on observing market trends, social media and runways shows. Our goal is to build an explainable new product forecasting tool with capabilities of interventional analysis such that all the stakeholders (with competing goals) can participate in collaborative decision making process of new product design, development and launch.
△ Less
Submitted 27 July, 2020;
originally announced August 2020.
-
Hyper-local sustainable assortment planning
Authors:
Nupur Aggarwal,
Abhishek Bansal,
Kushagra Manglik,
Kedar Kulkarni,
Vikas Raykar
Abstract:
Assortment planning, an important seasonal activity for any retailer, involves choosing the right subset of products to stock in each store.While existing approaches only maximize the expected revenue, we propose including the environmental impact too, through the Higg Material Sustainability Index. The trade-off between revenue and environmental impact is balanced through a multi-objective optimi…
▽ More
Assortment planning, an important seasonal activity for any retailer, involves choosing the right subset of products to stock in each store.While existing approaches only maximize the expected revenue, we propose including the environmental impact too, through the Higg Material Sustainability Index. The trade-off between revenue and environmental impact is balanced through a multi-objective optimization approach, that yields a Pareto-front of optimal assortments for merchandisers to choose from. Using the proposed approach on a few product categories of a leading fashion retailer shows that choosing assortments with lower environmental impact with a minimal impact on revenue is possible.
△ Less
Submitted 27 July, 2020;
originally announced July 2020.
-
Record fusion: A learning approach
Authors:
Alireza Heidari,
George Michalopoulos,
Shrinu Kushagra,
Ihab F. Ilyas,
Theodoros Rekatsinas
Abstract:
Record fusion is the task of aggregating multiple records that correspond to the same real-world entity in a database. We can view record fusion as a machine learning problem where the goal is to predict the "correct" value for each attribute for each entity. Given a database, we use a combination of attribute-level, recordlevel, and database-level signals to construct a feature vector for each ce…
▽ More
Record fusion is the task of aggregating multiple records that correspond to the same real-world entity in a database. We can view record fusion as a machine learning problem where the goal is to predict the "correct" value for each attribute for each entity. Given a database, we use a combination of attribute-level, recordlevel, and database-level signals to construct a feature vector for each cell (or (row, col)) of that database. We use this feature vector alongwith the ground-truth information to learn a classifier for each of the attributes of the database.
Our learning algorithm uses a novel stagewise additive model. At each stage, we construct a new feature vector by combining a part of the original feature vector with features computed by the predictions from the previous stage. We then learn a softmax classifier over the new feature space. This greedy stagewise approach can be viewed as a deep model where at each stage, we are adding more complicated non-linear transformations of the original feature vector. We show that our approach fuses records with an average precision of ~98% when source information of records is available, and ~94% without source information across a diverse array of real-world datasets. We compare our approach to a comprehensive collection of data fusion and entity consolidation methods considered in the literature. We show that our approach can achieve an average precision improvement of ~20%/~45% with/without source information respectively.
△ Less
Submitted 17 June, 2020;
originally announced June 2020.
-
Adjoined Networks: A Training Paradigm with Applications to Network Compression
Authors:
Utkarsh Nath,
Shrinu Kushagra,
Yingzhen Yang
Abstract:
Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In…
▽ More
Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer vision tasks. We conduct an extensive experimental evaluation of our training paradigm on various large-scale datasets. Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet data-set. We further propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network. DAN achieves ResNet-50 level accuracy on ImageNet with $3.8\times$ fewer parameters and $2.2\times$ fewer FLOPs.
△ Less
Submitted 14 April, 2022; v1 submitted 9 June, 2020;
originally announced June 2020.
-
Discriminatory Price Mechanism for Smart Grid
Authors:
Diptangshu Sen,
Kushaagra Goyal,
Varun Ramamohan,
Arnob Ghosh
Abstract:
We consider a scenario where a retailer can set different prices for different consumers in a smart grid. The retailer's objective is to maximize the revenue, minimize the operating cost, and maximize the consumer's welfare. The retailer wants to optimize a convex combination of the above objectives using price signals specific to each consumer. However, variability in unit prices across consumers…
▽ More
We consider a scenario where a retailer can set different prices for different consumers in a smart grid. The retailer's objective is to maximize the revenue, minimize the operating cost, and maximize the consumer's welfare. The retailer wants to optimize a convex combination of the above objectives using price signals specific to each consumer. However, variability in unit prices across consumers is bounded by a parameter $η$, hence limiting the discrimination. We formulate the pricing problem as a Stackelberg game where the retailer is the leader and consumers are followers. Since the retailer's optimization problem turns out to be non-convex, we convexify it via relaxations. We provide performance guarantees for the relaxations in the asymptotic sense (when number of consumers tends to $\infty$). Further, we show that despite the variability in pricing, the pricing scheme proposed by our model is fair as higher prices are charged to consumers who have higher willingness for demand. We extend our analysis to the scenario where consumers can feed energy back to the grid via net-metering. We show that our pricing policy promotes fairness even in this scenario as prosumers who contribute more to the grid, are given large cuts on buying rates. The policy is also found to incentivize more prosumers to invest in renewable energy, thus encouraging sustainability.
△ Less
Submitted 8 November, 2021; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Three-dimensional matching is NP-Hard
Authors:
Shrinu Kushagra
Abstract:
The standard proof of NP-Hardness of 3DM provides a power-$4$ reduction of 3SAT to 3DM. In this note, we provide a linear-time reduction. Under the exponential time hypothesis, this reduction improves the runtime lower bound from $2^{o(\sqrt[4]{m})}$ (under the standard reduction) to $2^{o(m)}$.
The standard proof of NP-Hardness of 3DM provides a power-$4$ reduction of 3SAT to 3DM. In this note, we provide a linear-time reduction. Under the exponential time hypothesis, this reduction improves the runtime lower bound from $2^{o(\sqrt[4]{m})}$ (under the standard reduction) to $2^{o(m)}$.
△ Less
Submitted 29 February, 2020;
originally announced March 2020.
-
Smart Summarizer for Blind People
Authors:
Mona teja K,
Mohan Sai. S,
H S S S Raviteja D,
Sai Kushagra P V
Abstract:
In today's world, time is a very important resource. In our busy lives, most of us hardly have time to read the complete news so what we have to do is just go through the headlines and satisfy ourselves with that. As a result, we might miss a part of the news or misinterpret the complete thing. The situation is even worse for the people who are visually impaired or have lost their ability to see.…
▽ More
In today's world, time is a very important resource. In our busy lives, most of us hardly have time to read the complete news so what we have to do is just go through the headlines and satisfy ourselves with that. As a result, we might miss a part of the news or misinterpret the complete thing. The situation is even worse for the people who are visually impaired or have lost their ability to see. The inability of these people to read text has a huge impact on their lives. There are a number of methods for blind people to read the text. Braille script, in particular, is one of the examples, but it is a highly inefficient method as it is really time taking and requires a lot of practice. So, we present a method for visually impaired people based on the sense of sound which is obviously better and more accurate than the sense of touch. This paper deals with an efficient method to summarize news into important keywords so as to save the efforts to go through the complete text every single time. This paper deals with many API's and modules like the tesseract, GTTS, and many algorithms that have been discussed and implemented in detail such as Luhn's Algorithm, Latent Semantic Analysis Algorithm, Text Ranking Algorithm. And the other functionality that this paper deals with is converting the summarized text to speech so that the system can aid even the blind people.
△ Less
Submitted 1 January, 2020;
originally announced January 2020.
-
How India Censors the Web
Authors:
Kushagra Singh,
Gurshabad Grover,
Varun Bansal
Abstract:
One of the primary ways in which India engages in online censorship is by ordering Internet Service Providers (ISPs) operating in its jurisdiction to block access to certain websites for its users. This paper reports the different techniques Indian ISPs are using to censor websites, and investigates whether website blocklists are consistent across ISPs. We propose a suite of tests that prove more…
▽ More
One of the primary ways in which India engages in online censorship is by ordering Internet Service Providers (ISPs) operating in its jurisdiction to block access to certain websites for its users. This paper reports the different techniques Indian ISPs are using to censor websites, and investigates whether website blocklists are consistent across ISPs. We propose a suite of tests that prove more robust than previous work in detecting DNS and HTTP based censorship. Our tests also discern the use of SNI inspection for blocking websites, which is previously undocumented in the Indian context. Using information from court orders, user reports, and public and leaked government orders, we compile the largest known list of potentially blocked websites in India. We pass this list to our tests and run them from connections of six different ISPs, which together serve more than 98% of Internet users in India. Our findings not only confirm that ISPs are using different techniques to block websites, but also demonstrate that different ISPs are not blocking the same websites.
△ Less
Submitted 30 May, 2020; v1 submitted 18 December, 2019;
originally announced December 2019.
-
Character Keypoint-based Homography Estimation in Scanned Documents for Efficient Information Extraction
Authors:
Kushagra Mahajan,
Monika Sharma,
Lovekesh Vig
Abstract:
Precise homography estimation between multiple images is a pre-requisite for many computer vision applications. One application that is particularly relevant in today's digital era is the alignment of scanned or camera-captured document images such as insurance claim forms for information extraction. Traditional learning based approaches perform poorly due to the absence of an appropriate gradient…
▽ More
Precise homography estimation between multiple images is a pre-requisite for many computer vision applications. One application that is particularly relevant in today's digital era is the alignment of scanned or camera-captured document images such as insurance claim forms for information extraction. Traditional learning based approaches perform poorly due to the absence of an appropriate gradient. Feature based keypoint extraction techniques for homography estimation in real scene images either detect an extremely large number of inconsistent keypoints due to sharp textual edges, or produce inaccurate keypoint correspondences due to variations in illumination and viewpoint differences between document images. In this paper, we propose a novel algorithm for aligning scanned or camera-captured document images using character based keypoints and a reference template. The algorithm is both fast and accurate and utilizes a standard Optical character recognition (OCR) engine such as Tesseract to find character based unambiguous keypoints, which are utilized to identify precise keypoint correspondences between two images. Finally, the keypoints are used to compute the homography map** between a test document and a template. We evaluated the proposed approach for information extraction on two real world anonymized datasets comprised of health insurance claim forms and the results support the viability of the proposed technique.
△ Less
Submitted 13 November, 2019;
originally announced November 2019.
-
Multi-task Learning for Target-dependent Sentiment Classification
Authors:
Divam Gupta,
Kushagra Singh,
Soumen Chakrabarti,
Tanmoy Chakraborty
Abstract:
Detecting and aggregating sentiments toward people, organizations, and events expressed in unstructured social media have become critical text mining operations. Early systems detected sentiments over whole passages, whereas more recently, target-specific sentiments have been of greater interest. In this paper, we present MTTDSC, a multi-task target-dependent sentiment classification system that i…
▽ More
Detecting and aggregating sentiments toward people, organizations, and events expressed in unstructured social media have become critical text mining operations. Early systems detected sentiments over whole passages, whereas more recently, target-specific sentiments have been of greater interest. In this paper, we present MTTDSC, a multi-task target-dependent sentiment classification system that is informed by feature representation learnt for the related auxiliary task of passage-level sentiment classification. The auxiliary task uses a gated recurrent unit (GRU) and pools GRU states, followed by an auxiliary fully-connected layer that outputs passage-level predictions. In the main task, these GRUs contribute auxiliary per-token representations over and above word embeddings. The main task has its own, separate GRUs. The auxiliary and main GRUs send their states to a different fully connected layer, trained for the main task. Extensive experiments using two auxiliary datasets and three benchmark datasets (of which one is new, introduced by us) for the main task demonstrate that MTTDSC outperforms state-of-the-art baselines. Using word-level sensitivity analysis, we present anecdotal evidence that prior systems can make incorrect target-specific predictions because they miss sentiments expressed by words independent of target.
△ Less
Submitted 7 February, 2019;
originally announced February 2019.
-
Intent Detection and Slots Prompt in a Closed-Domain Chatbot
Authors:
Amber Nigam,
Prashik Sahare,
Kushagra Pandya
Abstract:
In this paper, we introduce a methodology for predicting intent and slots of a query for a chatbot that answers career-related queries. We take a multi-staged approach where both the processes (intent-classification and slot-tagging) inform each other's decision-making in different stages. The model breaks down the problem into stages, solving one problem at a time and passing on relevant results…
▽ More
In this paper, we introduce a methodology for predicting intent and slots of a query for a chatbot that answers career-related queries. We take a multi-staged approach where both the processes (intent-classification and slot-tagging) inform each other's decision-making in different stages. The model breaks down the problem into stages, solving one problem at a time and passing on relevant results of the current stage to the next, thereby reducing search space for subsequent stages, and eventually making classification and tagging more viable after each stage. We also observe that relaxing rules for a fuzzy entity-matching in slot-tagging after each stage (by maintaining a separate Named Entity Tagger per stage) helps us improve performance, although at a slight cost of false-positives. Our model has achieved state-of-the-art performance with F1-score of 77.63% for intent-classification and 82.24% for slot-tagging on our dataset that we would publicly release along with the paper.
△ Less
Submitted 10 January, 2019; v1 submitted 27 December, 2018;
originally announced December 2018.
-
Semi-supervised clustering for de-duplication
Authors:
Shrinu Kushagra,
Shai Ben-David,
Ihab Ilyas
Abstract:
Data de-duplication is the task of detecting multiple records that correspond to the same real-world entity in a database. In this work, we view de-duplication as a clustering problem where the goal is to put records corresponding to the same physical entity in the same cluster and putting records corresponding to different physical entities into different clusters.
We introduce a framework whic…
▽ More
Data de-duplication is the task of detecting multiple records that correspond to the same real-world entity in a database. In this work, we view de-duplication as a clustering problem where the goal is to put records corresponding to the same physical entity in the same cluster and putting records corresponding to different physical entities into different clusters.
We introduce a framework which we call promise correlation clustering. Given a complete graph $G$ with the edges labelled $0$ and $1$, the goal is to find a clustering that minimizes the number of $0$ edges within a cluster plus the number of $1$ edges across different clusters (or correlation loss). The optimal clustering can also be viewed as a complete graph $G^*$ with edges corresponding to points in the same cluster being labelled $0$ and other edges being labelled $1$. Under the promise that the edge difference between $G$ and $G^*$ is "small", we prove that finding the optimal clustering (or $G^*$) is still NP-Hard. [Ashtiani et. al, 2016] introduced the framework of semi-supervised clustering, where the learning algorithm has access to an oracle, which answers whether two points belong to the same or different clusters. We further prove that even with access to a same-cluster oracle, the promise version is NP-Hard as long as the number queries to the oracle is not too large ($o(n)$ where $n$ is the number of vertices).
Given these negative results, we consider a restricted version of correlation clustering. As before, the goal is to find a clustering that minimizes the correlation loss. However, we restrict ourselves to a given class $\mathcal F$ of clusterings. We offer a semi-supervised algorithmic approach to solve the restricted variant with success guarantees.
△ Less
Submitted 10 October, 2018;
originally announced October 2018.
-
The Follower Count Fallacy: Detecting Twitter Users with Manipulated Follower Count
Authors:
Anupama Aggarwal,
Saravana Kumar,
Kushagra Bhargava,
Ponnurangam Kumaraguru
Abstract:
Online Social Networks (OSN) are increasingly being used as platform for an effective communication, to engage with other users, and to create a social worth via number of likes, followers and shares. Such metrics and crowd-sourced ratings give the OSN user a sense of social reputation which she tries to maintain and boost to be more influential. Users artificially bolster their social reputation…
▽ More
Online Social Networks (OSN) are increasingly being used as platform for an effective communication, to engage with other users, and to create a social worth via number of likes, followers and shares. Such metrics and crowd-sourced ratings give the OSN user a sense of social reputation which she tries to maintain and boost to be more influential. Users artificially bolster their social reputation via black-market web services. In this work, we identify users which manipulate their projected follower count using an unsupervised local neighborhood detection method. We identify a neighborhood of the user based on a robust set of features which reflect user similarity in terms of the expected follower count. We show that follower count estimation using our method has 84.2% accuracy with a low error rate. In addition, we estimate the follower count of the user under suspicion by finding its neighborhood drawn from a large random sample of Twitter. We show that our method is highly tolerant to synthetic manipulation of followers. Using the deviation of predicted follower count from the displayed count, we are also able to detect customers with a high precision of 98.62%
△ Less
Submitted 10 February, 2018;
originally announced February 2018.
-
Virtual Sensor Modelling using Neural Networks with Coefficient-based Adaptive Weights and Biases Search Algorithm for Diesel Engines
Authors:
Kushagra Rastogi,
Navreet Saini
Abstract:
With the explosion in the field of Big Data and introduction of more stringent emission norms every three to five years, automotive companies must not only continue to enhance the fuel economy ratings of their products, but also provide valued services to their customers such as delivering engine performance and health reports at regular intervals. A reasonable solution to both issues is installin…
▽ More
With the explosion in the field of Big Data and introduction of more stringent emission norms every three to five years, automotive companies must not only continue to enhance the fuel economy ratings of their products, but also provide valued services to their customers such as delivering engine performance and health reports at regular intervals. A reasonable solution to both issues is installing a variety of sensors on the engine. Sensor data can be used to develop fuel economy features and will directly indicate engine performance. However, mounting a plethora of sensors is impractical in a very cost-sensitive industry. Thus, virtual sensors can replace physical sensors by reducing cost while capturing essential engine data.
△ Less
Submitted 22 December, 2017;
originally announced December 2017.
-
Provably noise-robust, regularised $k$-means clustering
Authors:
Shrinu Kushagra,
Yaoliang Yu,
Shai Ben-David
Abstract:
We consider the problem of clustering in the presence of noise. That is, when on top of cluster structure, the data also contains a subset of \emph{unstructured} points. Our goal is to detect the clusters despite the presence of many unstructured points. Any algorithm that achieves this goal is noise-robust. We consider a regularisation method which converts any center-based clustering objective i…
▽ More
We consider the problem of clustering in the presence of noise. That is, when on top of cluster structure, the data also contains a subset of \emph{unstructured} points. Our goal is to detect the clusters despite the presence of many unstructured points. Any algorithm that achieves this goal is noise-robust. We consider a regularisation method which converts any center-based clustering objective into a noise-robust one. We focus on the $k$-means objective and we prove that the regularised version of $k$-means is NP-Hard even for $k=1$. We consider two algorithms based on the convex (sdp and lp) relaxation of the regularised objective and prove robustness guarantees for both.
The sdp and lp relaxation of the standard (non-regularised) $k$-means objective has been previously studied by [ABC+15]. Under the stochastic ball model of the data they show that the sdp-based algorithm recovers the underlying structure as long as the balls are separated by $δ> 2\sqrt{2} + ε$. We improve upon this result in two ways. First, we show recovery even for $δ> 2 + ε$. Second, our regularised algorithm recovers the balls even in the presence of noise so long as the number of noisy points is not too large. We complement our theoretical analysis with simulations and analyse the effect of various parameters like regularization constant, noise-level etc. on the performance of our algorithm. In the presence of noise, our algorithm performs better than $k$-means++ on MNIST.
△ Less
Submitted 27 August, 2018; v1 submitted 30 November, 2017;
originally announced November 2017.
-
Significance of Side Information in the Graph Matching Problem
Authors:
Kushagra Singhal,
Daniel Cullina,
Negar Kiyavash
Abstract:
Percolation based graph matching algorithms rely on the availability of seed vertex pairs as side information to efficiently match users across networks. Although such algorithms work well in practice, there are other types of side information available which are potentially useful to an attacker. In this paper, we consider the problem of matching two correlated graphs when an attacker has access…
▽ More
Percolation based graph matching algorithms rely on the availability of seed vertex pairs as side information to efficiently match users across networks. Although such algorithms work well in practice, there are other types of side information available which are potentially useful to an attacker. In this paper, we consider the problem of matching two correlated graphs when an attacker has access to side information, either in the form of community labels or an imperfect initial matching. In the former case, we propose a naive graph matching algorithm by introducing the community degree vectors which harness the information from community labels in an efficient manner. Furthermore, we analyze a variant of the basic percolation algorithm proposed in literature for graphs with community structure. In the latter case, we propose a novel percolation algorithm with two thresholds which uses an imperfect matching as input to match correlated graphs.
We evaluate the proposed algorithms on synthetic as well as real world datasets using various experiments. The experimental results demonstrate the importance of communities as side information especially when the number of seeds is small and the networks are weakly correlated.
△ Less
Submitted 21 June, 2017;
originally announced June 2017.
-
Clustering with Same-Cluster Queries
Authors:
Hassan Ashtiani,
Shrinu Kushagra,
Shai Ben-David
Abstract:
We propose a framework for Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to interact with a domain expert, asking whether two given instances belong to the same cluster or not. We study the query and computational complexity of clustering in this framework. We consider a setting where the expert conforms to a center-based clustering with a notion of margin. We sh…
▽ More
We propose a framework for Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to interact with a domain expert, asking whether two given instances belong to the same cluster or not. We study the query and computational complexity of clustering in this framework. We consider a setting where the expert conforms to a center-based clustering with a notion of margin. We show that there is a trade off between computational complexity and query complexity; We prove that for the case of $k$-means clustering (i.e., when the expert conforms to a solution of $k$-means), having access to relatively few such queries allows efficient solutions to otherwise NP hard problems.
In particular, we provide a probabilistic polynomial-time (BPP) algorithm for clustering in this setting that asks $O\big(k^2\log k + k\log n)$ same-cluster queries and runs with time complexity $O\big(kn\log n)$ (where $k$ is the number of clusters and $n$ is the number of instances). The algorithm succeeds with high probability for data satisfying margin conditions under which, without queries, we show that the problem is NP hard. We also prove a lower bound on the number of queries needed to have a computationally efficient clustering algorithm in this setting.
△ Less
Submitted 22 November, 2016; v1 submitted 8 June, 2016;
originally announced June 2016.