-
Crafting Parts for Expressive Object Composition
Authors:
Harsh Rangwani,
Aishwarya Agarwal,
Kuldeep Kulkarni,
R. Venkatesh Babu,
Srikrishna Karanam
Abstract:
Text-to-image generation from large generative models like Stable Diffusion, DALLE-2, etc., have become a common base for various tasks due to their superior quality and extensive knowledge bases. As image composition and generation are creative processes the artists need control over various parts of the images being generated. We find that just adding details about parts in the base text prompt…
▽ More
Text-to-image generation from large generative models like Stable Diffusion, DALLE-2, etc., have become a common base for various tasks due to their superior quality and extensive knowledge bases. As image composition and generation are creative processes the artists need control over various parts of the images being generated. We find that just adding details about parts in the base text prompt either leads to an entirely different image (e.g., missing/incorrect identity) or the extra part details simply being ignored. To mitigate these issues, we introduce PartCraft, which enables image generation based on fine-grained part-level details specified for objects in the base text prompt. This allows more control for artists and enables novel object compositions by combining distinctive object parts. PartCraft first localizes object parts by denoising the object region from a specific diffusion process. This enables each part token to be localized to the right object region. After obtaining part masks, we run a localized diffusion process in each of the part regions based on fine-grained part descriptions and combine them to produce the final image. All the stages of PartCraft are based on repurposing a pre-trained diffusion model, which enables it to generalize across various domains without training. We demonstrate the effectiveness of part-level control provided by PartCraft qualitatively through visual examples and quantitatively in comparison to the contemporary baselines.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Adaptive Incentive Design with Learning Agents
Authors:
Chinmay Maheshwari,
Kshitij Kulkarni,
Manxi Wu,
Shankar Sastry
Abstract:
How can the system operator learn an incentive mechanism that achieves social optimality based on limited information about the agents' behavior, who are dynamically updating their strategies? To answer this question, we propose an \emph{adaptive} incentive mechanism. This mechanism updates the incentives of agents based on the feedback of each agent's externality, evaluated as the difference betw…
▽ More
How can the system operator learn an incentive mechanism that achieves social optimality based on limited information about the agents' behavior, who are dynamically updating their strategies? To answer this question, we propose an \emph{adaptive} incentive mechanism. This mechanism updates the incentives of agents based on the feedback of each agent's externality, evaluated as the difference between the player's marginal cost and society's marginal cost at each time step. The proposed mechanism updates the incentives on a slower timescale compared to the agents' learning dynamics, resulting in a two-timescale coupled dynamical system. Notably, this mechanism is agnostic to the specific learning dynamics used by agents to update their strategies. We show that any fixed point of this adaptive incentive mechanism corresponds to the optimal incentive mechanism, ensuring that the Nash equilibrium coincides with the socially optimal strategy. Additionally, we provide sufficient conditions that guarantee the convergence of the adaptive incentive mechanism to a fixed point. Our results apply to both atomic and non-atomic games. To demonstrate the effectiveness of our proposed mechanism, we verify the convergence conditions in two practically relevant games: atomic networked quadratic aggregative games and non-atomic network routing games.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
Capabilities of Gemini Models in Medicine
Authors:
Khaled Saab,
Tao Tu,
Wei-Hung Weng,
Ryutaro Tanno,
David Stutz,
Ellery Wulczyn,
Fan Zhang,
Tim Strother,
Chunjong Park,
Elahe Vedadi,
Juanma Zambrano Chaves,
Szu-Yeu Hu,
Mike Schaekermann,
Aishwarya Kamath,
Yong Cheng,
David G. T. Barrett,
Cathy Cheung,
Basil Mustafa,
Anil Palepu,
Daniel McDuff,
Le Hou,
Tomer Golany,
Luyang Liu,
Jean-baptiste Alayrac,
Neil Houlsby
, et al. (42 additional authors not shown)
Abstract:
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…
▽ More
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
△ Less
Submitted 1 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Understanding the Impact of Coalitions between EV Charging Stations
Authors:
Sukanya Kudva,
Kshitij Kulkarni,
Chinmay Maheshwari,
Anil Aswani,
Shankar Sastry
Abstract:
The rapid growth of electric vehicles (EVs) is driving the expansion of charging infrastructure globally. This expansion, however, places significant charging demand on the electricity grid, impacting grid operations and electricity pricing. While coordination among all charging stations is beneficial, it may not be always feasible. However, a subset of charging stations, which could be jointly op…
▽ More
The rapid growth of electric vehicles (EVs) is driving the expansion of charging infrastructure globally. This expansion, however, places significant charging demand on the electricity grid, impacting grid operations and electricity pricing. While coordination among all charging stations is beneficial, it may not be always feasible. However, a subset of charging stations, which could be jointly operated by a company, could coordinate to decide their charging profile. In this paper we investigate whether such coalitions between charging stations is better than no coordination.
We model EV charging as a non-cooperative aggregative game, where each station's cost is determined by both monetary payments tied to reactive electricity prices on the grid and its sensitivity to deviations from a nominal charging profile. We consider a solution concept that we call $\mathcal{C}$-Nash equilibrium, which is tied to a coalition $\mathcal{C}$ of charging stations coordinating to reduce their cumulative costs. We provide sufficient conditions, in terms of the demand and sensitivity of charging stations, to determine when independent (uncoordinated) operation of charging stations could result in lower overall costs to charging stations, the coalition, and charging stations outside the coalition. Somewhat counter to intuition, we demonstrate scenarios where allowing charging stations to operate independently is better than coordinating as a coalition. Jointly, these results provide operators of charging stations insights into how to coordinate their charging behavior, and open several research directions.
△ Less
Submitted 5 April, 2024;
originally announced April 2024.
-
An Analysis of Intent-Based Markets
Authors:
Tarun Chitra,
Kshitij Kulkarni,
Mallesh Pai,
Theo Diamandis
Abstract:
Mechanisms for decentralized finance on blockchains suffer from various problems, including suboptimal price execution for users, latency, and a worse user experience compared to their centralized counterparts. Recently, off-chain marketplaces, colloquially called `intent markets,' have been proposed as a solution to these problems. In these markets, agents called \emph{solvers} compete to satisfy…
▽ More
Mechanisms for decentralized finance on blockchains suffer from various problems, including suboptimal price execution for users, latency, and a worse user experience compared to their centralized counterparts. Recently, off-chain marketplaces, colloquially called `intent markets,' have been proposed as a solution to these problems. In these markets, agents called \emph{solvers} compete to satisfy user orders, which may include complicated user-specified conditions. We provide two formal models of solvers' strategic behavior: one probabilistic and another deterministic. In our first model, solvers initially pay upfront costs to enter a Dutch auction to fill the user's order and then exert congestive, costly effort to search for prices for the user. Our results show that the costs incurred by solvers result in restricted entry in the market. Further, in the presence of costly effort and congestion, our results counter-intuitively show that a planner who aims to maximize user welfare may actually prefer to restrict entry, resulting in limited oligopoly. We then introduce an alternative, optimization-based deterministic model which corroborates these results. We conclude with extensions of our model to other auctions within blockchains and non-cryptocurrency applications, such as the US SEC's Proposal 615.
△ Less
Submitted 6 March, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Congestion Pricing for Efficiency and Equity: Theory and Applications to the San Francisco Bay Area
Authors:
Chinmay Maheshwari,
Kshitij Kulkarni,
Druv Pai,
Jiarui Yang,
Manxi Wu,
Shankar Sastry
Abstract:
Congestion pricing, while adopted by many cities to alleviate traffic congestion, raises concerns about widening socioeconomic disparities due to its disproportionate impact on low-income travelers. In this study, we address this concern by proposing a new class of congestion pricing schemes that not only minimize congestion levels but also incorporate an equity objective to reduce cost disparitie…
▽ More
Congestion pricing, while adopted by many cities to alleviate traffic congestion, raises concerns about widening socioeconomic disparities due to its disproportionate impact on low-income travelers. In this study, we address this concern by proposing a new class of congestion pricing schemes that not only minimize congestion levels but also incorporate an equity objective to reduce cost disparities among travelers with different willingness-to-pay. Our analysis builds on a congestion game model with heterogeneous traveler populations. We present four pricing schemes that account for practical considerations, such as the ability to charge differentiated tolls to various traveler populations and the option to toll all or only a subset of edges in the network. We evaluate our pricing schemes in the calibrated freeway network of the San Francisco Bay Area. We demonstrate that the proposed congestion pricing schemes improve both efficiency (in terms of reduced average travel time) and equity (the disparities of travel costs experienced by different populations) compared to the current pricing scheme. Moreover, our pricing schemes also generate a total revenue comparable to the current pricing scheme. Our results further show that pricing schemes charging differentiated prices to traveler populations with varying willingness-to-pay lead to a more equitable distribution of travel costs compared to those that charge a homogeneous price to all.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
Towards Conversational Diagnostic AI
Authors:
Tao Tu,
Anil Palepu,
Mike Schaekermann,
Khaled Saab,
Jan Freyberg,
Ryutaro Tanno,
Amy Wang,
Brenna Li,
Mohamed Amin,
Nenad Tomasev,
Shekoofeh Azizi,
Karan Singhal,
Yong Cheng,
Le Hou,
Albert Webson,
Kavita Kulkarni,
S Sara Mahdavi,
Christopher Semturs,
Juraj Gottweis,
Joelle Barral,
Katherine Chou,
Greg S Corrado,
Yossi Matias,
Alan Karthikesalingam,
Vivek Natarajan
Abstract:
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introdu…
▽ More
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue.
AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
△ Less
Submitted 10 January, 2024;
originally announced January 2024.
-
Towards Accurate Differential Diagnosis with Large Language Models
Authors:
Daniel McDuff,
Mike Schaekermann,
Tao Tu,
Anil Palepu,
Amy Wang,
Jake Garrison,
Karan Singhal,
Yash Sharma,
Shekoofeh Azizi,
Kavita Kulkarni,
Le Hou,
Yong Cheng,
Yun Liu,
S Sara Mahdavi,
Sushant Prakash,
Anupam Pathak,
Christopher Semturs,
Shwetak Patel,
Dale R Webster,
Ewa Dominowska,
Juraj Gottweis,
Joelle Barral,
Katherine Chou,
Greg S Corrado,
Yossi Matias
, et al. (3 additional authors not shown)
Abstract:
An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM op…
▽ More
An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.
△ Less
Submitted 30 November, 2023;
originally announced December 2023.
-
The Specter (and Spectra) of Miner Extractable Value
Authors:
Guillermo Angeris,
Tarun Chitra,
Theo Diamandis,
Kshitij Kulkarni
Abstract:
Miner extractable value (MEV) refers to any excess value that a transaction validator can realize by manipulating the ordering of transactions. In this work, we introduce a simple theoretical definition of the 'cost of MEV', prove some basic properties, and show that the definition is useful via a number of examples. In a variety of settings, this definition is related to the 'smoothness' of a fun…
▽ More
Miner extractable value (MEV) refers to any excess value that a transaction validator can realize by manipulating the ordering of transactions. In this work, we introduce a simple theoretical definition of the 'cost of MEV', prove some basic properties, and show that the definition is useful via a number of examples. In a variety of settings, this definition is related to the 'smoothness' of a function over the symmetric group. From this definition and some basic observations, we recover a number of results from the literature.
△ Less
Submitted 12 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Attacks on Dynamic DeFi Interest Rate Curves
Authors:
Tarun Chitra,
Peteris Erins,
Kshitij Kulkarni
Abstract:
As decentralized money market protocols continue to grow in value locked, there have been a number of optimizations proposed for improving capital efficiency. One set of proposals from Euler Finance and Mars Protocol is to have an interest rate curve that is a proportional-integral-derivative (PID) controller. In this paper, we demonstrate attacks on proportional and proportional-integral controll…
▽ More
As decentralized money market protocols continue to grow in value locked, there have been a number of optimizations proposed for improving capital efficiency. One set of proposals from Euler Finance and Mars Protocol is to have an interest rate curve that is a proportional-integral-derivative (PID) controller. In this paper, we demonstrate attacks on proportional and proportional-integral controlled interest rate curves. The attack allows one to manipulate the interest rate curve to take a higher proportion of the earned yield than their pro-rata share of the lending pool. We conclude with an argument that PID interest rate curves can actually \emph{reduce} capital efficiency (due to attack mitigations) unless supply and demand elasticity to rate changes are sufficiently high.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
Blowing in the Wind: CycleNet for Human Cinemagraphs from Still Images
Authors:
Hugo Bertiche,
Niloy J. Mitra,
Kuldeep Kulkarni,
Chun-Hao Paul Huang,
Tuanfeng Y. Wang,
Meysam Madadi,
Sergio Escalera,
Duygu Ceylan
Abstract:
Cinemagraphs are short loo** videos created by adding subtle motions to a static image. This kind of media is popular and engaging. However, automatic generation of cinemagraphs is an underexplored area and current solutions require tedious low-level manual authoring by artists. In this paper, we present an automatic method that allows generating human cinemagraphs from single RGB images. We inv…
▽ More
Cinemagraphs are short loo** videos created by adding subtle motions to a static image. This kind of media is popular and engaging. However, automatic generation of cinemagraphs is an underexplored area and current solutions require tedious low-level manual authoring by artists. In this paper, we present an automatic method that allows generating human cinemagraphs from single RGB images. We investigate the problem in the context of dressed humans under the wind. At the core of our method is a novel cyclic neural network that produces loo** cinemagraphs for the target loop duration. To circumvent the problem of collecting real data, we demonstrate that it is possible, by working in the image normal space, to learn garment motion dynamics on synthetic data and generalize to real data. We evaluate our method on both synthetic and real data and demonstrate that it is possible to create compelling and plausible cinemagraphs from single RGB images.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
Authors:
Shikha Baghel,
Shreyas Ramoji,
Sidharth,
Ranjana H,
Prachi Singh,
Somil Jain,
Pratik Roy Chowdhuri,
Kaustubh Kulkarni,
Swapnil Padhi,
Deepu Vijayasenan,
Sriram Ganapathy
Abstract:
In multilingual societies, social conversations often involve code-mixed speech. The current speech technology may not be well equipped to extract information from multi-lingual multi-speaker conversations. The DISPLACE challenge entails a first-of-kind task to benchmark speaker and language diarization on the same data, as the data contains multi-speaker conversations in multilingual code-mixed s…
▽ More
In multilingual societies, social conversations often involve code-mixed speech. The current speech technology may not be well equipped to extract information from multi-lingual multi-speaker conversations. The DISPLACE challenge entails a first-of-kind task to benchmark speaker and language diarization on the same data, as the data contains multi-speaker conversations in multilingual code-mixed speech. The challenge attempts to highlight outstanding issues in speaker diarization (SD) in multilingual settings with code-mixing. Further, language diarization (LD) in multi-speaker settings also introduces new challenges, where the system has to disambiguate speaker switches with code switches. For this challenge, a natural multilingual, multi-speaker conversational dataset is distributed for development and evaluation purposes. The systems are evaluated on single-channel far-field recordings. We also release a baseline system and report the highlights of the system submissions.
△ Less
Submitted 5 June, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Table Tennis Stroke Detection and Recognition Using Ball Trajectory Data
Authors:
Kaustubh Milind Kulkarni,
Rohan S Jamadagni,
Jeffrey Aaron Paul,
Sucheth Shenoy
Abstract:
In this work, the novel task of detecting and classifying table tennis strokes solely using the ball trajectory has been explored. A single camera setup positioned in the umpire's view has been employed to procure a dataset consisting of six stroke classes executed by four professional table tennis players. Ball tracking using YOLOv4, a traditional object detection model, and TrackNetv2, a tempora…
▽ More
In this work, the novel task of detecting and classifying table tennis strokes solely using the ball trajectory has been explored. A single camera setup positioned in the umpire's view has been employed to procure a dataset consisting of six stroke classes executed by four professional table tennis players. Ball tracking using YOLOv4, a traditional object detection model, and TrackNetv2, a temporal heatmap based model, have been implemented on our dataset and their performances have been benchmarked. A mathematical approach developed to extract temporal boundaries of strokes using the ball trajectory data yielded a total of 2023 valid strokes in our dataset, while also detecting services and missed strokes successfully. The temporal convolutional network developed performed stroke recognition on completely unseen data with an accuracy of 87.155%. Several machine learning and deep learning based model architectures have been trained for stroke recognition using ball trajectory input and benchmarked based on their performances. While stroke recognition in the field of table tennis has been extensively explored based on human action recognition using video data focused on the player's actions, the use of ball trajectory data for the same is an unexplored characteristic of the sport. Hence, the motivation behind the work is to demonstrate that meaningful inferences such as stroke detection and recognition can be drawn using minimal input information.
△ Less
Submitted 19 February, 2023;
originally announced February 2023.
-
Self-supervised Multi-view Disentanglement for Expansion of Visual Collections
Authors:
Nihal Jain,
Praneetha Vaddamanu,
Paridhi Maheshwari,
Vishwa Vinay,
Kuldeep Kulkarni
Abstract:
Image search engines enable the retrieval of images relevant to a query image. In this work, we consider the setting where a query for similar images is derived from a collection of images. For visual search, the similarity measurements may be made along multiple axes, or views, such as style and color. We assume access to a set of feature extractors, each of which computes representations for a s…
▽ More
Image search engines enable the retrieval of images relevant to a query image. In this work, we consider the setting where a query for similar images is derived from a collection of images. For visual search, the similarity measurements may be made along multiple axes, or views, such as style and color. We assume access to a set of feature extractors, each of which computes representations for a specific view. Our objective is to design a retrieval algorithm that effectively combines similarities computed over representations from multiple views. To this end, we propose a self-supervised learning method for extracting disentangled view-specific representations for images such that the inter-view overlap is minimized. We show how this allows us to compute the intent of a collection as a distribution over views. We show how effective retrieval can be performed by prioritizing candidate expansion images that match the intent of a query collection. Finally, we present a new querying mechanism for image search enabled by composing multiple collections and perform retrieval under this setting using the techniques presented in this paper.
△ Less
Submitted 4 February, 2023;
originally announced February 2023.
-
Credible, Optimal Auctions via Blockchains
Authors:
Tarun Chitra,
Matheus V. X. Ferreira,
Kshitij Kulkarni
Abstract:
Akbarpour and Li (2020) formalized credibility as an auction desideratum where the auctioneer cannot benefit by implementing undetectable deviations from the promised auction and showed that, in the plain model, the ascending price auction with reserves is the only credible, strategyproof, revenue-optimal auction. Ferreira and Weinberg (2020) proposed the Deferred Revelation Auction (DRA) as a com…
▽ More
Akbarpour and Li (2020) formalized credibility as an auction desideratum where the auctioneer cannot benefit by implementing undetectable deviations from the promised auction and showed that, in the plain model, the ascending price auction with reserves is the only credible, strategyproof, revenue-optimal auction. Ferreira and Weinberg (2020) proposed the Deferred Revelation Auction (DRA) as a communication efficient auction that avoids the uniqueness results from Akbarpour and Li (2020) assuming the existence of cryptographic commitments and as long as bidder valuations are MHR. They also showed DRA is not credible in settings where bidder valuations are $α$-strongly regular unless $α> 1$. In this paper, we ask if blockchains allow us to design a larger class of credible auctions. We answer this question positively, by showing that DRA is credible even for $α$-strongly regular distributions for all $α> 0$ if implemented over a secure and censorship-resistant blockchain. We argue ledgers provide two properties that limit deviations from a self-interested auctioneer. First, the existence of smart contracts allows one to extend the concept of credibility to settings where the auctioneer does not have a reputation -- one of the main limitations for the definition of credibility from Akbarpour and Li (2020). Second, blockchains allow us to implement mechanisms over a public broadcast channel, removing the adaptive undetectable deviations driving the negative results of Ferreira and Weinberg (2020).
△ Less
Submitted 29 January, 2023;
originally announced January 2023.
-
Towards Dynamic Causal Discovery with Rare Events: A Nonparametric Conditional Independence Test
Authors:
Chih-Yuan Chiu,
Kshitij Kulkarni,
Shankar Sastry
Abstract:
Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links, between random variables in a dynamic setting, that manifest only when the variables first experience low-probabilit…
▽ More
Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links, between random variables in a dynamic setting, that manifest only when the variables first experience low-probability realizations. To address this issue, we introduce a novel statistical independence test on data collected from time-invariant dynamical systems in which rare but consequential events occur. In particular, we exploit the time-invariance of the underlying data to construct a superimposed dataset of the system state before rare events happen at different timesteps. We then design a conditional independence test on the reorganized data. We provide non-asymptotic sample complexity bounds for the consistency of our method, and validate its performance across various simulated and real-world datasets, including incident data collected from the Caltrans Performance Measurement System (PeMS). Code containing the datasets and experiments is publicly available.
△ Less
Submitted 17 July, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
EraseNet: A Recurrent Residual Network for Supervised Document Cleaning
Authors:
Yashowardhan Shinde,
Kishore Kulkarni,
Sachin Kuberkar
Abstract:
Document denoising is considered one of the most challenging tasks in computer vision. There exist millions of documents that are still to be digitized, but problems like document degradation due to natural and man-made factors make this task very difficult. This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture. This pape…
▽ More
Document denoising is considered one of the most challenging tasks in computer vision. There exist millions of documents that are still to be digitized, but problems like document degradation due to natural and man-made factors make this task very difficult. This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture. This paper focuses on restoring documents with discrepancies like deformities caused due to aging of a document, creases left on the pages that were xeroxed, random black patches, lightly visible text, etc., and also improving the quality of the image for better optical character recognition system (OCR) performance. Removing noise from scanned documents is a very important step before the documents as this noise can severely affect the performance of an OCR system. The experiments in this paper have shown promising results as the model is able to learn a variety of ordinary as well as unusual noises and rectify them efficiently.
△ Less
Submitted 4 July, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Towards a Theory of Maximal Extractable Value I: Constant Function Market Makers
Authors:
Kshitij Kulkarni,
Theo Diamandis,
Tarun Chitra
Abstract:
Maximal Extractable Value (MEV) refers to excess value captured by miners (or validators) from users in a cryptocurrency network. This excess value often comes from reordering users' transactions to maximize fees or from inserting new transactions that front-run users' transactions. One of the most common types of MEV involves a `sandwich attack' against a user trading on a constant function marke…
▽ More
Maximal Extractable Value (MEV) refers to excess value captured by miners (or validators) from users in a cryptocurrency network. This excess value often comes from reordering users' transactions to maximize fees or from inserting new transactions that front-run users' transactions. One of the most common types of MEV involves a `sandwich attack' against a user trading on a constant function market maker (CFMM), which is a popular class of automated market maker. We analyze game theoretic properties of MEV in CFMMs that we call \textit{routing} and \textit{reordering} MEV. In the case of routing, we present examples where the existence of MEV both degrades and, counterintuitively, \emph{improves} the quality of routing. We construct an analogue of the price of anarchy for this setting and demonstrate that if the impact of a sandwich attack is localized in a suitable sense, then the price of anarchy is constant. In the case of reordering, we show conditions when the maximum price impact caused by the reordering of sandwich attacks in a sequence of trades, relative to the average price, impact is $O(\log n)$ in the number of user trades. Combined, our results suggest methods that both MEV searchers and CFMM designers can utilize for estimating costs and profits of MEV.
△ Less
Submitted 30 April, 2023; v1 submitted 24 July, 2022;
originally announced July 2022.
-
GEMS: Scene Expansion using Generative Models of Graphs
Authors:
Rishi Agarwal,
Tirupati Saketh Chandra,
Vaidehi Patil,
Aniruddha Mahapatra,
Kuldeep Kulkarni,
Vishwa Vinay
Abstract:
Applications based on image retrieval require editing and associating in intermediate spaces that are representative of the high-level concepts like objects and their relationships rather than dense, pixel-level representations like RGB images or semantic-label maps. We focus on one such representation, scene graphs, and propose a novel scene expansion task where we enrich an input seed graph by a…
▽ More
Applications based on image retrieval require editing and associating in intermediate spaces that are representative of the high-level concepts like objects and their relationships rather than dense, pixel-level representations like RGB images or semantic-label maps. We focus on one such representation, scene graphs, and propose a novel scene expansion task where we enrich an input seed graph by adding new nodes (objects) and the corresponding relationships. To this end, we formulate scene graph expansion as a sequential prediction task involving multiple steps of first predicting a new node and then predicting the set of relationships between the newly predicted node and previous nodes in the graph. We propose a sequencing strategy for observed graphs that retains the clustering patterns amongst nodes. In addition, we leverage external knowledge to train our graph generation model, enabling greater generalization of node predictions. Due to the inefficiency of existing maximum mean discrepancy (MMD) based metrics for graph generation problems in evaluating predicted relationships between nodes (objects), we design novel metrics that comprehensively evaluate different aspects of predicted relations. We conduct extensive experiments on Visual Genome and VRD datasets to evaluate the expanded scene graphs using the standard MMD-based metrics and our proposed metrics. We observe that the graphs generated by our method, GEMS, better represent the real distribution of the scene graphs than the baseline methods like GraphRNN.
△ Less
Submitted 8 July, 2022;
originally announced July 2022.
-
Inducing Social Optimality in Games via Adaptive Incentive Design
Authors:
Chinmay Maheshwari,
Kshitij Kulkarni,
Manxi Wu,
Shankar Sastry
Abstract:
How can a social planner adaptively incentivize selfish agents who are learning in a strategic environment to induce a socially optimal outcome in the long run? We propose a two-timescale learning dynamics to answer this question in both atomic and non-atomic games. In our learning dynamics, players adopt a class of learning rules to update their strategies at a faster timescale, while a social pl…
▽ More
How can a social planner adaptively incentivize selfish agents who are learning in a strategic environment to induce a socially optimal outcome in the long run? We propose a two-timescale learning dynamics to answer this question in both atomic and non-atomic games. In our learning dynamics, players adopt a class of learning rules to update their strategies at a faster timescale, while a social planner updates the incentive mechanism at a slower timescale. In particular, the update of the incentive mechanism is based on each player's externality, which is evaluated as the difference between the player's marginal cost and the society's marginal cost in each time step. We show that any fixed point of our learning dynamics corresponds to the optimal incentive mechanism such that the corresponding Nash equilibrium also achieves social optimality. We also provide sufficient conditions for the learning dynamics to converge to a fixed point so that the adaptive incentive mechanism eventually induces a socially optimal outcome. Finally, we demonstrate that the sufficient conditions for convergence are satisfied in a variety of games, including (i) atomic networked quadratic aggregative games, (ii) atomic Cournot competition, and (iii) non-atomic network routing games.
△ Less
Submitted 11 April, 2022;
originally announced April 2022.
-
Controllable Animation of Fluid Elements in Still Images
Authors:
Aniruddha Mahapatra,
Kuldeep Kulkarni
Abstract:
We propose a method to interactively control the animation of fluid elements in still images to generate cinemagraphs. Specifically, we focus on the animation of fluid elements like water, smoke, fire, which have the properties of repeating textures and continuous fluid motion. Taking inspiration from prior works, we represent the motion of such fluid elements in the image in the form of a constan…
▽ More
We propose a method to interactively control the animation of fluid elements in still images to generate cinemagraphs. Specifically, we focus on the animation of fluid elements like water, smoke, fire, which have the properties of repeating textures and continuous fluid motion. Taking inspiration from prior works, we represent the motion of such fluid elements in the image in the form of a constant 2D optical flow map. To this end, we allow the user to provide any number of arrow directions and their associated speeds along with a mask of the regions the user wants to animate. The user-provided input arrow directions, their corresponding speed values, and the mask are then converted into a dense flow map representing a constant optical flow map (FD). We observe that FD, obtained using simple exponential operations can closely approximate the plausible motion of elements in the image. We further refine computed dense optical flow map FD using a generative-adversarial network (GAN) to obtain a more realistic flow map. We devise a novel UNet based architecture to autoregressively generate future frames using the refined optical flow map by forward-war** the input image features at different resolutions. We conduct extensive experiments on a publicly available dataset and show that our method is superior to the baselines in terms of qualitative and quantitative metrics. In addition, we show the qualitative animations of the objects in directions that did not exist in the training set and provide a way to synthesize videos that otherwise would not exist in the real world.
△ Less
Submitted 25 September, 2023; v1 submitted 6 December, 2021;
originally announced December 2021.
-
Dynamic Tolling for Inducing Socially Optimal Traffic Loads
Authors:
Chinmay Maheshwari,
Kshitij Kulkarni,
Manxi Wu,
Shankar Sastry
Abstract:
How to design tolls that induce socially optimal traffic loads with dynamically arriving travelers who make selfish routing decisions? We propose a two-timescale discrete-time stochastic dynamics that adaptively adjusts the toll prices on a parallel link network while accounting for the updates of traffic loads induced by the incoming and outgoing travelers and their route choices. The updates of…
▽ More
How to design tolls that induce socially optimal traffic loads with dynamically arriving travelers who make selfish routing decisions? We propose a two-timescale discrete-time stochastic dynamics that adaptively adjusts the toll prices on a parallel link network while accounting for the updates of traffic loads induced by the incoming and outgoing travelers and their route choices. The updates of loads and tolls in our dynamics have three key features: (i) The total demand of incoming and outgoing travelers is stochastically realized; (ii) Travelers are myopic and selfish in that they choose routes according to a perturbed best response given the current latency and tolls on parallel links; (iii) The update of tolls is at a slower timescale as compared to the the update of loads. We show that the loads and the tolls eventually concentrate in a neighborhood of the fixed point, which corresponds to the socially optimal load and toll price. Moreover, the fixed point load is also a stochastic user equilibrium with respect to the toll price. Our results are useful for traffic authorities to efficiently manage traffic loads in response to the arrival and departure of travelers.
△ Less
Submitted 17 October, 2021;
originally announced October 2021.
-
SemIE: Semantically-aware Image Extrapolation
Authors:
Bholeshwar Khurana,
Soumya Ranjan Dash,
Abhishek Bhatia,
Aniruddha Mahapatra,
Hrituraj Singh,
Kuldeep Kulkarni
Abstract:
We propose a semantically-aware novel paradigm to perform image extrapolation that enables the addition of new object instances. All previous methods are limited in their capability of extrapolation to merely extending the already existing objects in the image. However, our proposed approach focuses not only on (i) extending the already present objects but also on (ii) adding new objects in the ex…
▽ More
We propose a semantically-aware novel paradigm to perform image extrapolation that enables the addition of new object instances. All previous methods are limited in their capability of extrapolation to merely extending the already existing objects in the image. However, our proposed approach focuses not only on (i) extending the already present objects but also on (ii) adding new objects in the extended region based on the context. To this end, for a given image, we first obtain an object segmentation map using a state-of-the-art semantic segmentation method. The, thus, obtained segmentation map is fed into a network to compute the extrapolated semantic segmentation and the corresponding panoptic segmentation maps. The input image and the obtained segmentation maps are further utilized to generate the final extrapolated image. We conduct experiments on Cityscapes and ADE20K-bedroom datasets and show that our method outperforms all baselines in terms of FID, and similarity in object co-occurrence statistics.
△ Less
Submitted 31 August, 2021;
originally announced August 2021.
-
Table Tennis Stroke Recognition Using Two-Dimensional Human Pose Estimation
Authors:
Kaustubh Milind Kulkarni,
Sucheth Shenoy
Abstract:
We introduce a novel method for collecting table tennis video data and perform stroke detection and classification. A diverse dataset containing video data of 11 basic strokes obtained from 14 professional table tennis players, summing up to a total of 22111 videos has been collected using the proposed setup. The temporal convolutional neural network model developed using 2D pose estimation perfor…
▽ More
We introduce a novel method for collecting table tennis video data and perform stroke detection and classification. A diverse dataset containing video data of 11 basic strokes obtained from 14 professional table tennis players, summing up to a total of 22111 videos has been collected using the proposed setup. The temporal convolutional neural network model developed using 2D pose estimation performs multiclass classification of these 11 table tennis strokes with a validation accuracy of 99.37%. Moreover, the neural network generalizes well over the data of a player excluded from the training and validation dataset, classifying the fresh strokes with an overall best accuracy of 98.72%. Various model architectures using machine learning and deep learning based approaches have been trained for stroke recognition and their performances have been compared and benchmarked. Inferences such as performance monitoring and stroke comparison of the players using the model have been discussed. Therefore, we are contributing to the development of a computer vision based sports analytics system for the sport of table tennis that focuses on the previously unexploited aspect of the sport i.e., a player's strokes, which is extremely insightful for performance improvement.
△ Less
Submitted 31 May, 2021; v1 submitted 20 April, 2021;
originally announced April 2021.
-
Social Choice with Changing Preferences: Representation Theorems and Long-Run Policies
Authors:
Kshitij Kulkarni,
Sven Neth
Abstract:
We study group decision making with changing preferences as a Markov Decision Process. We are motivated by the increasing prevalence of automated decision-making systems when making choices for groups of people over time. Our main contribution is to show how classic representation theorems from social choice theory can be adapted to characterize optimal policies in this dynamic setting. We provide…
▽ More
We study group decision making with changing preferences as a Markov Decision Process. We are motivated by the increasing prevalence of automated decision-making systems when making choices for groups of people over time. Our main contribution is to show how classic representation theorems from social choice theory can be adapted to characterize optimal policies in this dynamic setting. We provide an axiomatic characterization of MDP reward functions that agree with the Utilitarianism social welfare functionals of social choice theory. We also provide discussion of cases when the implementation of social choice-theoretic axioms may fail to lead to long-run optimal outcomes.
△ Less
Submitted 4 November, 2020;
originally announced November 2020.
-
Hyper-local sustainable assortment planning
Authors:
Nupur Aggarwal,
Abhishek Bansal,
Kushagra Manglik,
Kedar Kulkarni,
Vikas Raykar
Abstract:
Assortment planning, an important seasonal activity for any retailer, involves choosing the right subset of products to stock in each store.While existing approaches only maximize the expected revenue, we propose including the environmental impact too, through the Higg Material Sustainability Index. The trade-off between revenue and environmental impact is balanced through a multi-objective optimi…
▽ More
Assortment planning, an important seasonal activity for any retailer, involves choosing the right subset of products to stock in each store.While existing approaches only maximize the expected revenue, we propose including the environmental impact too, through the Higg Material Sustainability Index. The trade-off between revenue and environmental impact is balanced through a multi-objective optimization approach, that yields a Pareto-front of optimal assortments for merchandisers to choose from. Using the proposed approach on a few product categories of a leading fashion retailer shows that choosing assortments with lower environmental impact with a minimal impact on revenue is possible.
△ Less
Submitted 27 July, 2020;
originally announced July 2020.
-
Halluci-Net: Scene Completion by Exploiting Object Co-occurrence Relationships
Authors:
Kuldeep Kulkarni,
Tejas Gokhale,
Rajhans Singh,
Pavan Turaga,
Aswin Sankaranarayanan
Abstract:
Recently, there has been substantial progress in image synthesis from semantic labelmaps. However, methods used for this task assume the availability of complete and unambiguous labelmaps, with instance boundaries of objects, and class labels for each pixel. This reliance on heavily annotated inputs restricts the application of image synthesis techniques to real-world applications, especially unde…
▽ More
Recently, there has been substantial progress in image synthesis from semantic labelmaps. However, methods used for this task assume the availability of complete and unambiguous labelmaps, with instance boundaries of objects, and class labels for each pixel. This reliance on heavily annotated inputs restricts the application of image synthesis techniques to real-world applications, especially under uncertainty due to weather, occlusion, or noise. On the other hand, algorithms that can synthesize images from sparse labelmaps or sketches are highly desirable as tools that can guide content creators and artists to quickly generate scenes by simply specifying locations of a few objects. In this paper, we address the problem of complex scene completion from sparse labelmaps. Under this setting, very few details about the scene (30\% of object instances) are available as input for image synthesis. We propose a two-stage deep network based method, called `Halluci-Net', that learns co-occurence relationships between objects in scenes, and then exploits these relationships to produce a dense and complete labelmap. The generated dense labelmap can then be used as input by state-of-the-art image synthesis techniques like pix2pixHD to obtain the final image. The proposed method is evaluated on the Cityscapes dataset and it outperforms two baselines methods on performance metrics like Fréchet Inception Distance (FID), semantic segmentation accuracy, and similarity in object co-occurrences. We also show qualitative results on a subset of ADE20K dataset that contains bedroom images.
△ Less
Submitted 20 May, 2021; v1 submitted 18 April, 2020;
originally announced April 2020.
-
A study of vectorization for matrix-free finite element methods
Authors:
Tianjiao Sun,
Lawrence Mitchell,
Kaushik Kulkarni,
Andreas Klöckner,
David A. Ham,
Paul H. J. Kelly
Abstract:
Vectorization is increasingly important to achieve high performance on modern hardware with SIMD instructions. Assembly of matrices and vectors in the finite element method, which is characterized by iterating a local assembly kernel over unstructured meshes, poses difficulties to effective vectorization. Maintaining a user-friendly high-level interface with a suitable degree of abstraction while…
▽ More
Vectorization is increasingly important to achieve high performance on modern hardware with SIMD instructions. Assembly of matrices and vectors in the finite element method, which is characterized by iterating a local assembly kernel over unstructured meshes, poses difficulties to effective vectorization. Maintaining a user-friendly high-level interface with a suitable degree of abstraction while generating efficient, vectorized code for the finite element method is a challenge for numerical software systems and libraries. In this work, we study cross-element vectorization in the finite element framework Firedrake via code transformation and demonstrate the efficacy of such an approach by evaluating a wide range of matrix-free operators spanning different polynomial degrees and discretizations on two recent CPUs using three mainstream compilers. Our experiments show that our approaches for cross-element vectorization achieve 30\% of theoretical peak performance for many examples of practical significance, and exceed 50\% for cases with high arithmetic intensities, with consistent speed-up over (intra-element) vectorization restricted to the local assembly kernels.
△ Less
Submitted 19 May, 2020; v1 submitted 19 March, 2019;
originally announced March 2019.
-
Rate-Adaptive Neural Networks for Spatial Multiplexers
Authors:
Suhas Lohit,
Rajhans Singh,
Kuldeep Kulkarni,
Pavan Turaga
Abstract:
In resource-constrained environments, one can employ spatial multiplexing cameras to acquire a small number of measurements of a scene, and perform effective reconstruction or high-level inference using purely data-driven neural networks. However, once trained, the measurement matrix and the network are valid only for a single measurement rate (MR) chosen at training time. To overcome this drawbac…
▽ More
In resource-constrained environments, one can employ spatial multiplexing cameras to acquire a small number of measurements of a scene, and perform effective reconstruction or high-level inference using purely data-driven neural networks. However, once trained, the measurement matrix and the network are valid only for a single measurement rate (MR) chosen at training time. To overcome this drawback, we answer the following question: How can we jointly design the measurement operator and the reconstruction/inference network so that the system can operate over a \textit{range} of MRs? To this end, we present a novel training algorithm, for learning \textbf{\textit{rate-adaptive}} networks. Using standard datasets, we demonstrate that, when tested over a range of MRs, a rate-adaptive network can provide high quality reconstruction over a the entire range, resulting in up to about 15 dB improvement over previous methods, where the network is valid for only one MR. We demonstrate the effectiveness of our approach for sample-efficient object tracking where video frames are acquired at dynamically varying MRs. We also extend this algorithm to learn the measurement operator in conjunction with image recognition networks. Experiments on MNIST and CIFAR-10 confirm the applicability of our algorithm to different tasks.
△ Less
Submitted 8 September, 2018;
originally announced September 2018.
-
Open Source Android Vulnerability Detection Tools: A Survey
Authors:
Keyur Kulkarni,
Ahmad Y Javaid
Abstract:
Since last decade, smartphones have become an integral part of everyone's life. Having the ability to handle many useful and attractive applications, smartphones sport flawless functionality and small sizes leading to their exponential growth. Additionally, due to the huge user base and a wide range of functionalities, these mobile platforms have become a popular source of information to the publi…
▽ More
Since last decade, smartphones have become an integral part of everyone's life. Having the ability to handle many useful and attractive applications, smartphones sport flawless functionality and small sizes leading to their exponential growth. Additionally, due to the huge user base and a wide range of functionalities, these mobile platforms have become a popular source of information to the public through several Apps provided by the DHS Citizen Application Directory. Such wide audience to this platform is also making it a huge target for cyber- attacks. While Android, the most popular open source mobile platform, has its base set of permissions to protect the device and resources, it does not provide a security framework to defend against any attack. This paper surveys threat, vulnerability and security analysis tools, which are open source in nature, for the Android platform and systemizes the knowledge of Android security mechanisms. Additionally, a comparison of three popular tools is presented.
△ Less
Submitted 31 July, 2018;
originally announced July 2018.
-
CS-VQA: Visual Question Answering with Compressively Sensed Images
Authors:
Li-Chi Huang,
Kuldeep Kulkarni,
Anik Jha,
Suhas Lohit,
Suren Jayasuriya,
Pavan Turaga
Abstract:
Visual Question Answering (VQA) is a complex semantic task requiring both natural language processing and visual recognition. In this paper, we explore whether VQA is solvable when images are captured in a sub-Nyquist compressive paradigm. We develop a series of deep-network architectures that exploit available compressive data to increasing degrees of accuracy, and show that VQA is indeed solvabl…
▽ More
Visual Question Answering (VQA) is a complex semantic task requiring both natural language processing and visual recognition. In this paper, we explore whether VQA is solvable when images are captured in a sub-Nyquist compressive paradigm. We develop a series of deep-network architectures that exploit available compressive data to increasing degrees of accuracy, and show that VQA is indeed solvable in the compressed domain. Our results show that there is nominal degradation in VQA performance when using compressive measurements, but that accuracy can be recovered when VQA pipelines are used in conjunction with state-of-the-art deep neural networks for CS reconstruction. The results presented yield important implications for resource-constrained VQA applications.
△ Less
Submitted 8 June, 2018;
originally announced June 2018.
-
Compressive Light Field Reconstructions using Deep Learning
Authors:
Mayank Gupta,
Arjun Jauhari,
Kuldeep Kulkarni,
Suren Jayasuriya,
Alyosha Molnar,
Pavan Turaga
Abstract:
Light field imaging is limited in its computational processing demands of high sampling for both spatial and angular dimensions. Single-shot light field cameras sacrifice spatial resolution to sample angular viewpoints, typically by multiplexing incoming rays onto a 2D sensor array. While this resolution can be recovered using compressive sensing, these iterative solutions are slow in processing a…
▽ More
Light field imaging is limited in its computational processing demands of high sampling for both spatial and angular dimensions. Single-shot light field cameras sacrifice spatial resolution to sample angular viewpoints, typically by multiplexing incoming rays onto a 2D sensor array. While this resolution can be recovered using compressive sensing, these iterative solutions are slow in processing a light field. We present a deep learning approach using a new, two branch network architecture, consisting jointly of an autoencoder and a 4D CNN, to recover a high resolution 4D light field from a single coded 2D image. This network decreases reconstruction time significantly while achieving average PSNR values of 26-32 dB on a variety of light fields. In particular, reconstruction time is decreased from 35 minutes to 6.7 minutes as compared to the dictionary method for equivalent visual quality. These reconstructions are performed at small sampling/compression ratios as low as 8%, allowing for cheaper coded light field cameras. We test our network reconstructions on synthetic light fields, simulated coded measurements of real light fields captured from a Lytro Illum camera, and real coded images from a custom CMOS diffractive light field camera. The combination of compressive light field capture with deep learning allows the potential for real-time light field video acquisition systems in the future.
△ Less
Submitted 5 February, 2018;
originally announced February 2018.
-
Convolutional Neural Networks for Non-iterative Reconstruction of Compressively Sensed Images
Authors:
Suhas Lohit,
Kuldeep Kulkarni,
Ronan Kerviche,
Pavan Turaga,
Amit Ashok
Abstract:
Traditional algorithms for compressive sensing recovery are computationally expensive and are ineffective at low measurement rates. In this work, we propose a data driven non-iterative algorithm to overcome the shortcomings of earlier iterative algorithms. Our solution, ReconNet, is a deep neural network, whose parameters are learned end-to-end to map block-wise compressive measurements of the sce…
▽ More
Traditional algorithms for compressive sensing recovery are computationally expensive and are ineffective at low measurement rates. In this work, we propose a data driven non-iterative algorithm to overcome the shortcomings of earlier iterative algorithms. Our solution, ReconNet, is a deep neural network, whose parameters are learned end-to-end to map block-wise compressive measurements of the scene to the desired image blocks. Reconstruction of an image becomes a simple forward pass through the network and can be done in real-time. We show empirically that our algorithm yields reconstructions with higher PSNRs compared to iterative algorithms at low measurement rates and in presence of measurement noise. We also propose a variant of ReconNet which uses adversarial loss in order to further improve reconstruction quality. We discuss how adding a fully connected layer to the existing ReconNet architecture allows for jointly learning the measurement matrix and the reconstruction algorithm in a single network. Experiments on real data obtained from a block compressive imager show that our networks are robust to unseen sensor noise. Finally, through an experiment in object tracking, we show that even at very low measurement rates, reconstructions using our algorithm possess rich semantic content that can be used for high level inference.
△ Less
Submitted 16 August, 2017; v1 submitted 15 August, 2017;
originally announced August 2017.
-
Automatic Recognition of Facial Displays of Unfelt Emotions
Authors:
Kaustubh Kulkarni,
Ciprian Adrian Corneanu,
Ikechukwu Ofodile,
Sergio Escalera,
Xavier Baro,
Sylwia Hyniewska,
Juri Allik,
Gholamreza Anbarjafari
Abstract:
Humans modify their facial expressions in order to communicate their internal states and sometimes to mislead observers regarding their true emotional states. Evidence in experimental psychology shows that discriminative facial responses are short and subtle. This suggests that such behavior would be easier to distinguish when captured in high resolution at an increased frame rate. We are proposin…
▽ More
Humans modify their facial expressions in order to communicate their internal states and sometimes to mislead observers regarding their true emotional states. Evidence in experimental psychology shows that discriminative facial responses are short and subtle. This suggests that such behavior would be easier to distinguish when captured in high resolution at an increased frame rate. We are proposing SASE-FE, the first dataset of facial expressions that are either congruent or incongruent with underlying emotion states. We show that overall the problem of recognizing whether facial movements are expressions of authentic emotions or not can be successfully addressed by learning spatio-temporal representations of the data. For this purpose, we propose a method that aggregates features along fiducial trajectories in a deeply learnt space. Performance of the proposed model shows that on average it is easier to distinguish among genuine facial expressions of emotion than among unfelt facial expressions of emotion and that certain emotion pairs such as contempt and disgust are more difficult to distinguish than the rest. Furthermore, the proposed methodology improves state of the art results on CK+ and OULU-CASIA datasets for video emotion recognition, and achieves competitive results when classifying facial action units on BP4D datase.
△ Less
Submitted 9 January, 2018; v1 submitted 13 July, 2017;
originally announced July 2017.
-
Carina: Interactive Million-Node Graph Visualization using Web Browser Technologies
Authors:
Dezhi Fang,
Matthew Keezer,
Jacob Williams,
Kshitij Kulkarni,
Robert Pienta,
Duen Horng Chau
Abstract:
We are working on a scalable, interactive visualization system, called Carina, for people to explore million-node graphs. By using latest web browser technologies, Carina offers fast graph rendering via WebGL, and works across desktop (via Electron) and mobile platforms. Different from most existing graph visualization tools, Carina does not store the full graph in RAM, enabling it to work with gr…
▽ More
We are working on a scalable, interactive visualization system, called Carina, for people to explore million-node graphs. By using latest web browser technologies, Carina offers fast graph rendering via WebGL, and works across desktop (via Electron) and mobile platforms. Different from most existing graph visualization tools, Carina does not store the full graph in RAM, enabling it to work with graphs with up to 69M edges. We are working to improve and open-source Carina, to offer researchers and practitioners a new, scalable way to explore and visualize large graph datasets.
△ Less
Submitted 24 February, 2017; v1 submitted 23 February, 2017;
originally announced February 2017.
-
On Optimal Spectrum Access of Cognitive Relay With Finite Packet Buffer
Authors:
Kedar Kulkarni,
Adrish Banerjee
Abstract:
We investigate a cognitive radio system where secondary user (SU) relays primary user (PU) packets using two-phase relaying. SU transmits its own packets with some access probability in relaying phase using time sharing. PU and SU have queues of finite capacity which results in packet loss when the queues are full. Utilizing knowledge of relay queue state, SU aims to maximize its packet throughput…
▽ More
We investigate a cognitive radio system where secondary user (SU) relays primary user (PU) packets using two-phase relaying. SU transmits its own packets with some access probability in relaying phase using time sharing. PU and SU have queues of finite capacity which results in packet loss when the queues are full. Utilizing knowledge of relay queue state, SU aims to maximize its packet throughput while kee** packet loss probability of PU below a threshold. By exploiting structure of the problem, we formulate it as a linear program and find optimal access policy of SU. We also propose low complexity sub-optimal access policies, namely constant probability transmission and step transmission. Numerical results are presented to compare performance of proposed methods and study effect of queue sizes on packet throughput.
△ Less
Submitted 22 January, 2017;
originally announced January 2017.
-
Multi-channel Sensing And Resource Allocation in Energy Constrained Cognitive Radio Networks
Authors:
Kedar Kulkarni,
Adrish Banerjee
Abstract:
We consider a cognitive radio network in a multi-channel licensed environment. Secondary user transmits in a channel if the channel is sensed to be vacant. This results in a tradeoff between sensing time and transmission time. When secondary users are energy constrained, energy available for transmission is less if more energy is used in sensing. This gives rise to an energy tradeoff. For multiple…
▽ More
We consider a cognitive radio network in a multi-channel licensed environment. Secondary user transmits in a channel if the channel is sensed to be vacant. This results in a tradeoff between sensing time and transmission time. When secondary users are energy constrained, energy available for transmission is less if more energy is used in sensing. This gives rise to an energy tradeoff. For multiple primary channels, secondary users must decide appropriate sensing time and transmission power in each channel to maximize average aggregate-bit throughput in each frame duration while ensuring quality-of-service of primary users. Considering time and energy as limited resources, we formulate this problem as a resource allocation problem. Initially a single secondary user scenario is considered and solution is obtained using decomposition and alternating optimization techniques. Later we extend the analysis for the case of multiple secondary users. Simulation results are presented to study effect of channel occupancy, fading and energy availability on performance of proposed method.
△ Less
Submitted 17 January, 2017;
originally announced January 2017.
-
On Stable Throughput of Cognitive Radio Networks With Cooperating Secondary Users
Authors:
Kedar Kulkarni,
Adrish Banerjee
Abstract:
In this paper, we study cooperative cognitive radio networks consisting of a primary user and multiple secondary users. Secondary users transmit only when primary user is sensed as silent and may interfere with primary transmission due to imperfect sensing. When primary activity is sensed correctly, secondary users cooperate with primary user by assisting retransmission of failed packets of primar…
▽ More
In this paper, we study cooperative cognitive radio networks consisting of a primary user and multiple secondary users. Secondary users transmit only when primary user is sensed as silent and may interfere with primary transmission due to imperfect sensing. When primary activity is sensed correctly, secondary users cooperate with primary user by assisting retransmission of failed packets of primary user. We analyze packet throughput of primary and secondary users for three variations of proposed cooperation method. Signal flow graph (SFG) based approach is employed to obtain closed form expressions of packet throughput. The analysis is done for two cases; individual sensing and cooperative sensing. Further, we characterize optimal transmission probability of secondary users that maximizes individual secondary packet throughput kee** all queues in the system stable. Results present a comparison of throughput performance of proposed cooperation methods under different scenarios and show their benefits for both primary as well as secondary user throughput.
△ Less
Submitted 1 August, 2016; v1 submitted 28 July, 2016;
originally announced July 2016.
-
Weakly Supervised Learning of Heterogeneous Concepts in Videos
Authors:
Sohil Shah,
Kuldeep Kulkarni,
Arijit Biswas,
Ankit Gandhi,
Om Deshmukh,
Larry Davis
Abstract:
Typical textual descriptions that accompany online videos are 'weak': i.e., they mention the main concepts in the video but not their corresponding spatio-temporal locations. The concepts in the description are typically heterogeneous (e.g., objects, persons, actions). Certain location constraints on these concepts can also be inferred from the description. The goal of this paper is to present a g…
▽ More
Typical textual descriptions that accompany online videos are 'weak': i.e., they mention the main concepts in the video but not their corresponding spatio-temporal locations. The concepts in the description are typically heterogeneous (e.g., objects, persons, actions). Certain location constraints on these concepts can also be inferred from the description. The goal of this paper is to present a generalization of the Indian Buffet Process (IBP) that can (a) systematically incorporate heterogeneous concepts in an integrated framework, and (b) enforce location constraints, for efficient classification and localization of the concepts in the videos. Finally, we develop posterior inference for the proposed formulation using mean-field variational approximation. Comparative evaluations on the Casablanca and the A2D datasets show that the proposed approach significantly outperforms other state-of-the-art techniques: 24% relative improvement for pairwise concept classification in the Casablanca dataset and 9% relative improvement for localization in the A2D dataset as compared to the most competitive baseline.
△ Less
Submitted 12 July, 2016;
originally announced July 2016.
-
Fast Integral Image Estimation at 1% measurement rate
Authors:
Kuldeep Kulkarni,
Pavan Turaga
Abstract:
We propose a framework called ReFInE to directly obtain integral image estimates from a very small number of spatially multiplexed measurements of the scene without iterative reconstruction of any auxiliary image, and demonstrate their practical utility in visual object tracking. Specifically, we design measurement matrices which are tailored to facilitate extremely fast estimation of the integral…
▽ More
We propose a framework called ReFInE to directly obtain integral image estimates from a very small number of spatially multiplexed measurements of the scene without iterative reconstruction of any auxiliary image, and demonstrate their practical utility in visual object tracking. Specifically, we design measurement matrices which are tailored to facilitate extremely fast estimation of the integral image, by using a single-shot linear operation on the measured vector. Leveraging a prior model for the images, we formulate a nuclear norm minimization problem with second order conic constraints to jointly obtain the measurement matrix and the linear operator. Through qualitative and quantitative experiments, we show that high quality integral image estimates can be obtained using our framework at very low measurement rates. Further, on a standard dataset of 50 videos, we present object tracking results which are comparable to the state-of-the-art methods, even at an extremely low measurement rate of 1%.
△ Less
Submitted 26 January, 2016;
originally announced January 2016.
-
ReconNet: Non-Iterative Reconstruction of Images from Compressively Sensed Random Measurements
Authors:
Kuldeep Kulkarni,
Suhas Lohit,
Pavan Turaga,
Ronan Kerviche,
Amit Ashok
Abstract:
The goal of this paper is to present a non-iterative and more importantly an extremely fast algorithm to reconstruct images from compressively sensed (CS) random measurements. To this end, we propose a novel convolutional neural network (CNN) architecture which takes in CS measurements of an image as input and outputs an intermediate reconstruction. We call this network, ReconNet. The intermediate…
▽ More
The goal of this paper is to present a non-iterative and more importantly an extremely fast algorithm to reconstruct images from compressively sensed (CS) random measurements. To this end, we propose a novel convolutional neural network (CNN) architecture which takes in CS measurements of an image as input and outputs an intermediate reconstruction. We call this network, ReconNet. The intermediate reconstruction is fed into an off-the-shelf denoiser to obtain the final reconstructed image. On a standard dataset of images we show significant improvements in reconstruction results (both in terms of PSNR and time complexity) over state-of-the-art iterative CS reconstruction algorithms at various measurement rates. Further, through qualitative experiments on real data collected using our block single pixel camera (SPC), we show that our network is highly robust to sensor noise and can recover visually better quality images than competitive algorithms at extremely low sensing rates of 0.1 and 0.04. To demonstrate that our algorithm can recover semantically informative images even at a low measurement rate of 0.01, we present a very robust proof of concept real-time visual tracking application.
△ Less
Submitted 7 March, 2016; v1 submitted 26 January, 2016;
originally announced January 2016.
-
Reconstruction-free action inference from compressive imagers
Authors:
Kuldeep Kulkarni,
Pavan Turaga
Abstract:
Persistent surveillance from camera networks, such as at parking lots, UAVs, etc., often results in large amounts of video data, resulting in significant challenges for inference in terms of storage, communication and computation. Compressive cameras have emerged as a potential solution to deal with the data deluge issues in such applications. However, inference tasks such as action recognition re…
▽ More
Persistent surveillance from camera networks, such as at parking lots, UAVs, etc., often results in large amounts of video data, resulting in significant challenges for inference in terms of storage, communication and computation. Compressive cameras have emerged as a potential solution to deal with the data deluge issues in such applications. However, inference tasks such as action recognition require high quality features which implies reconstructing the original video data. Much work in compressive sensing (CS) theory is geared towards solving the reconstruction problem, where state-of-the-art methods are computationally intensive and provide low-quality results at high compression rates. Thus, reconstruction-free methods for inference are much desired. In this paper, we propose reconstruction-free methods for action recognition from compressive cameras at high compression ratios of 100 and above. Recognizing actions directly from CS measurements requires features which are mostly nonlinear and thus not easily applicable. This leads us to search for such properties that are preserved in compressive measurements. To this end, we propose the use of spatio-temporal smashed filters, which are compressive domain versions of pixel-domain matched filters. We conduct experiments on publicly available databases and show that one can obtain recognition rates that are comparable to the oracle method in uncompressed setup, even for high compression ratios.
△ Less
Submitted 18 January, 2015;
originally announced January 2015.
-
Continuous Action Recognition Based on Sequence Alignment
Authors:
Kaustubh Kulkarni,
Georgios Evangelidis,
Jan Cech,
Radu Horaud
Abstract:
Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be simultaneously carried out. We build on the well known dynamic time war** (DTW) framework and devise a novel visual alignment technique, namely dynamic frame war** (DFW), which performs isolated recognition based on per-frame representation of videos, and on aligning a te…
▽ More
Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be simultaneously carried out. We build on the well known dynamic time war** (DTW) framework and devise a novel visual alignment technique, namely dynamic frame war** (DFW), which performs isolated recognition based on per-frame representation of videos, and on aligning a test sequence with a model sequence. Moreover, we propose two extensions which enable to perform recognition concomitant with segmentation, namely one-pass DFW and two-pass DFW. These two methods have their roots in the domain of continuous recognition of speech and, to the best of our knowledge, their extension to continuous visual action recognition has been overlooked. We test and illustrate the proposed techniques with a recently released dataset (RAVEL) and with two public-domain datasets widely used in action recognition (Hollywood-1 and Hollywood-2). We also compare the performances of the proposed isolated and continuous recognition algorithms with several recently published methods.
△ Less
Submitted 2 June, 2014;
originally announced June 2014.