Search | arXiv e-print repository

Crafting Parts for Expressive Object Composition

Authors: Harsh Rangwani, Aishwarya Agarwal, Kuldeep Kulkarni, R. Venkatesh Babu, Srikrishna Karanam

Abstract: Text-to-image generation from large generative models like Stable Diffusion, DALLE-2, etc., have become a common base for various tasks due to their superior quality and extensive knowledge bases. As image composition and generation are creative processes the artists need control over various parts of the images being generated. We find that just adding details about parts in the base text prompt either leads to an entirely different image (e.g., missing/incorrect identity) or the extra part details simply being ignored. To mitigate these issues, we introduce PartCraft, which enables image generation based on fine-grained part-level details specified for objects in the base text prompt. This allows more control for artists and enables novel object compositions by combining distinctive object parts. PartCraft first localizes object parts by denoising the object region from a specific diffusion process. This enables each part token to be localized to the right object region. After obtaining part masks, we run a localized diffusion process in each of the part regions based on fine-grained part descriptions and combine them to produce the final image. All the stages of PartCraft are based on repurposing a pre-trained diffusion model, which enables it to generalize across various domains without training. We demonstrate the effectiveness of part-level control provided by PartCraft qualitatively through visual examples and quantitatively in comparison to the contemporary baselines.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Project Page Will Be Here: https://rangwani-harsh.github.io/PartCraft

arXiv:2405.16716 [pdf, other]

Adaptive Incentive Design with Learning Agents

Authors: Chinmay Maheshwari, Kshitij Kulkarni, Manxi Wu, Shankar Sastry

Abstract: How can the system operator learn an incentive mechanism that achieves social optimality based on limited information about the agents' behavior, who are dynamically updating their strategies? To answer this question, we propose an \emph{adaptive} incentive mechanism. This mechanism updates the incentives of agents based on the feedback of each agent's externality, evaluated as the difference between the player's marginal cost and society's marginal cost at each time step. The proposed mechanism updates the incentives on a slower timescale compared to the agents' learning dynamics, resulting in a two-timescale coupled dynamical system. Notably, this mechanism is agnostic to the specific learning dynamics used by agents to update their strategies. We show that any fixed point of this adaptive incentive mechanism corresponds to the optimal incentive mechanism, ensuring that the Nash equilibrium coincides with the socially optimal strategy. Additionally, we provide sufficient conditions that guarantee the convergence of the adaptive incentive mechanism to a fixed point. Our results apply to both atomic and non-atomic games. To demonstrate the effectiveness of our proposed mechanism, we verify the convergence conditions in two practically relevant games: atomic networked quadratic aggregative games and non-atomic network routing games.

Submitted 26 May, 2024; originally announced May 2024.

Comments: 33 pages

arXiv:2404.18416 [pdf, other]

Capabilities of Gemini Models in Medicine

Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning.
Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.

Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.03919 [pdf, other]

Understanding the Impact of Coalitions between EV Charging Stations

Authors: Sukanya Kudva, Kshitij Kulkarni, Chinmay Maheshwari, Anil Aswani, Shankar Sastry

Abstract: The rapid growth of electric vehicles (EVs) is driving the expansion of charging infrastructure globally. This expansion, however, places significant charging demand on the electricity grid, impacting grid operations and electricity pricing. While coordination among all charging stations is beneficial, it may not be always feasible. However, a subset of charging stations, which could be jointly operated by a company, could coordinate to decide their charging profile. In this paper we investigate whether such coalitions between charging stations is better than no coordination. We model EV charging as a non-cooperative aggregative game, where each station's cost is determined by both monetary payments tied to reactive electricity prices on the grid and its sensitivity to deviations from a nominal charging profile. We consider a solution concept that we call $\mathcal{C}$-Nash equilibrium, which is tied to a coalition $\mathcal{C}$ of charging stations coordinating to reduce their cumulative costs. We provide sufficient conditions, in terms of the demand and sensitivity of charging stations, to determine when independent (uncoordinated) operation of charging stations could result in lower overall costs to charging stations, the coalition, and charging stations outside the coalition. Somewhat counter to intuition, we demonstrate scenarios where allowing charging stations to operate independently is better than coordinating as a coalition.
Jointly, these results provide operators of charging stations insights into how to coordinate their charging behavior, and open several research directions.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 24 pages, 5 figures

MSC Class: 91A10; 91A80; 91B52; 91B54; 91B74; 93A16; 93A15

arXiv:2403.02525 [pdf, other]

An Analysis of Intent-Based Markets

Authors: Tarun Chitra, Kshitij Kulkarni, Mallesh Pai, Theo Diamandis

Abstract: Mechanisms for decentralized finance on blockchains suffer from various problems, including suboptimal price execution for users, latency, and a worse user experience compared to their centralized counterparts. Recently, off-chain marketplaces, colloquially called `intent markets,' have been proposed as a solution to these problems. In these markets, agents called \emph{solvers} compete to satisfy user orders, which may include complicated user-specified conditions. We provide two formal models of solvers' strategic behavior: one probabilistic and another deterministic. In our first model, solvers initially pay upfront costs to enter a Dutch auction to fill the user's order and then exert congestive, costly effort to search for prices for the user. Our results show that the costs incurred by solvers result in restricted entry in the market. Further, in the presence of costly effort and congestion, our results counter-intuitively show that a planner who aims to maximize user welfare may actually prefer to restrict entry, resulting in limited oligopoly. We then introduce an alternative, optimization-based deterministic model which corroborates these results. We conclude with extensions of our model to other auctions within blockchains and non-cryptocurrency applications, such as the US SEC's Proposal 615.

Submitted 6 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 27 pages, 2 figures

arXiv:2401.16844 [pdf, other]

Congestion Pricing for Efficiency and Equity: Theory and Applications to the San Francisco Bay Area

Authors: Chinmay Maheshwari, Kshitij Kulkarni, Druv Pai, Jiarui Yang, Manxi Wu, Shankar Sastry

Abstract: Congestion pricing, while adopted by many cities to alleviate traffic congestion, raises concerns about widening socioeconomic disparities due to its disproportionate impact on low-income travelers. In this study, we address this concern by proposing a new class of congestion pricing schemes that not only minimize congestion levels but also incorporate an equity objective to reduce cost disparities among travelers with different willingness-to-pay. Our analysis builds on a congestion game model with heterogeneous traveler populations. We present four pricing schemes that account for practical considerations, such as the ability to charge differentiated tolls to various traveler populations and the option to toll all or only a subset of edges in the network. We evaluate our pricing schemes in the calibrated freeway network of the San Francisco Bay Area. We demonstrate that the proposed congestion pricing schemes improve both efficiency (in terms of reduced average travel time) and equity (the disparities of travel costs experienced by different populations) compared to the current pricing scheme. Moreover, our pricing schemes also generate a total revenue comparable to the current pricing scheme. Our results further show that pricing schemes charging differentiated prices to traveler populations with varying willingness-to-pay lead to a more equitable distribution of travel costs compared to those that charge a homogeneous price to all.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 42 pages, 11 figures

MSC Class: 91A07; 91A14; 91A68; 91A90

arXiv:2401.05654 [pdf, other]

Towards Conversational Diagnostic AI

Authors: Tao Tu, Anil Palepu, Mike Schaekermann, Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Mohamed Amin, Nenad Tomasev, Shekoofeh Azizi, Karan Singhal, Yong Cheng, Le Hou, Albert Webson, Kavita Kulkarni, S Sara Mahdavi, Christopher Semturs, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias, Alan Karthikesalingam, Vivek Natarajan

Abstract: At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors.
Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 46 pages, 5 figures in main text, 19 figures in appendix

arXiv:2312.00164 [pdf, other]

Towards Accurate Differential Diagnosis with Large Language Models

Authors: Daniel McDuff, Mike Schaekermann, Tao Tu, Anil Palepu, Amy Wang, Jake Garrison, Karan Singhal, Yash Sharma, Shekoofeh Azizi, Kavita Kulkarni, Le Hou, Yong Cheng, Yun Liu, S Sara Mahdavi, Sushant Prakash, Anupam Pathak, Christopher Semturs, Shwetak Patel, Dale R Webster, Ewa Dominowska, Juraj Gottweis, Joelle Barral, Katherine Chou, Greg S Corrado, Yossi Matias, et al. (3 additional authors not shown)

Abstract: An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance.
Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2310.07865 [pdf, ps, other]

The Specter (and Spectra) of Miner Extractable Value

Authors: Guillermo Angeris, Tarun Chitra, Theo Diamandis, Kshitij Kulkarni

Abstract: Miner extractable value (MEV) refers to any excess value that a transaction validator can realize by manipulating the ordering of transactions. In this work, we introduce a simple theoretical definition of the 'cost of MEV', prove some basic properties, and show that the definition is useful via a number of examples. In a variety of settings, this definition is related to the 'smoothness' of a function over the symmetric group. From this definition and some basic observations, we recover a number of results from the literature.

Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2307.13139 [pdf, ps, other]

Attacks on Dynamic DeFi Interest Rate Curves

Authors: Tarun Chitra, Peteris Erins, Kshitij Kulkarni

Abstract: As decentralized money market protocols continue to grow in value locked, there have been a number of optimizations proposed for improving capital efficiency. One set of proposals from Euler Finance and Mars Protocol is to have an interest rate curve that is a proportional-integral-derivative (PID) controller. In this paper, we demonstrate attacks on proportional and proportional-integral controlled interest rate curves. The attack allows one to manipulate the interest rate curve to take a higher proportion of the earned yield than their pro-rata share of the lending pool. We conclude with an argument that PID interest rate curves can actually \emph{reduce} capital efficiency (due to attack mitigations) unless supply and demand elasticity to rate changes are sufficiently high.

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2303.08639 [pdf, other]

Blowing in the Wind: CycleNet for Human Cinemagraphs from Still Images

Authors: Hugo Bertiche, Niloy J. Mitra, Kuldeep Kulkarni, Chun-Hao Paul Huang, Tuanfeng Y. Wang, Meysam Madadi, Sergio Escalera, Duygu Ceylan

Abstract: Cinemagraphs are short looping videos created by adding subtle motions to a static image. This kind of media is popular and engaging. However, automatic generation of cinemagraphs is an underexplored area and current solutions require tedious low-level manual authoring by artists. In this paper, we present an automatic method that allows generating human cinemagraphs from single RGB images. We investigate the problem in the context of dressed humans under the wind. At the core of our method is a novel cyclic neural network that produces looping cinemagraphs for the target loop duration. To circumvent the problem of collecting real data, we demonstrate that it is possible, by working in the image normal space, to learn garment motion dynamics on synthetic data and generalize to real data. We evaluate our method on both synthetic and real data and demonstrate that it is possible to create compelling and plausible cinemagraphs from single RGB images.

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2303.00830 [pdf, other]

DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments

Authors: Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil Jain, Pratik Roy Chowdhuri, Kaustubh Kulkarni, Swapnil Padhi, Deepu Vijayasenan, Sriram Ganapathy

Abstract: In multilingual societies, social conversations often involve code-mixed speech. The current speech technology may not be well equipped to extract information from multi-lingual multi-speaker conversations. The DISPLACE challenge entails a first-of-kind task to benchmark speaker and language diarization on the same data, as the data contains multi-speaker conversations in multilingual code-mixed speech. The challenge attempts to highlight outstanding issues in speaker diarization (SD) in multilingual settings with code-mixing. Further, language diarization (LD) in multi-speaker settings also introduces new challenges, where the system has to disambiguate speaker switches with code switches. For this challenge, a natural multilingual, multi-speaker conversational dataset is distributed for development and evaluation purposes. The systems are evaluated on single-channel far-field recordings. We also release a baseline system and report the highlights of the system submissions.

Submitted 5 June, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

arXiv:2302.09657 [pdf]

Table Tennis Stroke Detection and Recognition Using Ball Trajectory Data

Authors: Kaustubh Milind Kulkarni, Rohan S Jamadagni, Jeffrey Aaron Paul, Sucheth Shenoy

Abstract: In this work, the novel task of detecting and classifying table tennis strokes solely using the ball trajectory has been explored. A single camera setup positioned in the umpire's view has been employed to procure a dataset consisting of six stroke classes executed by four professional table tennis players. Ball tracking using YOLOv4, a traditional object detection model, and TrackNetv2, a temporal heatmap based model, have been implemented on our dataset and their performances have been benchmarked. A mathematical approach developed to extract temporal boundaries of strokes using the ball trajectory data yielded a total of 2023 valid strokes in our dataset, while also detecting services and missed strokes successfully. The temporal convolutional network developed performed stroke recognition on completely unseen data with an accuracy of 87.155%. Several machine learning and deep learning based model architectures have been trained for stroke recognition using ball trajectory input and benchmarked based on their performances. While stroke recognition in the field of table tennis has been extensively explored based on human action recognition using video data focused on the player's actions, the use of ball trajectory data for the same is an unexplored characteristic of the sport. Hence, the motivation behind the work is to demonstrate that meaningful inferences such as stroke detection and recognition can be drawn using minimal input information.

Submitted 19 February, 2023; originally announced February 2023.

Comments: 9 pages, 5 figures, 6 tables

arXiv:2302.02249 [pdf, other]

Self-supervised Multi-view Disentanglement for Expansion of Visual Collections

Authors: Nihal Jain, Praneetha Vaddamanu, Paridhi Maheshwari, Vishwa Vinay, Kuldeep Kulkarni

Abstract: Image search engines enable the retrieval of images relevant to a query image. In this work, we consider the setting where a query for similar images is derived from a collection of images. For visual search, the similarity measurements may be made along multiple axes, or views, such as style and color. We assume access to a set of feature extractors, each of which computes representations for a specific view. Our objective is to design a retrieval algorithm that effectively combines similarities computed over representations from multiple views. To this end, we propose a self-supervised learning method for extracting disentangled view-specific representations for images such that the inter-view overlap is minimized. We show how this allows us to compute the intent of a collection as a distribution over views. We show how effective retrieval can be performed by prioritizing candidate expansion images that match the intent of a query collection. Finally, we present a new querying mechanism for image search enabled by composing multiple collections and perform retrieval under this setting using the techniques presented in this paper.

Submitted 4 February, 2023; originally announced February 2023.

Comments: A version of this paper has been accepted at WSDM 2023

arXiv:2301.12532 [pdf, ps, other]

Credible, Optimal Auctions via Blockchains

Authors: Tarun Chitra, Matheus V. X. Ferreira, Kshitij Kulkarni

Abstract: Akbarpour and Li (2020) formalized credibility as an auction desideratum where the auctioneer cannot benefit by implementing undetectable deviations from the promised auction and showed that, in the plain model, the ascending price auction with reserves is the only credible, strategyproof, revenue-optimal auction. Ferreira and Weinberg (2020) proposed the Deferred Revelation Auction (DRA) as a communication efficient auction that avoids the uniqueness results from Akbarpour and Li (2020) assuming the existence of cryptographic commitments and as long as bidder valuations are MHR. They also showed DRA is not credible in settings where bidder valuations are $α$-strongly regular unless $α > 1$. In this paper, we ask if blockchains allow us to design a larger class of credible auctions. We answer this question positively, by showing that DRA is credible even for $α$-strongly regular distributions for all $α > 0$ if implemented over a secure and censorship-resistant blockchain. We argue ledgers provide two properties that limit deviations from a self-interested auctioneer. First, the existence of smart contracts allows one to extend the concept of credibility to settings where the auctioneer does not have a reputation -- one of the main limitations for the definition of credibility from Akbarpour and Li (2020). Second, blockchains allow us to implement mechanisms over a public broadcast channel, removing the adaptive undetectable deviations driving the negative results of Ferreira and Weinberg (2020).

Submitted 29 January, 2023; originally announced January 2023.

arXiv:2211.16596 [pdf, other]

Towards Dynamic Causal Discovery with Rare Events: A Nonparametric Conditional Independence Test

Authors: Chih-Yuan Chiu, Kshitij Kulkarni, Shankar Sastry

Abstract: Causal phenomena associated with rare events occur across a wide range of engineering problems, such as risk-sensitive safety analysis, accident analysis and prevention, and extreme value theory. However, current methods for causal discovery are often unable to uncover causal links, between random variables in a dynamic setting, that manifest only when the variables first experience low-probability realizations. To address this issue, we introduce a novel statistical independence test on data collected from time-invariant dynamical systems in which rare but consequential events occur. In particular, we exploit the time-invariance of the underlying data to construct a superimposed dataset of the system state before rare events happen at different timesteps. We then design a conditional independence test on the reorganized data. We provide non-asymptotic sample complexity bounds for the consistency of our method, and validate its performance across various simulated and real-world datasets, including incident data collected from the Caltrans Performance Measurement System (PeMS). Code containing the datasets and experiments is publicly available.

Submitted 17 July, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

arXiv:2210.00708 [pdf]

EraseNet: A Recurrent Residual Network for Supervised Document Cleaning

Authors: Yashowardhan Shinde, Kishore Kulkarni, Sachin Kuberkar

Abstract: Document denoising is considered one of the most challenging tasks in computer vision. There exist millions of documents that are still to be digitized, but problems like document degradation due to natural and man-made factors make this task very difficult. This paper introduces a supervised approach for cleaning dirty documents using a new fully convolutional auto-encoder architecture. This paper focuses on restoring documents with discrepancies like deformities caused due to aging of a document, creases left on the pages that were xeroxed, random black patches, lightly visible text, etc., and also improving the quality of the image for better optical character recognition system (OCR) performance. Removing noise from scanned documents is a very important step before the documents as this noise can severely affect the performance of an OCR system. The experiments in this paper have shown promising results as the model is able to learn a variety of ordinary as well as unusual noises and rectify them efficiently.

Submitted 4 July, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: 10 pages, 5 figures, attempting for publication in International Journal on Document Analysis and Recognition (IJDAR)

arXiv:2207.11835 [pdf, other]

Towards a Theory of Maximal Extractable Value I: Constant Function Market Makers

Authors: Kshitij Kulkarni, Theo Diamandis, Tarun Chitra

Abstract: Maximal Extractable Value (MEV) refers to excess value captured by miners (or validators) from users in a cryptocurrency network. This excess value often comes from reordering users' transactions to maximize fees or from inserting new transactions that front-run users' transactions. One of the most common types of MEV involves a `sandwich attack' against a user trading on a constant function market maker (CFMM), which is a popular class of automated market maker. We analyze game theoretic properties of MEV in CFMMs that we call \textit{routing} and \textit{reordering} MEV. In the case of routing, we present examples where the existence of MEV both degrades and, counterintuitively, \emph{improves} the quality of routing. We construct an analogue of the price of anarchy for this setting and demonstrate that if the impact of a sandwich attack is localized in a suitable sense, then the price of anarchy is constant. In the case of reordering, we show conditions when the maximum price impact caused by the reordering of sandwich attacks in a sequence of trades, relative to the average price impact, is $O(\log n)$ in the number of user trades. Combined, our results suggest methods that both MEV searchers and CFMM designers can utilize for estimating costs and profits of MEV.

Submitted 30 April, 2023; v1 submitted 24 July, 2022; originally announced July 2022.

arXiv:2207.03729 [pdf, other]

GEMS: Scene Expansion using Generative Models of Graphs

Authors: Rishi Agarwal, Tirupati Saketh Chandra, Vaidehi Patil, Aniruddha Mahapatra, Kuldeep Kulkarni, Vishwa Vinay

Abstract: Applications based on image retrieval require editing and associating in intermediate spaces that are representative of the high-level concepts like objects and their relationships rather than dense, pixel-level representations like RGB images or semantic-label maps. We focus on one such representation, scene graphs, and propose a novel scene expansion task where we enrich an input seed graph by adding new nodes (objects) and the corresponding relationships. To this end, we formulate scene graph expansion as a sequential prediction task involving multiple steps of first predicting a new node and then predicting the set of relationships between the newly predicted node and previous nodes in the graph. We propose a sequencing strategy for observed graphs that retains the clustering patterns amongst nodes. In addition, we leverage external knowledge to train our graph generation model, enabling greater generalization of node predictions. Due to the inefficiency of existing maximum mean discrepancy (MMD) based metrics for graph generation problems in evaluating predicted relationships between nodes (objects), we design novel metrics that comprehensively evaluate different aspects of predicted relations. We conduct extensive experiments on Visual Genome and VRD datasets to evaluate the expanded scene graphs using the standard MMD-based metrics and our proposed metrics. We observe that the graphs generated by our method, GEMS, better represent the real distribution of the scene graphs than the baseline methods like GraphRNN.

Submitted 8 July, 2022; originally announced July 2022.

arXiv:2204.05507 [pdf, ps, other]

Inducing Social Optimality in Games via Adaptive Incentive Design

Authors: Chinmay Maheshwari, Kshitij Kulkarni, Manxi Wu, Shankar Sastry

Abstract: How can a social planner adaptively incentivize selfish agents who are learning in a strategic environment to induce a socially optimal outcome in the long run? We propose a two-timescale learning dynamics to answer this question in both atomic and non-atomic games. In our learning dynamics, players adopt a class of learning rules to update their strategies at a faster timescale, while a social planner updates the incentive mechanism at a slower timescale. In particular, the update of the incentive mechanism is based on each player's externality, which is evaluated as the difference between the player's marginal cost and the society's marginal cost in each time step. We show that any fixed point of our learning dynamics corresponds to the optimal incentive mechanism such that the corresponding Nash equilibrium also achieves social optimality. We also provide sufficient conditions for the learning dynamics to converge to a fixed point so that the adaptive incentive mechanism eventually induces a socially optimal outcome. Finally, we demonstrate that the sufficient conditions for convergence are satisfied in a variety of games, including (i) atomic networked quadratic aggregative games, (ii) atomic Cournot competition, and (iii) non-atomic network routing games.

Submitted 11 April, 2022; originally announced April 2022.

Comments: 20 pages

arXiv:2112.03051 [pdf, other]

Controllable Animation of Fluid Elements in Still Images

Authors: Aniruddha Mahapatra, Kuldeep Kulkarni

Abstract: We propose a method to interactively control the animation of fluid elements in still images to generate cinemagraphs. Specifically, we focus on the animation of fluid elements like water, smoke, fire, which have the properties of repeating textures and continuous fluid motion. Taking inspiration from prior works, we represent the motion of such fluid elements in the image in the form of a constant 2D optical flow map. To this end, we allow the user to provide any number of arrow directions and their associated speeds along with a mask of the regions the user wants to animate. The user-provided input arrow directions, their corresponding speed values, and the mask are then converted into a dense flow map representing a constant optical flow map (FD). We observe that FD, obtained using simple exponential operations, can closely approximate the plausible motion of elements in the image. We further refine the computed dense optical flow map FD using a generative-adversarial network (GAN) to obtain a more realistic flow map. We devise a novel UNet based architecture to autoregressively generate future frames using the refined optical flow map by forward-warping the input image features at different resolutions. We conduct extensive experiments on a publicly available dataset and show that our method is superior to the baselines in terms of qualitative and quantitative metrics. In addition, we show the qualitative animations of the objects in directions that did not exist in the training set and provide a way to synthesize videos that otherwise would not exist in the real world.

Submitted 25 September, 2023; v1 submitted 6 December, 2021; originally announced December 2021.

arXiv:2110.08879 [pdf, other]

Dynamic Tolling for Inducing Socially Optimal Traffic Loads

Authors: Chinmay Maheshwari, Kshitij Kulkarni, Manxi Wu, Shankar Sastry

Abstract: How to design tolls that induce socially optimal traffic loads with dynamically arriving travelers who make selfish routing decisions? We propose a two-timescale discrete-time stochastic dynamics that adaptively adjusts the toll prices on a parallel link network while accounting for the updates of traffic loads induced by the incoming and outgoing travelers and their route choices. The updates of loads and tolls in our dynamics have three key features: (i) The total demand of incoming and outgoing travelers is stochastically realized; (ii) Travelers are myopic and selfish in that they choose routes according to a perturbed best response given the current latency and tolls on parallel links; (iii) The update of tolls is at a slower timescale as compared to the update of loads. We show that the loads and the tolls eventually concentrate in a neighborhood of the fixed point, which corresponds to the socially optimal load and toll price. Moreover, the fixed point load is also a stochastic user equilibrium with respect to the toll price. Our results are useful for traffic authorities to efficiently manage traffic loads in response to the arrival and departure of travelers.

Submitted 17 October, 2021; originally announced October 2021.

Comments: 18 pages, 3 figures

arXiv:2108.13702 [pdf, other]

SemIE: Semantically-aware Image Extrapolation

Authors: Bholeshwar Khurana, Soumya Ranjan Dash, Abhishek Bhatia, Aniruddha Mahapatra, Hrituraj Singh, Kuldeep Kulkarni

Abstract: We propose a semantically-aware novel paradigm to perform image extrapolation that enables the addition of new object instances. All previous methods are limited in their capability of extrapolation to merely extending the already existing objects in the image. However, our proposed approach focuses not only on (i) extending the already present objects but also on (ii) adding new objects in the extended region based on the context. To this end, for a given image, we first obtain an object segmentation map using a state-of-the-art semantic segmentation method. The segmentation map thus obtained is fed into a network to compute the extrapolated semantic segmentation and the corresponding panoptic segmentation maps. The input image and the obtained segmentation maps are further utilized to generate the final extrapolated image. We conduct experiments on Cityscapes and ADE20K-bedroom datasets and show that our method outperforms all baselines in terms of FID, and similarity in object co-occurrence statistics.

Submitted 31 August, 2021; originally announced August 2021.

Comments: To appear in International Conference on Computer Vision (ICCV) 2021. Project URL: https://semie-iccv.github.io

arXiv:2104.09907 [pdf]

Table Tennis Stroke Recognition Using Two-Dimensional Human Pose Estimation

Authors: Kaustubh Milind Kulkarni, Sucheth Shenoy

Abstract: We introduce a novel method for collecting table tennis video data and perform stroke detection and classification. A diverse dataset containing video data of 11 basic strokes obtained from 14 professional table tennis players, summing up to a total of 22111 videos has been collected using the proposed setup. The temporal convolutional neural network model developed using 2D pose estimation performs multiclass classification of these 11 table tennis strokes with a validation accuracy of 99.37%. Moreover, the neural network generalizes well over the data of a player excluded from the training and validation dataset, classifying the fresh strokes with an overall best accuracy of 98.72%. Various model architectures using machine learning and deep learning based approaches have been trained for stroke recognition and their performances have been compared and benchmarked. Inferences such as performance monitoring and stroke comparison of the players using the model have been discussed. Therefore, we are contributing to the development of a computer vision based sports analytics system for the sport of table tennis that focuses on the previously unexploited aspect of the sport i.e., a player's strokes, which is extremely insightful for performance improvement.

Submitted 31 May, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

Comments: Accepted at CVPR Sports Workshop 2021 (7th International Workshop on Computer Vision in Sports) (CVSports)

arXiv:2011.02544 [pdf, ps, other]

Social Choice with Changing Preferences: Representation Theorems and Long-Run Policies

Authors: Kshitij Kulkarni, Sven Neth

Abstract: We study group decision making with changing preferences as a Markov Decision Process. We are motivated by the increasing prevalence of automated decision-making systems when making choices for groups of people over time. Our main contribution is to show how classic representation theorems from social choice theory can be adapted to characterize optimal policies in this dynamic setting. We provide an axiomatic characterization of MDP reward functions that agree with the Utilitarianism social welfare functionals of social choice theory. We also provide discussion of cases when the implementation of social choice-theoretic axioms may fail to lead to long-run optimal outcomes.

Submitted 4 November, 2020; originally announced November 2020.

Comments: Accepted to the Workshop on Consequential Decision Making in Dynamic Environments, NeurIPS 2020

arXiv:2007.13414 [pdf, other]

Hyper-local sustainable assortment planning

Authors: Nupur Aggarwal, Abhishek Bansal, Kushagra Manglik, Kedar Kulkarni, Vikas Raykar

Abstract: Assortment planning, an important seasonal activity for any retailer, involves choosing the right subset of products to stock in each store. While existing approaches only maximize the expected revenue, we propose including the environmental impact too, through the Higg Material Sustainability Index. The trade-off between revenue and environmental impact is balanced through a multi-objective optimization approach that yields a Pareto-front of optimal assortments for merchandisers to choose from. Using the proposed approach on a few product categories of a leading fashion retailer shows that choosing assortments with lower environmental impact with a minimal impact on revenue is possible.

Submitted 27 July, 2020; originally announced July 2020.

arXiv:2004.08614 [pdf, other]

Halluci-Net: Scene Completion by Exploiting Object Co-occurrence Relationships

Authors: Kuldeep Kulkarni, Tejas Gokhale, Rajhans Singh, Pavan Turaga, Aswin Sankaranarayanan

Abstract: Recently, there has been substantial progress in image synthesis from semantic labelmaps. However, methods used for this task assume the availability of complete and unambiguous labelmaps, with instance boundaries of objects, and class labels for each pixel. This reliance on heavily annotated inputs restricts the application of image synthesis techniques to real-world applications, especially under uncertainty due to weather, occlusion, or noise. On the other hand, algorithms that can synthesize images from sparse labelmaps or sketches are highly desirable as tools that can guide content creators and artists to quickly generate scenes by simply specifying locations of a few objects. In this paper, we address the problem of complex scene completion from sparse labelmaps. Under this setting, very few details about the scene (30\% of object instances) are available as input for image synthesis. We propose a two-stage deep network based method, called `Halluci-Net', that learns co-occurrence relationships between objects in scenes, and then exploits these relationships to produce a dense and complete labelmap. The generated dense labelmap can then be used as input by state-of-the-art image synthesis techniques like pix2pixHD to obtain the final image. The proposed method is evaluated on the Cityscapes dataset and it outperforms two baseline methods on performance metrics like Fréchet Inception Distance (FID), semantic segmentation accuracy, and similarity in object co-occurrences. We also show qualitative results on a subset of ADE20K dataset that contains bedroom images.

Submitted 20 May, 2021; v1 submitted 18 April, 2020; originally announced April 2020.

Comments: Accepted to AI for Content Creation Workshop @CVPR 2021

arXiv:1903.08243 [pdf, other]

doi 10.1177/1094342020945005

A study of vectorization for matrix-free finite element methods

Authors: Tianjiao Sun, Lawrence Mitchell, Kaushik Kulkarni, Andreas Klöckner, David A. Ham, Paul H. J. Kelly

Abstract: Vectorization is increasingly important to achieve high performance on modern hardware with SIMD instructions. Assembly of matrices and vectors in the finite element method, which is characterized by iterating a local assembly kernel over unstructured meshes, poses difficulties to effective vectorization. Maintaining a user-friendly high-level interface with a suitable degree of abstraction while generating efficient, vectorized code for the finite element method is a challenge for numerical software systems and libraries. In this work, we study cross-element vectorization in the finite element framework Firedrake via code transformation and demonstrate the efficacy of such an approach by evaluating a wide range of matrix-free operators spanning different polynomial degrees and discretizations on two recent CPUs using three mainstream compilers. Our experiments show that our approaches for cross-element vectorization achieve 30\% of theoretical peak performance for many examples of practical significance, and exceed 50\% for cases with high arithmetic intensities, with consistent speed-up over (intra-element) vectorization restricted to the local assembly kernels.

Submitted 19 May, 2020; v1 submitted 19 March, 2019; originally announced March 2019.

Journal ref: International Journal of High Performance Computing Applications (2020)

arXiv:1809.02850 [pdf, other]

Rate-Adaptive Neural Networks for Spatial Multiplexers

Authors: Suhas Lohit, Rajhans Singh, Kuldeep Kulkarni, Pavan Turaga

Abstract: In resource-constrained environments, one can employ spatial multiplexing cameras to acquire a small number of measurements of a scene, and perform effective reconstruction or high-level inference using purely data-driven neural networks. However, once trained, the measurement matrix and the network are valid only for a single measurement rate (MR) chosen at training time. To overcome this drawback, we answer the following question: How can we jointly design the measurement operator and the reconstruction/inference network so that the system can operate over a \textit{range} of MRs? To this end, we present a novel training algorithm for learning \textbf{\textit{rate-adaptive}} networks. Using standard datasets, we demonstrate that, when tested over a range of MRs, a rate-adaptive network can provide high quality reconstruction over the entire range, resulting in up to about 15 dB improvement over previous methods, where the network is valid for only one MR. We demonstrate the effectiveness of our approach for sample-efficient object tracking where video frames are acquired at dynamically varying MRs. We also extend this algorithm to learn the measurement operator in conjunction with image recognition networks. Experiments on MNIST and CIFAR-10 confirm the applicability of our algorithm to different tasks.

Submitted 8 September, 2018; originally announced September 2018.

arXiv:1807.11840 [pdf]

Open Source Android Vulnerability Detection Tools: A Survey

Authors: Keyur Kulkarni, Ahmad Y Javaid

Abstract: Over the last decade, smartphones have become an integral part of everyone's life. Having the ability to handle many useful and attractive applications, smartphones sport flawless functionality and small sizes leading to their exponential growth. Additionally, due to the huge user base and a wide range of functionalities, these mobile platforms have become a popular source of information to the public through several Apps provided by the DHS Citizen Application Directory. Such a wide audience for this platform is also making it a huge target for cyber-attacks. While Android, the most popular open source mobile platform, has its base set of permissions to protect the device and resources, it does not provide a security framework to defend against any attack. This paper surveys threat, vulnerability and security analysis tools, which are open source in nature, for the Android platform and systemizes the knowledge of Android security mechanisms. Additionally, a comparison of three popular tools is presented.

Submitted 31 July, 2018; originally announced July 2018.

arXiv:1806.03379 [pdf, other]

CS-VQA: Visual Question Answering with Compressively Sensed Images

Authors: Li-Chi Huang, Kuldeep Kulkarni, Anik Jha, Suhas Lohit, Suren Jayasuriya, Pavan Turaga

Abstract: Visual Question Answering (VQA) is a complex semantic task requiring both natural language processing and visual recognition. In this paper, we explore whether VQA is solvable when images are captured in a sub-Nyquist compressive paradigm. We develop a series of deep-network architectures that exploit available compressive data to increasing degrees of accuracy, and show that VQA is indeed solvable in the compressed domain. Our results show that there is nominal degradation in VQA performance when using compressive measurements, but that accuracy can be recovered when VQA pipelines are used in conjunction with state-of-the-art deep neural networks for CS reconstruction. The results presented yield important implications for resource-constrained VQA applications.

Submitted 8 June, 2018; originally announced June 2018.

Comments: 5 pages, 2 figures, accepted to ICIP 2018

MSC Class: 68

arXiv:1802.01722 [pdf, other]

Compressive Light Field Reconstructions using Deep Learning

Authors: Mayank Gupta, Arjun Jauhari, Kuldeep Kulkarni, Suren Jayasuriya, Alyosha Molnar, Pavan Turaga

Abstract: Light field imaging is limited in its computational processing demands of high sampling for both spatial and angular dimensions. Single-shot light field cameras sacrifice spatial resolution to sample angular viewpoints, typically by multiplexing incoming rays onto a 2D sensor array. While this resolution can be recovered using compressive sensing, these iterative solutions are slow in processing a light field. We present a deep learning approach using a new, two branch network architecture, consisting jointly of an autoencoder and a 4D CNN, to recover a high resolution 4D light field from a single coded 2D image. This network decreases reconstruction time significantly while achieving average PSNR values of 26-32 dB on a variety of light fields. In particular, reconstruction time is decreased from 35 minutes to 6.7 minutes as compared to the dictionary method for equivalent visual quality. These reconstructions are performed at small sampling/compression ratios as low as 8%, allowing for cheaper coded light field cameras. We test our network reconstructions on synthetic light fields, simulated coded measurements of real light fields captured from a Lytro Illum camera, and real coded images from a custom CMOS diffractive light field camera. The combination of compressive light field capture with deep learning allows the potential for real-time light field video acquisition systems in the future.

Submitted 5 February, 2018; originally announced February 2018.

Comments: Published at CCD 2017 workshop held in conjunction with CVPR 2017

arXiv:1708.04669 [pdf, other]

Convolutional Neural Networks for Non-iterative Reconstruction of Compressively Sensed Images

Authors: Suhas Lohit, Kuldeep Kulkarni, Ronan Kerviche, Pavan Turaga, Amit Ashok

Abstract: Traditional algorithms for compressive sensing recovery are computationally expensive and are ineffective at low measurement rates. In this work, we propose a data driven non-iterative algorithm to overcome the shortcomings of earlier iterative algorithms. Our solution, ReconNet, is a deep neural network, whose parameters are learned end-to-end to map block-wise compressive measurements of the scene to the desired image blocks. Reconstruction of an image becomes a simple forward pass through the network and can be done in real-time. We show empirically that our algorithm yields reconstructions with higher PSNRs compared to iterative algorithms at low measurement rates and in presence of measurement noise. We also propose a variant of ReconNet which uses adversarial loss in order to further improve reconstruction quality. We discuss how adding a fully connected layer to the existing ReconNet architecture allows for jointly learning the measurement matrix and the reconstruction algorithm in a single network. Experiments on real data obtained from a block compressive imager show that our networks are robust to unseen sensor noise. Finally, through an experiment in object tracking, we show that even at very low measurement rates, reconstructions using our algorithm possess rich semantic content that can be used for high level inference.

Submitted 16 August, 2017; v1 submitted 15 August, 2017; originally announced August 2017.

arXiv:1707.04061 [pdf, other]

Automatic Recognition of Facial Displays of Unfelt Emotions

Authors: Kaustubh Kulkarni, Ciprian Adrian Corneanu, Ikechukwu Ofodile, Sergio Escalera, Xavier Baro, Sylwia Hyniewska, Juri Allik, Gholamreza Anbarjafari

Abstract: Humans modify their facial expressions in order to communicate their internal states and sometimes to mislead observers regarding their true emotional states. Evidence in experimental psychology shows that discriminative facial responses are short and subtle. This suggests that such behavior would be easier to distinguish when captured in high resolution at an increased frame rate. We are proposing SASE-FE, the first dataset of facial expressions that are either congruent or incongruent with underlying emotion states. We show that overall the problem of recognizing whether facial movements are expressions of authentic emotions or not can be successfully addressed by learning spatio-temporal representations of the data. For this purpose, we propose a method that aggregates features along fiducial trajectories in a deeply learnt space. Performance of the proposed model shows that on average it is easier to distinguish among genuine facial expressions of emotion than among unfelt facial expressions of emotion and that certain emotion pairs such as contempt and disgust are more difficult to distinguish than the rest. Furthermore, the proposed methodology improves state of the art results on CK+ and OULU-CASIA datasets for video emotion recognition, and achieves competitive results when classifying facial action units on BP4D dataset.

Submitted 9 January, 2018; v1 submitted 13 July, 2017; originally announced July 2017.

arXiv:1702.07099 [pdf, other]

doi 10.1145/3041021.3054234

Carina: Interactive Million-Node Graph Visualization using Web Browser Technologies

Authors: Dezhi Fang, Matthew Keezer, Jacob Williams, Kshitij Kulkarni, Robert Pienta, Duen Horng Chau

Abstract: We are working on a scalable, interactive visualization system, called Carina, for people to explore million-node graphs. By using latest web browser technologies, Carina offers fast graph rendering via WebGL, and works across desktop (via Electron) and mobile platforms. Different from most existing graph visualization tools, Carina does not store the full graph in RAM, enabling it to work with graphs with up to 69M edges. We are working to improve and open-source Carina, to offer researchers and practitioners a new, scalable way to explore and visualize large graph datasets.

Submitted 24 February, 2017; v1 submitted 23 February, 2017; originally announced February 2017.

arXiv:1701.06204 [pdf, other]

doi 10.1109/TVT.2017.2664502

On Optimal Spectrum Access of Cognitive Relay With Finite Packet Buffer

Authors: Kedar Kulkarni, Adrish Banerjee

Abstract: We investigate a cognitive radio system where secondary user (SU) relays primary user (PU) packets using two-phase relaying. SU transmits its own packets with some access probability in relaying phase using time sharing. PU and SU have queues of finite capacity which results in packet loss when the queues are full. Utilizing knowledge of relay queue state, SU aims to maximize its packet throughput while keeping packet loss probability of PU below a threshold. By exploiting structure of the problem, we formulate it as a linear program and find optimal access policy of SU. We also propose low complexity sub-optimal access policies, namely constant probability transmission and step transmission. Numerical results are presented to compare performance of proposed methods and study effect of queue sizes on packet throughput.

Submitted 22 January, 2017; originally announced January 2017.

Comments: Accepted for publication in IEEE Transactions on Vehicular Technology

arXiv:1701.04901 [pdf, ps, other]

doi 10.1016/j.phycom.2017.01.003

Multi-channel Sensing And Resource Allocation in Energy Constrained Cognitive Radio Networks

Authors: Kedar Kulkarni, Adrish Banerjee

Abstract: We consider a cognitive radio network in a multi-channel licensed environment. Secondary user transmits in a channel if the channel is sensed to be vacant. This results in a tradeoff between sensing time and transmission time. When secondary users are energy constrained, energy available for transmission is less if more energy is used in sensing. This gives rise to an energy tradeoff. For multiple primary channels, secondary users must decide appropriate sensing time and transmission power in each channel to maximize average aggregate-bit throughput in each frame duration while ensuring quality-of-service of primary users. Considering time and energy as limited resources, we formulate this problem as a resource allocation problem. Initially a single secondary user scenario is considered and solution is obtained using decomposition and alternating optimization techniques. Later we extend the analysis for the case of multiple secondary users. Simulation results are presented to study effect of channel occupancy, fading and energy availability on performance of proposed method.

Submitted 17 January, 2017; originally announced January 2017.

Comments: Accepted for publication in Physical Communication

arXiv:1607.08394 [pdf, ps, other]

doi 10.1109/TCOMM.2016.2597857

On Stable Throughput of Cognitive Radio Networks With Cooperating Secondary Users

Authors: Kedar Kulkarni, Adrish Banerjee

Abstract: In this paper, we study cooperative cognitive radio networks consisting of a primary user and multiple secondary users. Secondary users transmit only when primary user is sensed as silent and may interfere with primary transmission due to imperfect sensing. When primary activity is sensed correctly, secondary users cooperate with primary user by assisting retransmission of failed packets of primary user. We analyze packet throughput of primary and secondary users for three variations of proposed cooperation method. Signal flow graph (SFG) based approach is employed to obtain closed form expressions of packet throughput. The analysis is done for two cases: individual sensing and cooperative sensing. Further, we characterize optimal transmission probability of secondary users that maximizes individual secondary packet throughput keeping all queues in the system stable. Results present a comparison of throughput performance of proposed cooperation methods under different scenarios and show their benefits for both primary as well as secondary user throughput.

Submitted 1 August, 2016; v1 submitted 28 July, 2016; originally announced July 2016.

Comments: Accepted for publication in IEEE Transactions on Communications

arXiv:1607.03240 [pdf, other]

Weakly Supervised Learning of Heterogeneous Concepts in Videos

Authors: Sohil Shah, Kuldeep Kulkarni, Arijit Biswas, Ankit Gandhi, Om Deshmukh, Larry Davis

Abstract: Typical textual descriptions that accompany online videos are 'weak': i.e., they mention the main concepts in the video but not their corresponding spatio-temporal locations. The concepts in the description are typically heterogeneous (e.g., objects, persons, actions). Certain location constraints on these concepts can also be inferred from the description. The goal of this paper is to present a generalization of the Indian Buffet Process (IBP) that can (a) systematically incorporate heterogeneous concepts in an integrated framework, and (b) enforce location constraints, for efficient classification and localization of the concepts in the videos. Finally, we develop posterior inference for the proposed formulation using mean-field variational approximation. Comparative evaluations on the Casablanca and the A2D datasets show that the proposed approach significantly outperforms other state-of-the-art techniques: 24% relative improvement for pairwise concept classification in the Casablanca dataset and 9% relative improvement for localization in the A2D dataset as compared to the most competitive baseline.

Submitted 12 July, 2016; originally announced July 2016.

Comments: To appear at ECCV 2016

arXiv:1601.07258 [pdf, other]

Fast Integral Image Estimation at 1% measurement rate

Authors: Kuldeep Kulkarni, Pavan Turaga

Abstract: We propose a framework called ReFInE to directly obtain integral image estimates from a very small number of spatially multiplexed measurements of the scene without iterative reconstruction of any auxiliary image, and demonstrate their practical utility in visual object tracking. Specifically, we design measurement matrices which are tailored to facilitate extremely fast estimation of the integral image, by using a single-shot linear operation on the measured vector. Leveraging a prior model for the images, we formulate a nuclear norm minimization problem with second order conic constraints to jointly obtain the measurement matrix and the linear operator. Through qualitative and quantitative experiments, we show that high quality integral image estimates can be obtained using our framework at very low measurement rates. Further, on a standard dataset of 50 videos, we present object tracking results which are comparable to the state-of-the-art methods, even at an extremely low measurement rate of 1%.

Submitted 26 January, 2016; originally announced January 2016.

Comments: Submitted to TPAMI
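
Note on the integral image mentioned in the abstract above: the sketch below only illustrates what an integral image (summed-area table) is and why it is convenient for tracking, namely that the sum over any axis-aligned box costs four lookups. It computes the table from a fully observed image and is not the ReFInE estimator; the function names `integral_image` and `box_sum` are hypothetical and chosen for this example.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[r][c] = sum of img[0:r][0:c]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0  # running sum of the current row up to column c
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom][left:right] using four table lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))  # sum of the lower-right 2x2 block
```

Box sums of this kind are the building block of Haar-like features, which is one reason an estimated integral image can be used for tracking without reconstructing the image itself.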

arXiv:1601.06892 [pdf, other]

ReconNet: Non-Iterative Reconstruction of Images from Compressively Sensed Random Measurements

Authors: Kuldeep Kulkarni, Suhas Lohit, Pavan Turaga, Ronan Kerviche, Amit Ashok

Abstract: The goal of this paper is to present a non-iterative and more importantly an extremely fast algorithm to reconstruct images from compressively sensed (CS) random measurements. To this end, we propose a novel convolutional neural network (CNN) architecture which takes in CS measurements of an image as input and outputs an intermediate reconstruction. We call this network, ReconNet. The intermediate reconstruction is fed into an off-the-shelf denoiser to obtain the final reconstructed image. On a standard dataset of images we show significant improvements in reconstruction results (both in terms of PSNR and time complexity) over state-of-the-art iterative CS reconstruction algorithms at various measurement rates. Further, through qualitative experiments on real data collected using our block single pixel camera (SPC), we show that our network is highly robust to sensor noise and can recover visually better quality images than competitive algorithms at extremely low sensing rates of 0.1 and 0.04. To demonstrate that our algorithm can recover semantically informative images even at a low measurement rate of 0.01, we present a very robust proof of concept real-time visual tracking application.

Submitted 7 March, 2016; v1 submitted 26 January, 2016; originally announced January 2016.

Comments: Accepted at IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2016
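
Note on "measurement rate" as used in the abstract above: a rate of r means a block of n pixels is summarized by roughly r·n random linear measurements, y = Φx. The sketch below shows only this sensing step, under stated assumptions: the 8×8 block size, the Gaussian entries of Φ, the rounding rule, and all function names are illustrative choices, not details taken from the paper. The reconstruction network itself is not shown.

```python
import random


def num_measurements_for(rate, block_pixels):
    """Number of measurements for a given rate (assumed rounding rule)."""
    return max(1, round(rate * block_pixels))


def measurement_matrix(num_measurements, block_pixels, seed=0):
    """Random Gaussian matrix Phi of shape (m, n); an assumed choice of Phi."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(block_pixels)]
            for _ in range(num_measurements)]


def measure_block(phi, block):
    """Compute y = Phi x for a flattened image block x."""
    return [sum(p * x for p, x in zip(row, block)) for row in phi]


block_side = 8                       # hypothetical block size
n = block_side * block_side          # 64 pixels per block
m = num_measurements_for(0.25, n)    # measurements at rate 0.25
phi = measurement_matrix(m, n)
block = [float(i % block_side) for i in range(n)]  # toy pixel values
y = measure_block(phi, block)
print(len(block), "pixels ->", len(y), "measurements")
```

A non-iterative method of the kind described above would then map each measurement vector y back to an image block in a single forward pass.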

arXiv:1501.04367 [pdf, other]

Reconstruction-free action inference from compressive imagers

Authors: Kuldeep Kulkarni, Pavan Turaga

Abstract: Persistent surveillance from camera networks, such as at parking lots, UAVs, etc., often results in large amounts of video data, resulting in significant challenges for inference in terms of storage, communication and computation. Compressive cameras have emerged as a potential solution to deal with the data deluge issues in such applications. However, inference tasks such as action recognition require high quality features which implies reconstructing the original video data. Much work in compressive sensing (CS) theory is geared towards solving the reconstruction problem, where state-of-the-art methods are computationally intensive and provide low-quality results at high compression rates. Thus, reconstruction-free methods for inference are much desired. In this paper, we propose reconstruction-free methods for action recognition from compressive cameras at high compression ratios of 100 and above. Recognizing actions directly from CS measurements requires features which are mostly nonlinear and thus not easily applicable. This leads us to search for such properties that are preserved in compressive measurements. To this end, we propose the use of spatio-temporal smashed filters, which are compressive domain versions of pixel-domain matched filters. We conduct experiments on publicly available databases and show that one can obtain recognition rates that are comparable to the oracle method in uncompressed setup, even for high compression ratios.

Submitted 18 January, 2015; originally announced January 2015.

arXiv:1406.0288 [pdf, other]

doi 10.1007/s11263-014-0758-9

Continuous Action Recognition Based on Sequence Alignment

Authors: Kaustubh Kulkarni, Georgios Evangelidis, Jan Cech, Radu Horaud

Abstract: Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be simultaneously carried out. We build on the well known dynamic time warping (DTW) framework and devise a novel visual alignment technique, namely dynamic frame warping (DFW), which performs isolated recognition based on per-frame representation of videos, and on aligning a test sequence with a model sequence. Moreover, we propose two extensions which enable to perform recognition concomitant with segmentation, namely one-pass DFW and two-pass DFW. These two methods have their roots in the domain of continuous recognition of speech and, to the best of our knowledge, their extension to continuous visual action recognition has been overlooked. We test and illustrate the proposed techniques with a recently released dataset (RAVEL) and with two public-domain datasets widely used in action recognition (Hollywood-1 and Hollywood-2). We also compare the performances of the proposed isolated and continuous recognition algorithms with several recently published methods.

Submitted 2 June, 2014; originally announced June 2014.

Journal ref: International Journal of Computer Vision 112(1), 90-114, 2015
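
Note on the dynamic time warping (DTW) framework that the abstract above builds on: the sketch below is the textbook DTW recurrence on sequences of scalars, shown only to make the alignment idea concrete. It is not the dynamic frame warping (DFW) method of the paper; in a video setting each scalar would be replaced by a per-frame feature vector and `dist` by a suitable frame-to-frame distance. The function name and the default distance are assumptions for this example.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW cost between sequences a and b (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match with b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a match with a[i-1]
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


# A sequence and a time-stretched copy of it align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Isolated recognition with such a measure amounts to picking the model sequence with the smallest alignment cost to the test sequence; continuous recognition additionally has to decide where each action starts and ends.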

Showing 1–43 of 43 results for author: Kulkarni, K