Search | arXiv e-print repository

Sense Less, Generate More: Pre-training LiDAR Perception with Masked Autoencoders for Ultra-Efficient 3D Sensing

Authors: Sina Tayebati, Theja Tulabandhula, Amit R. Trivedi

Abstract: In this work, we propose a disruptively frugal LiDAR perception dataflow that generates rather than senses parts of the environment that are either predictable based on the extensive training of the environment or have limited consequence to the overall prediction accuracy. Therefore, the proposed methodology trades off sensing energy with training data for low-power robotics and autonomous naviga… ▽ More In this work, we propose a disruptively frugal LiDAR perception dataflow that generates rather than senses parts of the environment that are either predictable based on the extensive training of the environment or have limited consequence to the overall prediction accuracy. Therefore, the proposed methodology trades off sensing energy with training data for low-power robotics and autonomous navigation to operate frugally with sensors, extending their lifetime on a single battery charge. Our proposed generative pre-training strategy for this purpose, called as radially masked autoencoding (R-MAE), can also be readily implemented in a typical LiDAR system by selectively activating and controlling the laser power for randomly generated angular regions during on-field operations. Our extensive evaluations show that pre-training with R-MAE enables focusing on the radial segments of the data, thereby capturing spatial relationships and distances between objects more effectively than conventional procedures. Therefore, the proposed methodology not only reduces sensing energy but also improves prediction accuracy. For example, our extensive evaluations on Waymo, nuScenes, and KITTI datasets show that the approach achieves over a 5% average precision improvement in detection tasks across datasets and over a 4% accuracy improvement in transferring domains from Waymo and nuScenes to KITTI. In 3D object detection, it enhances small object detection by up to 4.37% in AP at moderate difficulty levels in the KITTI dataset. Even with 90% radial masking, it surpasses baseline models by up to 5.59% in mAP/mAPH across all object classes in the Waymo dataset. Additionally, our method achieves up to 3.17% and 2.31% improvements in mAP and NDS, respectively, on the nuScenes dataset, demonstrating its effectiveness with both single and fused LiDAR-camera modalities. https://github.com/sinatayebati/Radial_MAE. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2403.02745 [pdf, other]

CURATRON: Complete Robust Preference Data for Robust Alignment of Large Language Models

Authors: Son The Nguyen, Niranjan Uma Naresh, Theja Tulabandhula

Abstract: This paper addresses the challenges of aligning large language models (LLMs) with human values via preference learning (PL), with a focus on the issues of incomplete and corrupted data in preference datasets. We propose a novel method for robustly and completely recalibrating values within these datasets to enhance LLMs resilience against the issues. In particular, we devise a guaranteed polynomia… ▽ More This paper addresses the challenges of aligning large language models (LLMs) with human values via preference learning (PL), with a focus on the issues of incomplete and corrupted data in preference datasets. We propose a novel method for robustly and completely recalibrating values within these datasets to enhance LLMs resilience against the issues. In particular, we devise a guaranteed polynomial time ranking algorithm that robustifies several existing models, such as the classic Bradley--Terry--Luce (BTL) (Bradley and Terry, 1952) model and certain generalizations of it. To the best of our knowledge, our present work is the first to propose an algorithm that provably recovers an ε-optimal ranking with high probability while allowing as large as O(n) perturbed pairwise comparison results per model response. Furthermore, we show robust recovery results in the partially observed setting. Our experiments confirm that our algorithms handle adversarial noise and unobserved comparisons well in both general and LLM preference dataset settings. This work contributes to the development and scaling of more reliable and ethically aligned AI models by equip** the dataset curation pipeline with the ability to handle missing and maliciously manipulated inputs. △ Less

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.00822 [pdf, other]

InteraRec: Screenshot Based Recommendations Using Multimodal Large Language Models

Authors: Saketh Reddy Karra, Theja Tulabandhula

Abstract: Weblogs, comprised of records detailing user activities on any website, offer valuable insights into user preferences, behavior, and interests. Numerous recommendation algorithms, employing strategies such as collaborative filtering, content-based filtering, and hybrid methods, leverage the data mined through these weblogs to provide personalized recommendations to users. Despite the abundance of… ▽ More Weblogs, comprised of records detailing user activities on any website, offer valuable insights into user preferences, behavior, and interests. Numerous recommendation algorithms, employing strategies such as collaborative filtering, content-based filtering, and hybrid methods, leverage the data mined through these weblogs to provide personalized recommendations to users. Despite the abundance of information available in these weblogs, identifying and extracting pertinent information and key features from them necessitate extensive engineering endeavors. The intricate nature of the data also poses a challenge for interpretation, especially for non-experts. In this study, we introduce a sophisticated and interactive recommendation framework denoted as InteraRec, which diverges from conventional approaches that exclusively depend on weblogs for recommendation generation. InteraRec framework captures high-frequency screenshots of web pages as users navigate through a website. Leveraging state-of-the-art multimodal large language models (MLLMs), it extracts valuable insights into user preferences from these screenshots by generating a textual summary based on predefined keywords. Subsequently, an LLM-integrated optimization setup utilizes this summary to generate tailored recommendations. Through our experiments, we demonstrate the effectiveness of InteraRec in providing users with valuable and personalized offerings. Furthermore, we explore the integration of session-based recommendation systems into the InteraRec framework, aiming to enhance its overall performance. Finally, we curate a new dataset comprising of screenshots from product web pages on the Amazon website for the validation of the InteraRec framework. Detailed experiments demonstrate the efficacy of the InteraRec framework in delivering valuable and personalized recommendations tailored to individual user preferences. △ Less

Submitted 15 June, 2024; v1 submitted 26 February, 2024; originally announced March 2024.

arXiv:2402.07107 [pdf, other]

Echoes of Socratic Doubt: Embracing Uncertainty in Calibrated Evidential Reinforcement Learning

Authors: Alex Christopher Stutts, Danilo Erricolo, Theja Tulabandhula, Amit Ranjan Trivedi

Abstract: We present a novel statistical approach to incorporating uncertainty awareness in model-free distributional reinforcement learning involving quantile regression-based deep Q networks. The proposed algorithm, $\textit{Calibrated Evidential Quantile Regression in Deep Q Networks (CEQR-DQN)}$, aims to address key challenges associated with separately estimating aleatoric and epistemic uncertainty in… ▽ More We present a novel statistical approach to incorporating uncertainty awareness in model-free distributional reinforcement learning involving quantile regression-based deep Q networks. The proposed algorithm, $\textit{Calibrated Evidential Quantile Regression in Deep Q Networks (CEQR-DQN)}$, aims to address key challenges associated with separately estimating aleatoric and epistemic uncertainty in stochastic environments. It combines deep evidential learning with quantile calibration based on principles of conformal inference to provide explicit, sample-free computations of $\textit{global}$ uncertainty as opposed to $\textit{local}$ estimates based on simple variance, overcoming limitations of traditional methods in computational and statistical efficiency and handling of out-of-distribution (OOD) observations. Tested on a suite of miniaturized Atari games (i.e., MinAtar), CEQR-DQN is shown to surpass similar existing frameworks in scores and learning speed. Its ability to rigorously evaluate uncertainty improves exploration strategies and can serve as a blueprint for other algorithms requiring uncertainty awareness. △ Less

Submitted 3 June, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

arXiv:2312.06826 [pdf, other]

User Friendly and Adaptable Discriminative AI: Using the Lessons from the Success of LLMs and Image Generation Models

Authors: Son The Nguyen, Theja Tulabandhula, Mary Beth Watson-Manheim

Abstract: While there is significant interest in using generative AI tools as general-purpose models for specific ML applications, discriminative models are much more widely deployed currently. One of the key shortcomings of these discriminative AI tools that have been already deployed is that they are not adaptable and user-friendly compared to generative AI tools (e.g., GPT4, Stable Diffusion, Bard, etc.)… ▽ More While there is significant interest in using generative AI tools as general-purpose models for specific ML applications, discriminative models are much more widely deployed currently. One of the key shortcomings of these discriminative AI tools that have been already deployed is that they are not adaptable and user-friendly compared to generative AI tools (e.g., GPT4, Stable Diffusion, Bard, etc.), where a non-expert user can iteratively refine model inputs and give real-time feedback that can be accounted for immediately, allowing users to build trust from the start. Inspired by this emerging collaborative workflow, we develop a new system architecture that enables users to work with discriminative models (such as for object detection, sentiment classification, etc.) in a fashion similar to generative AI tools, where they can easily provide immediate feedback as well as adapt the deployed models as desired. Our approach has implications on improving trust, user-friendliness, and adaptability of these versatile but traditional prediction models. △ Less

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2311.12241 [pdf, other]

InteraSSort: Interactive Assortment Planning Using Large Language Models

Authors: Saketh Reddy Karra, Theja Tulabandhula

Abstract: Assortment planning, integral to multiple commercial offerings, is a key problem studied in e-commerce and retail settings. Numerous variants of the problem along with their integration into business solutions have been thoroughly investigated in the existing literature. However, the nuanced complexities of in-store planning and a lack of optimization proficiency among store planners with strong d… ▽ More Assortment planning, integral to multiple commercial offerings, is a key problem studied in e-commerce and retail settings. Numerous variants of the problem along with their integration into business solutions have been thoroughly investigated in the existing literature. However, the nuanced complexities of in-store planning and a lack of optimization proficiency among store planners with strong domain expertise remain largely overlooked. These challenges frequently necessitate collaborative efforts with multiple stakeholders which often lead to prolonged decision-making processes and significant delays. To mitigate these challenges and capitalize on the advancements of Large Language Models (LLMs), we propose an interactive assortment planning framework, InteraSSort that augments LLMs with optimization tools to assist store planners in making decisions through interactive conversations. Specifically, we develop a solution featuring a user-friendly interface that enables users to express their optimization objectives as input text prompts to InteraSSort and receive tailored optimized solutions as output. Our framework extends beyond basic functionality by enabling the inclusion of additional constraints through interactive conversation, facilitating precise and highly customized decision-making. Extensive experiments demonstrate the effectiveness of our framework and potential extensions to a broad range of operations management challenges. △ Less

Submitted 9 January, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

arXiv:2309.11069 [pdf, other]

Dynamic Tiling: A Model-Agnostic, Adaptive, Scalable, and Inference-Data-Centric Approach for Efficient and Accurate Small Object Detection

Authors: Son The Nguyen, Theja Tulabandhula, Duy Nguyen

Abstract: We introduce Dynamic Tiling, a model-agnostic, adaptive, and scalable approach for small object detection, anchored in our inference-data-centric philosophy. Dynamic Tiling starts with non-overlap** tiles for initial detections and utilizes dynamic overlap** rates along with a tile minimizer. This dual approach effectively resolves fragmented objects, improves detection accuracy, and minimizes… ▽ More We introduce Dynamic Tiling, a model-agnostic, adaptive, and scalable approach for small object detection, anchored in our inference-data-centric philosophy. Dynamic Tiling starts with non-overlap** tiles for initial detections and utilizes dynamic overlap** rates along with a tile minimizer. This dual approach effectively resolves fragmented objects, improves detection accuracy, and minimizes computational overhead by reducing the number of forward passes through the object detection model. Adaptable to a variety of operational environments, our method negates the need for laborious recalibration. Additionally, our large-small filtering mechanism boosts the detection quality across a range of object sizes. Overall, Dynamic Tiling outperforms existing model-agnostic uniform crop** methods, setting new benchmarks for efficiency and accuracy. △ Less

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.11018 [pdf, other]

Conformalized Multimodal Uncertainty Regression and Reasoning

Authors: Domenico Parente, Nastaran Darabi, Alex C. Stutts, Theja Tulabandhula, Amit Ranjan Trivedi

Abstract: This paper introduces a lightweight uncertainty estimator capable of predicting multimodal (disjoint) uncertainty bounds by integrating conformal prediction with a deep-learning regressor. We specifically discuss its application for visual odometry (VO), where environmental features such as flying domain symmetries and sensor measurements under ambiguities and occlusion can result in multimodal un… ▽ More This paper introduces a lightweight uncertainty estimator capable of predicting multimodal (disjoint) uncertainty bounds by integrating conformal prediction with a deep-learning regressor. We specifically discuss its application for visual odometry (VO), where environmental features such as flying domain symmetries and sensor measurements under ambiguities and occlusion can result in multimodal uncertainties. Our simulation results show that uncertainty estimates in our framework adapt sample-wise against challenging operating conditions such as pronounced noise, limited training data, and limited parametric size of the prediction model. We also develop a reasoning framework that leverages these robust uncertainty estimates and incorporates optical flow-based reasoning to improve prediction prediction accuracy. Thus, by appropriately accounting for predictive uncertainties of data-driven learning and closing their estimation loop via rule-based reasoning, our methodology consistently surpasses conventional deep learning approaches on all these challenging scenarios--pronounced noise, limited training data, and limited model size-reducing the prediction error by 2-3x. △ Less

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.11006 [pdf, other]

STARNet: Sensor Trustworthiness and Anomaly Recognition via Approximated Likelihood Regret for Robust Edge Autonomy

Authors: Nastaran Darabi, Sina Tayebati, Sureshkumar S., Sathya Ravi, Theja Tulabandhula, Amit R. Trivedi

Abstract: Complex sensors such as LiDAR, RADAR, and event cameras have proliferated in autonomous robotics to enhance perception and understanding of the environment. Meanwhile, these sensors are also vulnerable to diverse failure mechanisms that can intricately interact with their operation environment. In parallel, the limited availability of training data on complex sensors also affects the reliability o… ▽ More Complex sensors such as LiDAR, RADAR, and event cameras have proliferated in autonomous robotics to enhance perception and understanding of the environment. Meanwhile, these sensors are also vulnerable to diverse failure mechanisms that can intricately interact with their operation environment. In parallel, the limited availability of training data on complex sensors also affects the reliability of their deep learning-based prediction flow, where their prediction models can fail to generalize to environments not adequately captured in the training set. To address these reliability concerns, this paper introduces STARNet, a Sensor Trustworthiness and Anomaly Recognition Network designed to detect untrustworthy sensor streams that may arise from sensor malfunctions and/or challenging environments. We specifically benchmark STARNet on LiDAR and camera data. STARNet employs the concept of approximated likelihood regret, a gradient-free framework tailored for low-complexity hardware, especially those with only fixed-point precision capabilities. Through extensive simulations, we demonstrate the efficacy of STARNet in detecting untrustworthy sensor streams in unimodal and multimodal settings. In particular, the network shows superior performance in addressing internal sensor failures, such as cross-sensor interference and crosstalk. In diverse test scenarios involving adverse weather and sensor malfunctions, we show that STARNet enhances prediction accuracy by approximately 10% by filtering out untrustworthy sensor streams. STARNet is publicly available at \url{https://github.com/sinatayebati/STARNet}. △ Less

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.09593 [pdf, other]

Mutual Information-calibrated Conformal Feature Fusion for Uncertainty-Aware Multimodal 3D Object Detection at the Edge

Authors: Alex C. Stutts, Danilo Erricolo, Sathya Ravi, Theja Tulabandhula, Amit Ranjan Trivedi

Abstract: In the expanding landscape of AI-enabled robotics, robust quantification of predictive uncertainties is of great importance. Three-dimensional (3D) object detection, a critical robotics operation, has seen significant advancements; however, the majority of current works focus only on accuracy and ignore uncertainty quantification. Addressing this gap, our novel study integrates the principles of c… ▽ More In the expanding landscape of AI-enabled robotics, robust quantification of predictive uncertainties is of great importance. Three-dimensional (3D) object detection, a critical robotics operation, has seen significant advancements; however, the majority of current works focus only on accuracy and ignore uncertainty quantification. Addressing this gap, our novel study integrates the principles of conformal inference (CI) with information theoretic measures to perform lightweight, Monte Carlo-free uncertainty estimation within a multimodal framework. Through a multivariate Gaussian product of the latent variables in a Variational Autoencoder (VAE), features from RGB camera and LiDAR sensor data are fused to improve the prediction accuracy. Normalized mutual information (NMI) is leveraged as a modulator for calibrating uncertainty bounds derived from CI based on a weighted loss function. Our simulation results show an inverse correlation between inherent predictive uncertainty and NMI throughout the model's training. The framework demonstrates comparable or better performance in KITTI 3D object detection benchmarks to similar methods that are not uncertainty-aware, making it suitable for real-time edge robotics. △ Less

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2308.14182 [pdf, other]

Generative AI for Business Strategy: Using Foundation Models to Create Business Strategy Tools

Authors: Son The Nguyen, Theja Tulabandhula

Abstract: Generative models (foundation models) such as LLMs (large language models) are having a large impact on multiple fields. In this work, we propose the use of such models for business decision making. In particular, we combine unstructured textual data sources (e.g., news data) with multiple foundation models (namely, GPT4, transformer-based Named Entity Recognition (NER) models and Entailment-based… ▽ More Generative models (foundation models) such as LLMs (large language models) are having a large impact on multiple fields. In this work, we propose the use of such models for business decision making. In particular, we combine unstructured textual data sources (e.g., news data) with multiple foundation models (namely, GPT4, transformer-based Named Entity Recognition (NER) models and Entailment-based Zero-shot Classifiers (ZSC)) to derive IT (information technology) artifacts in the form of a (sequence of) signed business networks. We posit that such artifacts can inform business stakeholders about the state of the market and their own positioning as well as provide quantitative insights into improving their future outlook. △ Less

Submitted 27 August, 2023; originally announced August 2023.

arXiv:2303.02207 [pdf, other]

Lightweight, Uncertainty-Aware Conformalized Visual Odometry

Authors: Alex C. Stutts, Danilo Erricolo, Theja Tulabandhula, Amit Ranjan Trivedi

Abstract: Data-driven visual odometry (VO) is a critical subroutine for autonomous edge robotics, and recent progress in the field has produced highly accurate point predictions in complex environments. However, emerging autonomous edge robotics devices like insect-scale drones and surgical robots lack a computationally efficient framework to estimate VO's predictive uncertainties. Meanwhile, as edge roboti… ▽ More Data-driven visual odometry (VO) is a critical subroutine for autonomous edge robotics, and recent progress in the field has produced highly accurate point predictions in complex environments. However, emerging autonomous edge robotics devices like insect-scale drones and surgical robots lack a computationally efficient framework to estimate VO's predictive uncertainties. Meanwhile, as edge robotics continue to proliferate into mission-critical application spaces, awareness of model's the predictive uncertainties has become crucial for risk-aware decision-making. This paper addresses this challenge by presenting a novel, lightweight, and statistically robust framework that leverages conformal inference (CI) to extract VO's uncertainty bands. Our approach represents the uncertainties using flexible, adaptable, and adjustable prediction intervals that, on average, guarantee the inclusion of the ground truth across all degrees of freedom (DOF) of pose estimation. We discuss the architectures of generative deep neural networks for estimating multivariate uncertainty bands along with point (mean) prediction. We also present techniques to improve the uncertainty estimation accuracy, such as leveraging Monte Carlo dropout (MC-dropout) for data augmentation. Finally, we propose a novel training loss function that combines interval scoring and calibration loss with traditional training metrics--mean-squared error and KL-divergence--to improve uncertainty-aware learning. Our simulation results demonstrate that the presented framework consistently captures true uncertainty in pose estimations across different datasets, estimation models, and applied noise types, indicating its wide applicability. △ Less

Submitted 3 March, 2023; originally announced March 2023.

arXiv:2210.15559 [pdf, other]

Robust Monocular Localization of Drones by Adapting Domain Maps to Depth Prediction Inaccuracies

Authors: Priyesh Shukla, Sureshkumar S., Alex C. Stutts, Sathya Ravi, Theja Tulabandhula, Amit R. Trivedi

Abstract: We present a novel monocular localization framework by jointly training deep learning-based depth prediction and Bayesian filtering-based pose reasoning. The proposed cross-modal framework significantly outperforms deep learning-only predictions with respect to model scalability and tolerance to environmental variations. Specifically, we show little-to-no degradation of pose accuracy even with ext… ▽ More We present a novel monocular localization framework by jointly training deep learning-based depth prediction and Bayesian filtering-based pose reasoning. The proposed cross-modal framework significantly outperforms deep learning-only predictions with respect to model scalability and tolerance to environmental variations. Specifically, we show little-to-no degradation of pose accuracy even with extremely poor depth estimates from a lightweight depth predictor. Our framework also maintains high pose accuracy in extreme lighting variations compared to standard deep learning, even without explicit domain adaptation. By openly representing the map and intermediate feature maps (such as depth estimates), our framework also allows for faster updates and reusing intermediate predictions for other tasks, such as obstacle avoidance, resulting in much higher resource efficiency. △ Less

Submitted 27 October, 2022; originally announced October 2022.

arXiv:2207.02328 [pdf, other]

Unified Embeddings of Structural and Functional Connectome via a Function-Constrained Structural Graph Variational Auto-Encoder

Authors: Carlo Amodeo, Igor Fortel, Olusola Ajilore, Liang Zhan, Alex Leow, Theja Tulabandhula

Abstract: Graph theoretical analyses have become standard tools in modeling functional and anatomical connectivity in the brain. With the advent of connectomics, the primary graphs or networks of interest are structural connectome (derived from DTI tractography) and functional connectome (derived from resting-state fMRI). However, most published connectome studies have focused on either structural or functi… ▽ More Graph theoretical analyses have become standard tools in modeling functional and anatomical connectivity in the brain. With the advent of connectomics, the primary graphs or networks of interest are structural connectome (derived from DTI tractography) and functional connectome (derived from resting-state fMRI). However, most published connectome studies have focused on either structural or functional connectome, yet complementary information between them, when available in the same dataset, can be jointly leveraged to improve our understanding of the brain. To this end, we propose a function-constrained structural graph variational autoencoder (FCS-GVAE) capable of incorporating information from both functional and structural connectome in an unsupervised fashion. This leads to a joint low-dimensional embedding that establishes a unified spatial coordinate system for comparing across different subjects. We evaluate our approach using the publicly available OASIS-3 Alzheimer's disease (AD) dataset and show that a variational formulation is necessary to optimally encode functional brain dynamics. Further, the proposed joint embedding approach can more accurately distinguish different patient sub-populations than approaches that do not use complementary connectome information. △ Less

Submitted 5 July, 2022; originally announced July 2022.

arXiv:2204.12000 [pdf, other]

Estimating the Personality of White-Box Language Models

Authors: Saketh Reddy Karra, Son The Nguyen, Theja Tulabandhula

Abstract: Technology for open-ended language generation, a key application of artificial intelligence, has advanced to a great extent in recent years. Large-scale language models, which are trained on large corpora of text, are being used in a wide range of applications everywhere, from virtual assistants to conversational bots. While these language models output fluent text, existing research shows that th… ▽ More Technology for open-ended language generation, a key application of artificial intelligence, has advanced to a great extent in recent years. Large-scale language models, which are trained on large corpora of text, are being used in a wide range of applications everywhere, from virtual assistants to conversational bots. While these language models output fluent text, existing research shows that these models can and do capture human biases. Many of these biases, especially those that could potentially cause harm, are being well-investigated. On the other hand, studies that infer and change human personality traits inherited by these models have been scarce or non-existent. Our work seeks to address this gap by exploring the personality traits of several large-scale language models designed for open-ended text generation and the datasets used for training them. We build on the popular Big Five factors and develop robust methods that quantify the personality traits of these models and their underlying datasets. In particular, we trigger the models with a questionnaire designed for personality assessment and subsequently classify the text responses into quantifiable traits using a Zero-shot classifier. Our estimation scheme sheds light on an important anthropomorphic element found in such AI models and can help stakeholders decide how they should be applied as well as how society could perceive them. Additionally, we examined approaches to alter these personalities, adding to our understanding of how AI models can be adapted to specific contexts. △ Less

Submitted 10 May, 2023; v1 submitted 25 April, 2022; originally announced April 2022.

arXiv:2104.05217 [pdf, other]

ENOS: Energy-Aware Network Operator Search for Hybrid Digital and Compute-in-Memory DNN Accelerators

Authors: Shamma Nasrin, Ahish Shylendra, Yuti Kadakia, Nick Iliev, Wilfred Gomes, Theja Tulabandhula, Amit Ranjan Trivedi

Abstract: This work proposes a novel Energy-Aware Network Operator Search (ENOS) approach to address the energy-accuracy trade-offs of a deep neural network (DNN) accelerator. In recent years, novel inference operators have been proposed to improve the computational efficiency of a DNN. Augmenting the operators, their corresponding novel computing modes have also been explored. However, simplification of DN… ▽ More This work proposes a novel Energy-Aware Network Operator Search (ENOS) approach to address the energy-accuracy trade-offs of a deep neural network (DNN) accelerator. In recent years, novel inference operators have been proposed to improve the computational efficiency of a DNN. Augmenting the operators, their corresponding novel computing modes have also been explored. However, simplification of DNN operators invariably comes at the cost of lower accuracy, especially on complex processing tasks. Our proposed ENOS framework allows an optimal layer-wise integration of inference operators and computing modes to achieve the desired balance of energy and accuracy. The search in ENOS is formulated as a continuous optimization problem, solvable using typical gradient descent methods, thereby scalable to larger DNNs with minimal increase in training cost. We characterize ENOS under two settings. In the first setting, for digital accelerators, we discuss ENOS on multiply-accumulate (MAC) cores that can be reconfigured to different operators. ENOS training methods with single and bi-level optimization objectives are discussed and compared. We also discuss a sequential operator assignment strategy in ENOS that only learns the assignment for one layer in one training step, enabling greater flexibility in converging towards the optimal operator allocations. Furthermore, following Bayesian principles, a sampling-based variational mode of ENOS is also presented. ENOS is characterized on popular DNNs ShuffleNet and SqueezeNet on CIFAR10 and CIFAR100. △ Less

Submitted 12 April, 2021; originally announced April 2021.

arXiv:2104.00801 [pdf, other]

Choice-Aware User Engagement Modeling andOptimization on Social Media

Authors: Saketh Reddy Karra, Theja Tulabandhula

Abstract: We address the problem of maximizing user engagement with content (in the form of like, reply, retweet, and retweet with comments)on the Twitter platform. We formulate the engagement forecasting task as a multi-label classification problem that captures choice behavior on an unsupervised clustering of tweet-topics. We propose a neural network architecture that incorporates user engagement history… ▽ More We address the problem of maximizing user engagement with content (in the form of like, reply, retweet, and retweet with comments)on the Twitter platform. We formulate the engagement forecasting task as a multi-label classification problem that captures choice behavior on an unsupervised clustering of tweet-topics. We propose a neural network architecture that incorporates user engagement history and predicts choice conditional on this context. We study the impact of recommend-ing tweets on engagement outcomes by solving an appropriately defined sweet optimization problem based on the proposed model using a large dataset obtained from Twitter. △ Less

Submitted 1 April, 2021; originally announced April 2021.

Comments: 11 pages, 1 figure

arXiv:2102.08247

Probabilistic Localization of Insect-Scale Drones on Floating-Gate Inverter Arrays

Authors: Priyesh Shukla, Ankith Muralidhar, Nick Iliev, Theja Tulabandhula, Sawyer B. Fuller, Amit Ranjan Trivedi

Abstract: We propose a novel compute-in-memory (CIM)-based ultra-low-power framework for probabilistic localization of insect-scale drones. The conventional probabilistic localization approaches rely on the three-dimensional (3D) Gaussian Mixture Model (GMM)-based representation of a 3D map. A GMM model with hundreds of mixture functions is typically needed to adequately learn and represent the intricacies… ▽ More We propose a novel compute-in-memory (CIM)-based ultra-low-power framework for probabilistic localization of insect-scale drones. The conventional probabilistic localization approaches rely on the three-dimensional (3D) Gaussian Mixture Model (GMM)-based representation of a 3D map. A GMM model with hundreds of mixture functions is typically needed to adequately learn and represent the intricacies of the map. Meanwhile, localization using complex GMM map models is computationally intensive. Since insect-scale drones operate under extremely limited area/power budget, continuous localization using GMM models entails much higher operating energy -- thereby, limiting flying duration and/or size of the drone due to a larger battery. Addressing the computational challenges of localization in an insect-scale drone using a CIM approach, we propose a novel framework of 3D map representation using a harmonic mean of "Gaussian-like" mixture (HMGM) model. The likelihood function useful for drone localization can be efficiently implemented by connecting many multi-input inverters in parallel, each programmed with the parameters of the 3D map model represented as HMGM. When the depth measurements are projected to the input of the implementation, the summed current of the inverters emulates the likelihood of the measurement. We have characterized our approach on an RGB-D indoor localization dataset. The average localization error in our approach is $\sim$0.1125 m which is only slightly degraded than software-based evaluation ($\sim$0.08 m). Meanwhile, our localization framework is ultra-low-power, consuming as little as $\sim$17 $μ$W power while processing a depth frame in 1.33 ms over hundred pose hypotheses in the particle-filtering (PF) algorithm used to localize the drone. △ Less

Submitted 24 May, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: Will submit the revised article.

ACM Class: B.7; I.2.9

arXiv:2012.11715 [pdf, other]

Off-Policy Optimization of Portfolio Allocation Policies under Constraints

Authors: Nymisha Bandi, Theja Tulabandhula

Abstract: The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate this problem of finding an allocation policy within a sequential decision making framework and study the effects of: (a) using data collected under previously employed policies, which may be sub-optimal and constraint-v… ▽ More The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate this problem of finding an allocation policy within a sequential decision making framework and study the effects of: (a) using data collected under previously employed policies, which may be sub-optimal and constraint-violating, and (b) imposing desired constraints while computing near-optimal policies with this data. Our framework relies on solving a minimax objective, where one player evaluates policies via off-policy estimators, and the opponent uses an online learning strategy to control constraint violations. We extensively investigate various choices for off-policy estimation and their corresponding optimization sub-routines, and quantify their impact on computing constraint-aware allocation policies. Our study shows promising results for constructing such policies when back-tested on historical equities data, under various regimes of operation, dimensionality and constraints. △ Less

Submitted 21 December, 2020; originally announced December 2020.

arXiv:2012.03323 [pdf, other]

KATRec: Knowledge Aware aTtentive Sequential Recommendations

Authors: Mehrnaz Amjadi, Seyed Danial Mohseni Taheri, Theja Tulabandhula

Abstract: Sequential recommendation systems model dynamic preferences of users based on their historical interactions with platforms. Despite recent progress, modeling short-term and long-term behavior of users in such systems is nontrivial and challenging. To address this, we present a solution enhanced by a knowledge graph called KATRec (Knowledge Aware aTtentive sequential Recommendations). KATRec learns… ▽ More Sequential recommendation systems model dynamic preferences of users based on their historical interactions with platforms. Despite recent progress, modeling short-term and long-term behavior of users in such systems is nontrivial and challenging. To address this, we present a solution enhanced by a knowledge graph called KATRec (Knowledge Aware aTtentive sequential Recommendations). KATRec learns the short and long-term interests of users by modeling their sequence of interacted items and leveraging pre-existing side information through a knowledge graph attention network. Our novel knowledge graph-enhanced sequential recommender contains item multi-relations at the entity-level and users' dynamic sequences at the item-level. KATRec improves item representation learning by considering higher-order connections and incorporating them in user preference representation while recommending the next item. Experiments on three public datasets show that KATRec outperforms state-of-the-art recommendation models and demonstrates the importance of modeling both temporal and side information to achieve high-quality recommendations. △ Less

Submitted 5 July, 2021; v1 submitted 6 December, 2020; originally announced December 2020.

arXiv:2011.14430 [pdf]

Deep Reinforcement Learning for Crowdsourced Urban Delivery: System States Characterization, Heuristics-guided Action Choice, and Rule-Interposing Integration

Authors: Tanvir Ahamed, Bo Zou, Nahid Parvez Farazi, Theja Tulabandhula

Abstract: This paper investigates the problem of assigning ship** requests to ad hoc couriers in the context of crowdsourced urban delivery. The ship** requests are spatially distributed each with a limited time window between the earliest time for pickup and latest time for delivery. The ad hoc couriers, termed crowdsourcees, also have limited time availability and carrying capacity. We propose a new d… ▽ More This paper investigates the problem of assigning ship** requests to ad hoc couriers in the context of crowdsourced urban delivery. The ship** requests are spatially distributed each with a limited time window between the earliest time for pickup and latest time for delivery. The ad hoc couriers, termed crowdsourcees, also have limited time availability and carrying capacity. We propose a new deep reinforcement learning (DRL)-based approach to tackling this assignment problem. A deep Q network (DQN) algorithm is trained which entails two salient features of experience replay and target network that enhance the efficiency, convergence, and stability of DRL training. More importantly, this paper makes three methodological contributions: 1) presenting a comprehensive and novel characterization of crowdship** system states that encompasses spatial-temporal and capacity information of crowdsourcees and requests; 2) embedding heuristics that leverage the information offered by the state representation and are based on intuitive reasoning to guide specific actions to take, to preserve tractability and enhance efficiency of training; and 3) integrating rule-interposing to prevent repeated visiting of the same routes and node sequences during routing improvement, thereby further enhancing the training efficiency by accelerating learning. The effectiveness of the proposed approach is demonstrated through extensive numerical analysis. The results show the benefits brought by the heuristics-guided action choice and rule-interposing in DRL training, and the superiority of the proposed approach over existing heuristics in both solution quality, time, and scalability. Besides the potential to improve the efficiency of crowdship** operation planning, the proposed approach also provides a new avenue and generic framework for other problems in the vehicle routing context. △ Less

Submitted 29 November, 2020; originally announced November 2020.

Comments: 50 pages, 17 figures

arXiv:2011.14033 [pdf, other]

A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

Authors: Priyank Agrawal, Theja Tulabandhula, Vashist Avadhanula

Abstract: In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the response in every round. Consumers purchase products to maximize their utility. We assume that a set of attributes describe the products, and the mean utility of… ▽ More In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the response in every round. Consumers purchase products to maximize their utility. We assume that a set of attributes describe the products, and the mean utility of a product is linear in the values of these attributes. We model consumer choice behavior using the widely used Multinomial Logit (MNL) model and consider the decision maker problem of dynamically learning the model parameters while optimizing cumulative revenue over the selling horizon $T$. Though this problem has attracted considerable attention in recent times, many existing methods often involve solving an intractable non-convex optimization problem. Their theoretical performance guarantees depend on a problem-dependent parameter which could be prohibitively large. In particular, existing algorithms for this problem have regret bounded by $O(\sqrt{κd T})$, where $κ$ is a problem-dependent constant that can have an exponential dependency on the number of attributes. In this paper, we propose an optimistic algorithm and show that the regret is bounded by $O(\sqrt{dT} + κ)$, significantly improving the performance over existing methods. Further, we propose a convex relaxation of the optimization step, which allows for tractable decision-making while retaining the favourable regret guarantee. △ Less

Submitted 14 April, 2024; v1 submitted 27 November, 2020; originally announced November 2020.

Comments: Bug fixed

arXiv:2006.10356 [pdf, other]

Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect

Authors: Priyank Agrawal, Theja Tulabandhula

Abstract: We study the effect of persistence of engagement on learning in a stochastic multi-armed bandit setting. In advertising and recommendation systems, repetition effect includes a wear-in period, where the user's propensity to reward the platform via a click or purchase depends on how frequently they see the recommendation in the recent past. It also includes a counteracting wear-out period, where th… ▽ More We study the effect of persistence of engagement on learning in a stochastic multi-armed bandit setting. In advertising and recommendation systems, repetition effect includes a wear-in period, where the user's propensity to reward the platform via a click or purchase depends on how frequently they see the recommendation in the recent past. It also includes a counteracting wear-out period, where the user's propensity to respond positively is dampened if the recommendation was shown too many times recently. Priming effect can be naturally modelled as a temporal constraint on the strategy space, since the reward for the current action depends on historical actions taken by the platform. We provide novel algorithms that achieves sublinear regret in time and the relevant wear-in/wear-out parameters. The effect of priming on the regret upper bound is also additive, and we get back a guarantee that matches popular algorithms such as the UCB1 and Thompson sampling when there is no priming effect. Our work complements recent work on modeling time varying rewards, delays and corruptions in bandits, and extends the usage of rich behavior models in sequential decision making settings. △ Less

Submitted 18 June, 2020; originally announced June 2020.

Comments: Appears in the 36th Conference on Uncertainty in Artificial Intelligence (UAI 2020)

arXiv:2006.08055 [pdf, other]

Multi-Purchase Behavior: Modeling, Estimation and Optimization

Authors: Theja Tulabandhula, Deeksha Sinha, Saketh Reddy Karra, Prasoon Patidar

Abstract: We study the problem of modeling purchase of multiple products and utilizing it to display optimized recommendations for online retailers and e-commerce platforms. We present a parsimonious multi-purchase family of choice models called the Bundle-MVL-K family, and develop a binary search based iterative strategy that efficiently computes optimized recommendations for this model. We establish the… ▽ More We study the problem of modeling purchase of multiple products and utilizing it to display optimized recommendations for online retailers and e-commerce platforms. We present a parsimonious multi-purchase family of choice models called the Bundle-MVL-K family, and develop a binary search based iterative strategy that efficiently computes optimized recommendations for this model. We establish the hardness of computing optimal recommendation sets, and derive several structural properties of the optimal solution that aid in speeding up computation. This is one of the first attempts at operationalizing multi-purchase class of choice models. We show one of the first quantitative links between modeling multiple purchase behavior and revenue gains. The efficacy of our modeling and optimization techniques compared to competing solutions is shown using several real world datasets on multiple metrics such as model fitness, expected revenue gains and run-time reductions. For example, the expected revenue benefit of taking multiple purchases into account is observed to be $\sim5\%$ in relative terms for the Ta Feng and UCI shop** datasets, when compared to the MNL model for instances with $\sim 1500$ products. Additionally, across $6$ real world datasets, the test log-likelihood fits of our models are on average $17\%$ better in relative terms. Our work contributes to the study multi-purchase decisions, analyzing consumer demand and the retailers optimization problem. The simplicity of our models and the iterative nature of our optimization technique allows practitioners meet stringent computational constraints while increasing their revenues in practical recommendation applications at scale, especially in e-commerce platforms and other marketplaces. △ Less

Submitted 5 August, 2023; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: 48 pages. Published in Manufacturing & Service Operations Management 2023

arXiv:2003.04736 [pdf, other]

Optimizing Revenue while showing Relevant Assortments at Scale

Authors: Theja Tulabandhula, Deeksha Sinha, Saketh Karra

Abstract: Scalable real-time assortment optimization has become essential in e-commerce operations due to the need for personalization and the availability of a large variety of items. While this can be done when there are simplistic assortment choices to be made, the optimization process becomes difficult when imposing constraints on the collection of relevant assortments based on insights by store-manager… ▽ More Scalable real-time assortment optimization has become essential in e-commerce operations due to the need for personalization and the availability of a large variety of items. While this can be done when there are simplistic assortment choices to be made, the optimization process becomes difficult when imposing constraints on the collection of relevant assortments based on insights by store-managers and historically well-performing assortments. We design fast and flexible algorithms based on variations of binary search that find the (approximately) optimal assortment in this difficult regime. In particular, we revisit the problem of large-scale assortment optimization under the multinomial logit choice model without any assumptions on the structure of the feasible assortments. We speed up the comparison steps using advances in similarity search in the field of information retrieval/machine learning. For an arbitrary collection of assortments, our algorithms can find a solution in time that is sub-linear in the number of assortments, and for the simpler case of cardinality constraints - linear in the number of items (existing methods are quadratic or worse). Empirical validations using a real world dataset (in addition to experiments using semi-synthetic data based on the Billion Prices dataset and several retail transaction datasets) show that our algorithms are competitive even when the number of items is $\sim 10^5$ ($10\times$ larger instances than previously studied). △ Less

Submitted 1 March, 2021; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: 53 pages, 10 figures

arXiv:2003.02629 [pdf, other]

$MC^2RAM$: Markov Chain Monte Carlo Sampling in SRAM for Fast Bayesian Inference

Authors: Priyesh Shukla, Ahish Shylendra, Theja Tulabandhula, Amit Ranjan Trivedi

Abstract: This work discusses the implementation of Markov Chain Monte Carlo (MCMC) sampling from an arbitrary Gaussian mixture model (GMM) within SRAM. We show a novel architecture of SRAM by embedding it with random number generators (RNGs), digital-to-analog converters (DACs), and analog-to-digital converters (ADCs) so that SRAM arrays can be used for high performance Metropolis-Hastings (MH) algorithm-b… ▽ More This work discusses the implementation of Markov Chain Monte Carlo (MCMC) sampling from an arbitrary Gaussian mixture model (GMM) within SRAM. We show a novel architecture of SRAM by embedding it with random number generators (RNGs), digital-to-analog converters (DACs), and analog-to-digital converters (ADCs) so that SRAM arrays can be used for high performance Metropolis-Hastings (MH) algorithm-based MCMC sampling. Most of the expensive computations are performed within the SRAM and can be parallelized for high speed sampling. Our iterative compute flow minimizes data movement during sampling. We characterize power-performance trade-off of our design by simulating on 45 nm CMOS technology. For a two-dimensional, two mixture GMM, the implementation consumes ~ 91 micro-Watts power per sampling iteration and produces 500 samples in 2000 clock cycles on an average at 1 GHz clock frequency. Our study highlights interesting insights on how low-level hardware non-idealities can affect high-level sampling characteristics, and recommends ways to optimally operate SRAM within area/power constraints for high performance sampling. △ Less

Submitted 28 February, 2020; originally announced March 2020.

Comments: This paper has been accepted at the IEEE International Symposium on Circuits and Systems (ISCAS) to be held in May 2020 at Seville, Spain

arXiv:2001.07853 [pdf, other]

Incentivising Exploration and Recommendations for Contextual Bandits with Payments

Authors: Priyank Agrawal, Theja Tulabandhula

Abstract: We propose a contextual bandit based model to capture the learning and social welfare goals of a web platform in the presence of myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the inherent attributes of items and achieve a sublinear regret while maximizing cumulative social welfare. We also calculate theore… ▽ More We propose a contextual bandit based model to capture the learning and social welfare goals of a web platform in the presence of myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the inherent attributes of items and achieve a sublinear regret while maximizing cumulative social welfare. We also calculate theoretical bounds on the cumulative costs of incentivization to the platform. Unlike previous works in this domain, we consider contexts to be completely adversarial, and the behavior of the adversary is unknown to the platform. Our approach can improve various engagement metrics of users on e-commerce stores, recommendation engines and matching platforms. △ Less

Submitted 21 January, 2020; originally announced January 2020.

Comments: 11 pages, 4 figures

arXiv:1911.08518 [pdf, other]

Supported-BinaryNet: Bitcell Array-based Weight Supports for Dynamic Accuracy-Latency Trade-offs in SRAM-based Binarized Neural Network

Authors: Shamma Nasrin, Srikanth Ramakrishna, Theja Tulabandhula, Amit Ranjan Trivedi

Abstract: In this work, we introduce bitcell array-based support parameters to improve the prediction accuracy of SRAM-based binarized neural network (SRAM-BNN). Our approach enhances the training weight space of SRAM-BNN while requiring minimal overheads to a typical design. More flexibility of the weight space leads to higher prediction accuracy in our design. We adapt row digital-to-analog (DAC) converte… ▽ More In this work, we introduce bitcell array-based support parameters to improve the prediction accuracy of SRAM-based binarized neural network (SRAM-BNN). Our approach enhances the training weight space of SRAM-BNN while requiring minimal overheads to a typical design. More flexibility of the weight space leads to higher prediction accuracy in our design. We adapt row digital-to-analog (DAC) converter, and computing flow in SRAM-BNN for bitcell array-based weight supports. Using the discussed interventions, our scheme also allows a dynamic trade-off of accuracy against latency to address dynamic latency constraints in typical real-time applications. We specifically discuss results on two training cases: (i) learning of support parameters on a pre-trained BNN and (ii) simultaneous learning of supports and weight binarization. In the former case, our approach reduces classification error in MNIST by 35.71% (error rate decreases from 1.4% to 0.91%). In the latter case, the error is reduced by 27.65% (error rate decreases from 1.4% to 1.13%). To reduce the power overheads, we propose a dynamic drop out a part of the support parameters. Our architecture can drop out 52% of the bitcell array-based support parameters without losing accuracy. We also characterize our design under varying degrees of process variability in the transistors. △ Less

Submitted 26 November, 2019; v1 submitted 19 November, 2019; originally announced November 2019.

arXiv:1901.07734 [pdf, ps, other]

Thompson Sampling for a Fatigue-aware Online Recommendation System

Authors: Yunjuan Wang, Theja Tulabandhula

Abstract: In this paper we consider an online recommendation setting, where a platform recommends a sequence of items to its users at every time period. The users respond by selecting one of the items recommended or abandon the platform due to fatigue from seeing less useful items. Assuming a parametric stochastic model of user behavior, which captures positional effects of these items as well as the abando… ▽ More In this paper we consider an online recommendation setting, where a platform recommends a sequence of items to its users at every time period. The users respond by selecting one of the items recommended or abandon the platform due to fatigue from seeing less useful items. Assuming a parametric stochastic model of user behavior, which captures positional effects of these items as well as the abandoning behavior of users, the platform's goal is to recommend sequences of items that are competitive to the single best sequence of items in hindsight, without knowing the true user model a priori. Naively applying a stochastic bandit algorithm in this setting leads to an exponential dependence on the number of items. We propose a new Thompson sampling based algorithm with expected regret that is polynomial in the number of items in this combinatorial setting, and performs extremely well in practice. △ Less

Submitted 14 April, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

arXiv:1811.09026 [pdf, other]

Bandits with Temporal Stochastic Constraints

Authors: Priyank Agrawal, Theja Tulabandhula

Abstract: We study the effect of impairment on stochastic multi-armed bandits and develop new ways to mitigate it. Impairment effect is the phenomena where an agent only accrues reward for an action if they have played it at least a few times in the recent past. It is practically motivated by repetition and recency effects in domains such as advertising (here consumer behavior may require repeat actions by… ▽ More We study the effect of impairment on stochastic multi-armed bandits and develop new ways to mitigate it. Impairment effect is the phenomena where an agent only accrues reward for an action if they have played it at least a few times in the recent past. It is practically motivated by repetition and recency effects in domains such as advertising (here consumer behavior may require repeat actions by advertisers) and vocational training (here actions are complex skills that can only be mastered with repetition to get a payoff). Impairment can be naturally modelled as a temporal constraint on the strategy space, and we provide two novel algorithms that achieve sublinear regret, each working with different assumptions on the impairment effect. We introduce a new notion called bucketing in our algorithm design, and show how it can effectively address impairment as well as a broader class of temporal constraints. Our regret bounds explicitly capture the cost of impairment and show that it scales (sub-)linearly with the degree of impairment. Our work complements recent work on modeling delays and corruptions, and we provide experimental evidence supporting our claims. △ Less

Submitted 20 June, 2020; v1 submitted 22 November, 2018; originally announced November 2018.

Comments: An extended abstract appeared in the 4th Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2019)

arXiv:1804.08796 [pdf, other]

Block-Structure Based Time-Series Models For Graph Sequences

Authors: Mehrnaz Amjadi, Theja Tulabandhula

Abstract: Although the computational and statistical trade-off for modeling single graphs, for instance, using block models is relatively well understood, extending such results to sequences of graphs has proven to be difficult. In this work, we take a step in this direction by proposing two models for graph sequences that capture: (a) link persistence between nodes across time, and (b) community persistenc… ▽ More Although the computational and statistical trade-off for modeling single graphs, for instance, using block models is relatively well understood, extending such results to sequences of graphs has proven to be difficult. In this work, we take a step in this direction by proposing two models for graph sequences that capture: (a) link persistence between nodes across time, and (b) community persistence of each node across time. In the first model, we assume that the latent community of each node does not change over time, and in the second model we relax this assumption suitably. For both of these proposed models, we provide statistically and computationally efficient inference algorithms, whose unique feature is that they leverage community detection methods that work on single graphs. We also provide experimental results validating the suitability of our models and methods on synthetic and real instances. △ Less

Submitted 18 September, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

Comments: 40 pages, 10 figures

arXiv:1803.01968 [pdf, ps, other]

An Online Algorithm for Learning Buyer Behavior under Realistic Pricing Restrictions

Authors: Debjyoti Saharoy, Theja Tulabandhula

Abstract: We propose a new efficient online algorithm to learn the parameters governing the purchasing behavior of a utility maximizing buyer, who responds to prices, in a repeated interaction setting. The key feature of our algorithm is that it can learn even non-linear buyer utility while working with arbitrary price constraints that the seller may impose. This overcomes a major shortcoming of previous ap… ▽ More We propose a new efficient online algorithm to learn the parameters governing the purchasing behavior of a utility maximizing buyer, who responds to prices, in a repeated interaction setting. The key feature of our algorithm is that it can learn even non-linear buyer utility while working with arbitrary price constraints that the seller may impose. This overcomes a major shortcoming of previous approaches, which use unrealistic prices to learn these parameters making them unsuitable in practice. △ Less

Submitted 5 March, 2018; originally announced March 2018.

arXiv:1710.03275 [pdf, ps, other]

Privacy-preserving Targeted Advertising

Authors: Theja Tulabandhula, Shailesh Vaya, Aritra Dhar

Abstract: Recommendation systems form the center piece of a rapidly growing trillion dollar online advertisement industry. Even with numerous optimizations and approximations, collaborative filtering (CF) based approaches require real-time computations involving very large vectors. Curating and storing such related profile information vectors on web portals seriously breaches the user's privacy. Modifying s… ▽ More Recommendation systems form the center piece of a rapidly growing trillion dollar online advertisement industry. Even with numerous optimizations and approximations, collaborative filtering (CF) based approaches require real-time computations involving very large vectors. Curating and storing such related profile information vectors on web portals seriously breaches the user's privacy. Modifying such systems to achieve private recommendations further requires communication of long encrypted vectors, making the whole process inefficient. We present a more efficient recommendation system alternative, in which user profiles are maintained entirely on their device, and appropriate recommendations are fetched from web portals in an efficient privacy preserving manner. We base this approach on association rules. △ Less

Submitted 17 June, 2018; v1 submitted 9 October, 2017; originally announced October 2017.

Comments: A preliminary version was presented at the 11th INFORMS Workshop on Data Mining and Decision Analytics (2016)

arXiv:1708.05510 [pdf, other]

Optimizing Revenue over Data-driven Assortments

Authors: Deeksha Sinha, Theja Tulabandhula

Abstract: We revisit the problem of large-scale assortment optimization under the multinomial logit choice model without any assumptions on the structure of the feasible assortments. Scalable real-time assortment optimization has become essential in e-commerce operations due to the need for personalization and the availability of a large variety of items. While this can be done when there are simplistic ass… ▽ More We revisit the problem of large-scale assortment optimization under the multinomial logit choice model without any assumptions on the structure of the feasible assortments. Scalable real-time assortment optimization has become essential in e-commerce operations due to the need for personalization and the availability of a large variety of items. While this can be done when there are simplistic assortment choices to be made, not imposing any constraints on the collection of feasible assortments gives more flexibility to incorporate insights of store-managers and historically well-performing assortments. We design fast and flexible algorithms based on variations of binary search that find the revenue of the (approximately) optimal assortment. We speed up the comparisons steps using novel vector space embeddings, based on advances in the information retrieval literature. For an arbitrary collection of assortments, our algorithms can find a solution in time that is sub-linear in the number of assortments and for the simpler case of cardinality constraints - linear in the number of items (existing methods are quadratic or worse). Empirical validations using the Billion Prices dataset and several retail transaction datasets show that our algorithms are competitive even when the number of items is $\sim 10^5$ ($100$x larger instances than previously studied). △ Less

Submitted 1 May, 2018; v1 submitted 18 August, 2017; originally announced August 2017.

Comments: 28 pages, 4 figures

arXiv:1706.02999 [pdf, other]

Symmetry Learning for Function Approximation in Reinforcement Learning

Authors: Anuj Mahajan, Theja Tulabandhula

Abstract: In this paper we explore methods to exploit symmetries for ensuring sample efficiency in reinforcement learning (RL), this problem deserves ever increasing attention with the recent advances in the use of deep networks for complex RL tasks which require large amount of training data. We introduce a novel method to detect symmetries using reward trails observed during episodic experience and prove… ▽ More In this paper we explore methods to exploit symmetries for ensuring sample efficiency in reinforcement learning (RL), this problem deserves ever increasing attention with the recent advances in the use of deep networks for complex RL tasks which require large amount of training data. We introduce a novel method to detect symmetries using reward trails observed during episodic experience and prove its completeness. We also provide a framework to incorporate the discovered symmetries for functional approximation. Finally we show that the use of potential based reward sha** is especially effective for our symmetry exploitation mechanism. Experiments on various classical problems show that our method improves the learning performance significantly by utilizing symmetry information. △ Less

Submitted 9 June, 2017; originally announced June 2017.

Comments: 12 pages, 3 figures. A preliminary version appears in AAMAS 2017. Also presented at the 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making

arXiv:1706.02682 [pdf, other]

Impact of Detour-Aware Policies on Maximizing Profit in Ridesharing

Authors: Arpita Biswas, Ragavendran Gopalakrishnan, Theja Tulabandhula, Asmita Metrewar, Koyel Mukherjee, Raja Subramaniam Thangaraj

Abstract: This paper provides efficient solutions to maximize profit for commercial ridesharing services, under a pricing model with detour-based discounts for passengers. We propose greedy heuristics for real-time ride matching that offer different trade-offs between optimality and speed. Simulations on New York City (NYC) taxi trip data show that our heuristics are up to 90% optimal and 10^5 times faster… ▽ More This paper provides efficient solutions to maximize profit for commercial ridesharing services, under a pricing model with detour-based discounts for passengers. We propose greedy heuristics for real-time ride matching that offer different trade-offs between optimality and speed. Simulations on New York City (NYC) taxi trip data show that our heuristics are up to 90% optimal and 10^5 times faster than the (necessarily) exponential-time optimal algorithm. Commercial ridesharing service providers generate significant savings by matching multiple ride requests using heuristic methods. The resulting savings are typically shared between the service provider (in the form of increased profit) and the ridesharing passengers (in the form of discounts). It is not clear a priori how this split should be effected, since higher discounts would encourage more ridesharing, thereby increasing total savings, but the fraction of savings taken as profit is reduced. We simulate a scenario where the decisions of the passengers to opt for ridesharing depend on the discount offered by the service provider. We provide an adaptive learning algorithm IDFLA that learns the optimal profit-maximizing discount factor for the provider. An evaluation over NYC data shows that IDFLA, on average, learns the optimal discount factor in under 16 iterations. Finally, we investigate the impact of imposing a detour-aware routing policy based on sequential individual rationality, a recently proposed concept. Such restricted policies offer a better ride experience, increasing the provider's market share, but at the cost of decreased average per-ride profit due to the reduced number of matched rides. We construct a model that captures these opposing effects, wherein simulations based on NYC data show that a 7% increase in market share would suffice to offset the decreased average per-ride profit. △ Less

Submitted 8 June, 2017; originally announced June 2017.

Comments: 18 pages, 10 figures

arXiv:1706.02237 [pdf, other]

Efficient Reinforcement Learning via Initial Pure Exploration

Authors: Sudeep Raja Putta, Theja Tulabandhula

Abstract: In several realistic situations, an interactive learning agent can practice and refine its strategy before going on to be evaluated. For instance, consider a student preparing for a series of tests. She would typically take a few practice tests to know which areas she needs to improve upon. Based of the scores she obtains in these practice tests, she would formulate a strategy for maximizing her s… ▽ More In several realistic situations, an interactive learning agent can practice and refine its strategy before going on to be evaluated. For instance, consider a student preparing for a series of tests. She would typically take a few practice tests to know which areas she needs to improve upon. Based of the scores she obtains in these practice tests, she would formulate a strategy for maximizing her scores in the actual tests. We treat this scenario in the context of an agent exploring a fixed-horizon episodic Markov Decision Process (MDP), where the agent can practice on the MDP for some number of episodes (not necessarily known in advance) before starting to incur regret for its actions. During practice, the agent's goal must be to maximize the probability of following an optimal policy. This is akin to the problem of Pure Exploration (PE). We extend the PE problem of Multi Armed Bandits (MAB) to MDPs and propose a Bayesian algorithm called Posterior Sampling for Pure Exploration (PSPE), which is similar to its bandit counterpart. We show that the Bayesian simple regret converges at an optimal exponential rate when using PSPE. When the agent starts being evaluated, its goal would be to minimize the cumulative regret incurred. This is akin to the problem of Reinforcement Learning (RL). The agent uses the Posterior Sampling for Reinforcement Learning algorithm (PSRL) initialized with the posteriors of the practice phase. We hypothesize that this PSPE + PSRL combination is an optimal strategy for minimizing regret in RL problems with an initial practice phase. We show empirical results which prove that having a lower simple regret at the end of the practice phase results in having lower cumulative regret during evaluation. △ Less

Submitted 7 June, 2017; originally announced June 2017.

Comments: 4 pages, 3 figures, Presented at Reinforcement Learning and Decision Making 2017

arXiv:1704.00367 [pdf, other]

Provable Inductive Robust PCA via Iterative Hard Thresholding

Authors: U. N. Niranjan, Arun Rajkumar, Theja Tulabandhula

Abstract: The robust PCA problem, wherein, given an input data matrix that is the superposition of a low-rank matrix and a sparse matrix, we aim to separate out the low-rank and sparse components, is a well-studied problem in machine learning. One natural question that arises is that, as in the inductive setting, if features are provided as input as well, can we hope to do better? Answering this in the affi… ▽ More The robust PCA problem, wherein, given an input data matrix that is the superposition of a low-rank matrix and a sparse matrix, we aim to separate out the low-rank and sparse components, is a well-studied problem in machine learning. One natural question that arises is that, as in the inductive setting, if features are provided as input as well, can we hope to do better? Answering this in the affirmative, the main goal of this paper is to study the robust PCA problem while incorporating feature information. In contrast to previous works in which recovery guarantees are based on the convex relaxation of the problem, we propose a simple iterative algorithm based on hard-thresholding of appropriate residuals. Under weaker assumptions than previous works, we prove the global convergence of our iterative procedure; moreover, it admits a much faster convergence rate and lesser computational complexity per iteration. In practice, through systematic synthetic and real data simulations, we confirm our theoretical findings regarding improvements obtained by using feature information. △ Less

Submitted 4 July, 2017; v1 submitted 2 April, 2017; originally announced April 2017.

arXiv:1703.07853 [pdf, other]

Faster Reinforcement Learning Using Active Simulators

Authors: Vikas Jain, Theja Tulabandhula

Abstract: In this work, we propose several online methods to build a \emph{learning curriculum} from a given set of target-task-specific training tasks in order to speed up reinforcement learning (RL). These methods can decrease the total training time needed by an RL agent compared to training on the target task from scratch. Unlike traditional transfer learning, we consider creating a sequence from severa… ▽ More In this work, we propose several online methods to build a \emph{learning curriculum} from a given set of target-task-specific training tasks in order to speed up reinforcement learning (RL). These methods can decrease the total training time needed by an RL agent compared to training on the target task from scratch. Unlike traditional transfer learning, we consider creating a sequence from several training tasks in order to provide the most benefit in terms of reducing the total time to train. Our methods utilize the learning trajectory of the agent on the curriculum tasks seen so far to decide which tasks to train on next. An attractive feature of our methods is that they are weakly coupled to the choice of the RL algorithm as well as the transfer learning method. Further, when there is domain information available, our methods can incorporate such knowledge to further speed up the learning. We experimentally show that these methods can be used to obtain suitable learning curricula that speed up the overall training time on two different domains. △ Less

Submitted 21 November, 2017; v1 submitted 22 March, 2017; originally announced March 2017.

Comments: 12 pages and 4 figures More experiments added to the previous version

arXiv:1703.07807 [pdf, other]

Learning to Partition using Score Based Compatibilities

Authors: Arun Rajkumar, Koyel Mukherjee, Theja Tulabandhula

Abstract: We study the problem of learning to partition users into groups, where one must learn the compatibilities between the users to achieve optimal grou**s. We define four natural objectives that optimize for average and worst case compatibilities and propose new algorithms for adaptively learning optimal grou**s. When we do not impose any structure on the compatibilities, we show that the group fo… ▽ More We study the problem of learning to partition users into groups, where one must learn the compatibilities between the users to achieve optimal grou**s. We define four natural objectives that optimize for average and worst case compatibilities and propose new algorithms for adaptively learning optimal grou**s. When we do not impose any structure on the compatibilities, we show that the group formation objectives considered are $NP$ hard to solve and we either give approximation guarantees or prove inapproximability results. We then introduce an elegant structure, namely that of \textit{intrinsic scores}, that makes many of these problems polynomial time solvable. We explicitly characterize the optimal grou**s under this structure and show that the optimal solutions are related to \emph{homophilous} and \emph{heterophilous} partitions, well-studied in the psychology literature. For one of the four objectives, we show $NP$ hardness under the score structure and give a $\frac{1}{2}$ approximation algorithm for which no constant approximation was known thus far. Finally, under the score structure, we propose an online low sample complexity PAC algorithm for learning the optimal partition. We demonstrate the efficacy of the proposed algorithm on synthetic and real world datasets. △ Less

Submitted 22 March, 2017; originally announced March 2017.

Comments: Appears in the Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017)

arXiv:1609.05536 [pdf, other]

Learning Personalized Optimal Control for Repeatedly Operated Systems

Authors: Theja Tulabandhula

Abstract: We consider the problem of online learning of optimal control for repeatedly operated systems in the presence of parametric uncertainty. During each round of operation, environment selects system parameters according to a fixed but unknown probability distribution. These parameters govern the dynamics of a plant. An agent chooses a control input to the plant and is then revealed the cost of the ch… ▽ More We consider the problem of online learning of optimal control for repeatedly operated systems in the presence of parametric uncertainty. During each round of operation, environment selects system parameters according to a fixed but unknown probability distribution. These parameters govern the dynamics of a plant. An agent chooses a control input to the plant and is then revealed the cost of the choice. In this setting, we design an agent that personalizes the control input to this plant taking into account the stochasticity involved. We demonstrate the effectiveness of our approach on a simulated system. △ Less

Submitted 18 September, 2016; originally announced September 2016.

Comments: This work was presented at the NIPS 2015 Workshop: Machine Learning From and For Adaptive User Technologies: From Active Learning & Experimentation to Optimization & Personalization (ref. https://sites.google.com/site/mlaihci)

arXiv:1608.04929 [pdf, other]

Reinforcement Learning algorithms for regret minimization in structured Markov Decision Processes

Authors: K J Prabuchandran, Tejas Bodas, Theja Tulabandhula

Abstract: A recent goal in the Reinforcement Learning (RL) framework is to choose a sequence of actions or a policy to maximize the reward collected or minimize the regret incurred in a finite time horizon. For several RL problems in operation research and optimal control, the optimal policy of the underlying Markov Decision Process (MDP) is characterized by a known structure. The current state of the art a… ▽ More A recent goal in the Reinforcement Learning (RL) framework is to choose a sequence of actions or a policy to maximize the reward collected or minimize the regret incurred in a finite time horizon. For several RL problems in operation research and optimal control, the optimal policy of the underlying Markov Decision Process (MDP) is characterized by a known structure. The current state of the art algorithms do not utilize this known structure of the optimal policy while minimizing regret. In this work, we develop new RL algorithms that exploit the structure of the optimal policy to minimize regret. Numerical experiments on MDPs with structured optimal policies show that our algorithms have better performance, are easy to implement, have a smaller run-time and require less number of random number generations. △ Less

Submitted 17 August, 2016; originally announced August 2016.

Comments: An extended abstract appears in AAMAS 2016

arXiv:1607.07306 [pdf, other]

The Costs and Benefits of Sharing: Sequential Individual Rationality and Sequential Fairness

Authors: Ragavendran Gopalakrishnan, Koyel Mukherjee, Theja Tulabandhula

Abstract: In designing dynamic shared service systems that incentivize customers to opt for shared rather than exclusive service, the traditional notion of individual rationality may be insufficient, as a customer's estimated utility could fluctuate arbitrarily during their time in the shared system, as long as their realized utility at service completion is not worse than that for exclusive service. In thi… ▽ More In designing dynamic shared service systems that incentivize customers to opt for shared rather than exclusive service, the traditional notion of individual rationality may be insufficient, as a customer's estimated utility could fluctuate arbitrarily during their time in the shared system, as long as their realized utility at service completion is not worse than that for exclusive service. In this work, within a model that explicitly considers the "inconvenience costs" incurred by customers due to sharing, we introduce the notion of sequential individual rationality (SIR) that requires that the disutility of existing customers is nonincreasing as the system state changes due to new customer arrivals. Next, under SIR, we observe that cost sharing can also be viewed as benefit sharing, which inspires a natural definition of sequential fairness (SF) - the total incremental benefit due to a new customer is shared among existing customers in proportion to the incremental inconvenience suffered. We demonstrate the effectiveness of these notions by applying them to a ridesharing system, where unexpected detours to pick up subsequent passengers inconvenience the existing passengers. Imposing SIR and SF reveals interesting and surprising results, including: (a) natural limits on the incremental detours permissible, (b) exact characterization of "SIR-feasible" routes, which boast sublinear upper and lower bounds on the fractional detours, (c) exact characterization of sequentially fair cost sharing schemes, which includes a strong requirement that passengers must compensate each other for the detour inconveniences that they cause, and (d) new algorithmic problems related to and motivated by SIR. △ Less

Submitted 20 June, 2017; v1 submitted 25 July, 2016; originally announced July 2016.

Comments: Presented as a poster at EC 2016. Presented as an invited talk (sponsored session) at INFORMS Annual Meeting 2016. Presented at MSOM Service Operations SIG 2017. Currently under review at Management Science

arXiv:1407.1097 [pdf, other]

Robust Optimization using Machine Learning for Uncertainty Sets

Authors: Theja Tulabandhula, Cynthia Rudin

Abstract: Our goal is to build robust optimization problems for making decisions based on complex data from the past. In robust optimization (RO) generally, the goal is to create a policy for decision-making that is robust to our uncertainty about the future. In particular, we want our policy to best handle the the worst possible situation that could arise, out of an uncertainty set of possible situations.… ▽ More Our goal is to build robust optimization problems for making decisions based on complex data from the past. In robust optimization (RO) generally, the goal is to create a policy for decision-making that is robust to our uncertainty about the future. In particular, we want our policy to best handle the the worst possible situation that could arise, out of an uncertainty set of possible situations. Classically, the uncertainty set is simply chosen by the user, or it might be estimated in overly simplistic ways with strong assumptions; whereas in this work, we learn the uncertainty set from data collected in the past. The past data are drawn randomly from an (unknown) possibly complicated high-dimensional distribution. We propose a new uncertainty set design and show how tools from statistical learning theory can be employed to provide probabilistic guarantees on the robustness of the policy. △ Less

Submitted 3 July, 2014; originally announced July 2014.

Comments: 28 pages, 2 figures; a shorter preliminary version appeared in ISAIM 2014

arXiv:1405.7764 [pdf, other]

Generalization Bounds for Learning with Linear, Polygonal, Quadratic and Conic Side Knowledge

Authors: Theja Tulabandhula, Cynthia Rudin

Abstract: In this paper, we consider a supervised learning setting where side knowledge is provided about the labels of unlabeled examples. The side knowledge has the effect of reducing the hypothesis space, leading to tighter generalization bounds, and thus possibly better generalization. We consider several types of side knowledge, the first leading to linear and polygonal constraints on the hypothesis sp… ▽ More In this paper, we consider a supervised learning setting where side knowledge is provided about the labels of unlabeled examples. The side knowledge has the effect of reducing the hypothesis space, leading to tighter generalization bounds, and thus possibly better generalization. We consider several types of side knowledge, the first leading to linear and polygonal constraints on the hypothesis space, the second leading to quadratic constraints, and the last leading to conic constraints. We show how different types of domain knowledge can lead directly to these kinds of side knowledge. We prove bounds on complexity measures of the hypothesis space for quadratic and conic side knowledge, and show that these bounds are tight in a specific sense for the quadratic case. △ Less

Submitted 7 October, 2014; v1 submitted 29 May, 2014; originally announced May 2014.

Comments: 37 pages, 3 figures, a shorter version appeared in ISAIM 2014 (new additions include a reference change and a new figure)

arXiv:1112.0698 [pdf, other]

Machine Learning with Operational Costs

Authors: Theja Tulabandhula, Cynthia Rudin

Abstract: This work proposes a way to align statistical modeling with decision making. We provide a method that propagates the uncertainty in predictive modeling to the uncertainty in operational cost, where operational cost is the amount spent by the practitioner in solving the problem. The method allows us to explore the range of operational costs associated with the set of reasonable statistical models,… ▽ More This work proposes a way to align statistical modeling with decision making. We provide a method that propagates the uncertainty in predictive modeling to the uncertainty in operational cost, where operational cost is the amount spent by the practitioner in solving the problem. The method allows us to explore the range of operational costs associated with the set of reasonable statistical models, so as to provide a useful way for practitioners to understand uncertainty. To do this, the operational cost is cast as a regularization term in a learning algorithm's objective function, allowing either an optimistic or pessimistic view of possible costs, depending on the regularization parameter. From another perspective, if we have prior knowledge about the operational cost, for instance that it should be low, this knowledge can help to restrict the hypothesis space, and can help with generalization. We provide a theoretical generalization bound for this scenario. We also show that learning with operational costs is related to robust optimization. △ Less

Submitted 18 June, 2013; v1 submitted 3 December, 2011; originally announced December 2011.

Comments: Current version: Final version appearing in JMLR 2013. v2: Many parts have been rewritten including the introduction, Minor correction of Theorem 6. 38 pages. Previously: v1: 36 pages, 8 figures. Short version appears in Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2012

arXiv:1104.5061 [pdf, other]

On Combining Machine Learning with Decision Making

Authors: Theja Tulabandhula, Cynthia Rudin

Abstract: We present a new application and covering number bound for the framework of "Machine Learning with Operational Costs (MLOC)," which is an exploratory form of decision theory. The MLOC framework incorporates knowledge about how a predictive model will be used for a subsequent task, thus combining machine learning with the decision that is made afterwards. In this work, we use the MLOC framework to… ▽ More We present a new application and covering number bound for the framework of "Machine Learning with Operational Costs (MLOC)," which is an exploratory form of decision theory. The MLOC framework incorporates knowledge about how a predictive model will be used for a subsequent task, thus combining machine learning with the decision that is made afterwards. In this work, we use the MLOC framework to study a problem that has implications for power grid reliability and maintenance, called the Machine Learning and Traveling Repairman Problem ML&TRP. The goal of the ML&TRP is to determine a route for a "repair crew," which repairs nodes on a graph. The repair crew aims to minimize the cost of failures at the nodes, but as in many real situations, the failure probabilities are not known and must be estimated. The MLOC framework allows us to understand how this uncertainty influences the repair route. We also present new covering number generalization bounds for the MLOC framework. △ Less

Submitted 12 March, 2014; v1 submitted 26 April, 2011; originally announced April 2011.

Comments: 35 pages, 16 figures, longer version of a paper appearing in Algorithmic Decision Theory 2011

Showing 1–47 of 47 results for author: Tulabandhula, T