-
CyNetDiff -- A Python Library for Accelerated Implementation of Network Diffusion Models
Authors:
Eliot W. Robson,
Dhemath Reddy,
Abhishek K. Umrawal
Abstract:
In recent years, there has been increasing interest in network diffusion models and related problems. The most popular of these are the independent cascade and linear threshold models. Much of the recent experimental work done on these models requires a large number of simulations conducted on large graphs, a computationally expensive task suited for low-level languages. However, many researchers…
▽ More
In recent years, there has been increasing interest in network diffusion models and related problems. The most popular of these are the independent cascade and linear threshold models. Much of the recent experimental work done on these models requires a large number of simulations conducted on large graphs, a computationally expensive task suited for low-level languages. However, many researchers prefer the use of higher-level languages (such as Python) for their flexibility and shorter development times. Moreover, in many research tasks, these simulations are the most computationally intensive task, so it would be desirable to have a library for these with an interface to a high-level language with the performance of a low-level language. To fill this niche, we introduce CyNetDiff, a Python library with components written in Cython to provide improved performance for these computationally intensive diffusion tasks.
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models
Authors:
Nancy Tyagi,
Aidin Shiri,
Surjodeep Sarkar,
Abhishek Kumar Umrawal,
Manas Gaur
Abstract:
Foundational Language Models (FLMs) have advanced natural language processing (NLP) research. Current researchers are develo** larger FLMs (e.g., XLNet, T5) to enable contextualized language representation, classification, and generation. While develo** larger FLMs has been of significant advantage, it is also a liability concerning hallucination and predictive uncertainty. Fundamentally, larg…
▽ More
Foundational Language Models (FLMs) have advanced natural language processing (NLP) research. Current researchers are develo** larger FLMs (e.g., XLNet, T5) to enable contextualized language representation, classification, and generation. While develo** larger FLMs has been of significant advantage, it is also a liability concerning hallucination and predictive uncertainty. Fundamentally, larger FLMs are built on the same foundations as smaller FLMs (e.g., BERT); hence, one must recognize the potential of smaller FLMs which can be realized through an ensemble. In the current research, we perform a reality check on FLMs and their ensemble on benchmark and real-world datasets. We hypothesize that the ensembling of FLMs can influence the individualistic attention of FLMs and unravel the strength of coordination and cooperation of different FLMs. We utilize BERT and define three other ensemble techniques: {Shallow, Semi, and Deep}, wherein the Deep-Ensemble introduces a knowledge-guided reinforcement learning approach. We discovered that the suggested Deep-Ensemble BERT outperforms its large variation i.e. BERTlarge, by a factor of many times using datasets that show the usefulness of NLP in sensitive fields, such as mental health.
△ Less
Submitted 23 August, 2023;
originally announced August 2023.
-
A Genetic Algorithm-based Framework for Learning Statistical Power Manifold
Authors:
Abhishek K. Umrawal,
Sean P. Lane,
Erin P. Hennes
Abstract:
Statistical power is a measure of the replicability of a categorical hypothesis test. Formally, it is the probability of detecting an effect, if there is a true effect present in the population. Hence, optimizing statistical power as a function of some parameters of a hypothesis test is desirable. However, for most hypothesis tests, the explicit functional form of statistical power for individual…
▽ More
Statistical power is a measure of the replicability of a categorical hypothesis test. Formally, it is the probability of detecting an effect, if there is a true effect present in the population. Hence, optimizing statistical power as a function of some parameters of a hypothesis test is desirable. However, for most hypothesis tests, the explicit functional form of statistical power for individual model parameters is unknown; but calculating power for a given set of values of those parameters is possible using simulated experiments. These simulated experiments are usually computationally expensive. Hence, develo** the entire statistical power manifold using simulations can be very time-consuming. We propose a novel genetic algorithm-based framework for learning statistical power manifolds. For a multiple linear regression $F$-test, we show that the proposed algorithm/framework learns the statistical power manifold much faster as compared to a brute-force approach as the number of queries to the power oracle is significantly reduced. We also show that the quality of learning the manifold improves as the number of iterations increases for the genetic algorithm. Such tools are useful for evaluating statistical power trade-offs when researchers have little information regarding a priori best guesses of primary effect sizes of interest or how sampling variability in non-primary effects impacts power for primary ones.
△ Less
Submitted 18 February, 2023; v1 submitted 1 September, 2022;
originally announced September 2022.
-
A Community-Aware Framework for Social Influence Maximization
Authors:
Abhishek K. Umrawal,
Christopher J. Quinn,
Vaneet Aggarwal
Abstract:
We consider the problem of Influence Maximization (IM), the task of selecting $k$ seed nodes in a social network such that the expected number of nodes influenced is maximized. We propose a community-aware divide-and-conquer framework that involves (i) learning the inherent community structure of the social network, (ii) generating candidate solutions by solving the influence maximization problem…
▽ More
We consider the problem of Influence Maximization (IM), the task of selecting $k$ seed nodes in a social network such that the expected number of nodes influenced is maximized. We propose a community-aware divide-and-conquer framework that involves (i) learning the inherent community structure of the social network, (ii) generating candidate solutions by solving the influence maximization problem for each community, and (iii) selecting the final set of seed nodes using a novel progressive budgeting scheme. Our experiments on real-world social networks show that the proposed framework outperforms the standard methods in terms of run-time and the heuristic methods in terms of influence. We also study the effect of the community structure on the performance of the proposed framework. Our experiments show that the community structures with higher modularity lead the proposed framework to perform better in terms of run-time and influence.
△ Less
Submitted 18 February, 2023; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Leveraging Causal Graphs for Blocking in Randomized Experiments
Authors:
Abhishek Kumar Umrawal
Abstract:
Randomized experiments are often performed to study the causal effects of interest. Blocking is a technique to precisely estimate the causal effects when the experimental material is not homogeneous. It involves stratifying the available experimental material based on the covariates causing non-homogeneity and then randomizing the treatment within those strata (known as blocks). This eliminates th…
▽ More
Randomized experiments are often performed to study the causal effects of interest. Blocking is a technique to precisely estimate the causal effects when the experimental material is not homogeneous. It involves stratifying the available experimental material based on the covariates causing non-homogeneity and then randomizing the treatment within those strata (known as blocks). This eliminates the unwanted effect of the covariates on the causal effects of interest. We investigate the problem of finding a stable set of covariates to be used to form blocks, that minimizes the variance of the causal effect estimates. Using the underlying causal graph, we provide an efficient algorithm to obtain such a set for a general semi-Markovian causal model.
△ Less
Submitted 18 February, 2023; v1 submitted 3 November, 2021;
originally announced November 2021.
-
On Parameter Estimation in Unobserved Components Models subject to Linear Inequality Constraints
Authors:
Abhishek K. Umrawal,
Joshua C. C. Chan
Abstract:
We propose a new \textit{quadratic programming-based} method of approximating a nonstandard density using a multivariate Gaussian density. Such nonstandard densities usually arise while develo** posterior samplers for unobserved components models involving inequality constraints on the parameters. For instance, Chan et al. (2016) provided a new model of trend inflation with linear inequality con…
▽ More
We propose a new \textit{quadratic programming-based} method of approximating a nonstandard density using a multivariate Gaussian density. Such nonstandard densities usually arise while develo** posterior samplers for unobserved components models involving inequality constraints on the parameters. For instance, Chan et al. (2016) provided a new model of trend inflation with linear inequality constraints on the stochastic trend. We implemented the proposed quadratic programming-based method for this model and compared it to the existing approximation. We observed that the proposed method works as well as the existing approximation in terms of the final trend estimates while achieving gains in terms of sample efficiency.
△ Less
Submitted 12 February, 2023; v1 submitted 23 October, 2021;
originally announced October 2021.
-
DeepFreight: Integrating Deep Reinforcement Learning and Mixed Integer Programming for Multi-transfer Truck Freight Delivery
Authors:
Jiayu Chen,
Abhishek K. Umrawal,
Tian Lan,
Vaneet Aggarwal
Abstract:
With the freight delivery demands and ship** costs increasing rapidly, intelligent control of fleets to enable efficient and cost-conscious solutions becomes an important problem. In this paper, we propose DeepFreight, a model-free deep-reinforcement-learning-based algorithm for multi-transfer freight delivery, which includes two closely-collaborative components: truck-dispatch and package-match…
▽ More
With the freight delivery demands and ship** costs increasing rapidly, intelligent control of fleets to enable efficient and cost-conscious solutions becomes an important problem. In this paper, we propose DeepFreight, a model-free deep-reinforcement-learning-based algorithm for multi-transfer freight delivery, which includes two closely-collaborative components: truck-dispatch and package-matching. Specifically, a deep multi-agent reinforcement learning framework called QMIX is leveraged to learn a dispatch policy, with which we can obtain the multi-step joint vehicle dispatch decisions for the fleet with respect to the delivery requests. Then an efficient multi-transfer matching algorithm is executed to assign the delivery requests to the trucks. Also, DeepFreight is integrated with a Mixed-Integer Linear Programming optimizer for further optimization. The evaluation results show that the proposed system is highly scalable and ensures a 100\% delivery success while maintaining low delivery-time and fuel consumption. The codes are available at https://github.com/LucasCJYSDL/DeepFreight.
△ Less
Submitted 25 May, 2023; v1 submitted 4 March, 2021;
originally announced March 2021.
-
FlexPool: A Distributed Model-Free Deep Reinforcement Learning Algorithm for Joint Passengers & Goods Transportation
Authors:
Kaushik Manchella,
Abhishek K. Umrawal,
Vaneet Aggarwal
Abstract:
The growth in online goods delivery is causing a dramatic surge in urban vehicle traffic from last-mile deliveries. On the other hand, ride-sharing has been on the rise with the success of ride-sharing platforms and increased research on using autonomous vehicle technologies for routing and matching. The future of urban mobility for passengers and goods relies on leveraging new methods that minimi…
▽ More
The growth in online goods delivery is causing a dramatic surge in urban vehicle traffic from last-mile deliveries. On the other hand, ride-sharing has been on the rise with the success of ride-sharing platforms and increased research on using autonomous vehicle technologies for routing and matching. The future of urban mobility for passengers and goods relies on leveraging new methods that minimize operational costs and environmental footprints of transportation systems.
This paper considers combining passenger transportation with goods delivery to improve vehicle-based transportation. Even though the problem has been studied with a defined dynamics model of the transportation system environment, this paper considers a model-free approach that has been demonstrated to be adaptable to new or erratic environment dynamics. We propose FlexPool, a distributed model-free deep reinforcement learning algorithm that jointly serves passengers & goods workloads by learning optimal dispatch policies from its interaction with the environment. The proposed algorithm pools passengers for a ride-sharing service and delivers goods using a multi-hop transit method. These flexibilities decrease the fleet's operational cost and environmental footprint while maintaining service levels for passengers and goods. Through simulations on a realistic multi-agent urban mobility platform, we demonstrate that FlexPool outperforms other model-free settings in serving the demands from passengers & goods. FlexPool achieves 30% higher fleet utilization and 35% higher fuel efficiency in comparison to (i) model-free approaches where vehicles transport a combination of passengers & goods without the use of multi-hop transit, and (ii) model-free approaches where vehicles exclusively transport either passengers or goods.
△ Less
Submitted 20 September, 2020; v1 submitted 27 July, 2020;
originally announced July 2020.
-
Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback
Authors:
Mridul Agarwal,
Vaneet Aggarwal,
Christopher J. Quinn,
Abhishek K. Umrawal
Abstract:
Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback re…
▽ More
Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of multi-armed bandit requires choosing among $N$-choose-$K$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $N$. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is \textit{sub-linear} in all parameters $T$, $N$, and $K$. %When applied to the problem of Social Influence Maximization, the performance of the proposed algorithm surpasses the UCB algorithm and some more sophisticated domain-specific methods.
△ Less
Submitted 11 October, 2021; v1 submitted 28 November, 2018;
originally announced November 2018.