How to Rent GPUs on a Budget

Authors: Zhouzi Li, Benjamin Berg, Arpan Mukhopadhyay, Mor Harchol-Balter

Abstract: The explosion in Machine Learning (ML) over the past ten years has led to a dramatic increase in demand for GPUs to train ML models. Because it is prohibitively expensive for most users to build and maintain a large GPU cluster, large cloud providers (Microsoft Azure, Amazon AWS, Google Cloud) have seen explosive growth in demand for renting cloud-based GPUs. In this cloud-computing paradigm, a user must specify their demand for GPUs at every moment in time, and will pay for every GPU-hour they use. ML training jobs are known to be parallelizable to different degrees. Given a stream of ML training jobs, a user typically wants to minimize the mean response time across all jobs. Here, the response time of a job denotes the time from when a job arrives until it is complete. Additionally, the user is constrained by some operating budget. Specifically, in this paper the user is constrained to use no more than $b$ GPUs per hour, over a long-run time average. The question is how to minimize mean response time while meeting the budget constraint. Because training jobs receive a diminishing marginal benefit from running on additional GPUs, allocating too many GPUs to a single training job can dramatically increase the overall cost paid by the user. Hence, an optimal rental policy must balance a tradeoff between training cost and mean response time. This paper derives the optimal rental policy for a stream of training jobs where the jobs have different levels of parallelizability (specified by a speedup function) and different job sizes (amounts of inherent work). We make almost no assumptions about the arrival process and about the job size distribution. Our optimal policy specifies how many GPUs to rent at every moment in time and how to allocate these GPUs.
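
The cost versus response-time tradeoff described in this abstract can be illustrated with a minimal sketch. The speedup function s(k) = k^p with p = 0.5, the job size of 100 work units, and all function names are hypothetical choices for illustration and are not taken from the paper. The sketch only shows that, under a concave speedup, giving one job more GPUs shortens its response time while increasing the GPU-hours it consumes.

```python
def speedup(k: int, p: float = 0.5) -> float:
    """Hypothetical concave speedup: a job on k GPUs runs k**p times faster."""
    return k ** p


def response_time(size: float, k: int, p: float = 0.5) -> float:
    """Hours needed to finish `size` units of work on k GPUs (no queueing)."""
    return size / speedup(k, p)


def gpu_hours(size: float, k: int, p: float = 0.5) -> float:
    """GPU-hours paid for: k GPUs held for the whole response time."""
    return k * response_time(size, k, p)


if __name__ == "__main__":
    # Assumed job size of 100 work units; compare a few allocations.
    for k in (1, 4, 16):
        print(k, response_time(100.0, k), gpu_hours(100.0, k))
```

With these assumed numbers, moving from 1 to 16 GPUs cuts the response time by a factor of 4 but multiplies the GPU-hours by 4, which is the tension a budget-constrained rental policy has to resolve.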

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.09427 [pdf, other]

On Optimal Server Allocation for Moldable Jobs with Concave Speed-Up

Authors: Samira Ghanbarian, Arpan Mukhopadhyay, Ravi R. Mazumdar, Fabrice M. Guillemin

Abstract: A large proportion of jobs submitted to modern computing clusters and data centers are parallelizable and capable of running on a flexible number of computing cores or servers. Although allocating more servers to such a job results in a higher speed-up in the job's execution, it reduces the number of servers available to other jobs, which, in the worst case, can result in an incoming job not finding any available server to run immediately upon arrival. Hence, a key question to address is: how to optimally allocate servers to jobs such that (i) the average execution time across jobs is minimized and (ii) almost all jobs find at least one server immediately upon arrival. To address this question, we consider a system with $n$ servers, where jobs are parallelizable up to $d^{(n)}$ servers and the speed-up function of jobs is concave and increasing. Jobs not finding any available servers upon entry are blocked and lost. We propose a simple server allocation scheme that achieves the minimum average execution time of accepted jobs while ensuring that the blocking probability of jobs vanishes as the system becomes large ($n \to \infty$). This result is established for various traffic conditions as well as for heterogeneous workloads. To prove our result, we employ Stein's method which also yields non-asymptotic bounds on the blocking probability and the mean execution time. Furthermore, our simulations show that the performance of the scheme is insensitive to the distribution of job execution times.

Submitted 15 April, 2024; originally announced June 2024.

MSC Class: 60J28 (Primary) 60K25; 68M20 (Secondary)

arXiv:2405.13205 [pdf, other]

Multi-Agent Reinforcement Learning with Hierarchical Coordination for Emergency Responder Stationing

Authors: Amutheezan Sivagnanam, Ava Pettet, Hunter Lee, Ayan Mukhopadhyay, Abhishek Dubey, Aron Laszka

Abstract: An emergency responder management (ERM) system dispatches responders, such as ambulances, when it receives requests for medical aid. ERM systems can also proactively reposition responders between predesignated waiting locations to cover any gaps that arise due to the prior dispatch of responders or significant changes in the distribution of anticipated requests. Optimal repositioning is computationally challenging due to the exponential number of ways to allocate responders between locations and the uncertainty in future requests. The state-of-the-art approach in proactive repositioning is a hierarchical approach based on spatial decomposition and online Monte Carlo tree search, which may require minutes of computation for each decision in a domain where seconds can save lives. We address the issue of long decision times by introducing a novel reinforcement learning (RL) approach, based on the same hierarchical decomposition, but replacing online search with learning. To address the computational challenges posed by large, variable-dimensional, and discrete state and action spaces, we propose: (1) actor-critic based agents that incorporate transformers to handle variable-dimensional states and actions, (2) projections to fixed-dimensional observations to handle complex states, and (3) combinatorial techniques to map continuous actions to discrete allocations. We evaluate our approach using real-world data from two U.S. cities, Nashville, TN and Seattle, WA. Our experiments show that compared to the state of the art, our approach reduces computation time per decision by three orders of magnitude, while also slightly reducing average ambulance response time by 5 seconds.

Submitted 8 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

arXiv:2404.19203 [pdf]

doi 10.1109/ITherm55368.2023.10177601

Thermal Performance of a Liquid-cooling Assisted Thin Wickless Vapor Chamber

Authors: Arani Mukhopadhyay, Anish Pal, Mohamad Jafari Gukeh, Constantine M. Megaridis

Abstract: The ever-increasing need for power consumption in electronic devices, coupled with the requirement for thinner size, calls for the development of efficient heat spreading components. Vapor chambers (VCs), because of their ability to effectively spread heat over a large area by two-phase heat transfer, seem ideal for such applications. However, creating thin and efficient vapor chambers that work over a wide range of power inputs is a persisting challenge. VCs that use wicks for circulating the phase changing media suffer from capillary restrictions, dry-out, clogging, increase in size and weight, and can often be costly. Recent developments in wick-free wettability patterned vapor chambers replace traditional wicks with laser-fabricated wickless components. An experimental setup allows for fast testing and experimental evaluation of water-charged VCs with liquid-assisted cooling. The sealed chamber can maintain vacuum for long durations, and can be used for testing of very thin wick-free VCs. This work extends our previous study by decreasing overall thickness of the wick-free VC down to 3 mm and evaluates its performance. Furthermore, the impact of wettability patterns on VC performance is investigated, by carrying out experiments both in non-patterned and patterned VCs. Experiments are first carried out on a wick-free VC with no wettability patterns and comprising an entirely superhydrophilic evaporator coupled with a hydrophobic condenser. Thereafter, wettability patterns that aid the rapid return of water to the heated site on the evaporator and improve condensation on the condenser of the vapor chamber are implemented. The thermal characteristics show that the patterned VCs outperform the non-patterned VCs under all scenarios. The patterned VCs exhibit low thermal resistance independent of fluid charging ratio, withstanding higher power inputs without thermal dry-outs.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Presented at IEEE ITherm (Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems) 2023, Orlando, FL, US.

arXiv:2404.19195 [pdf]

doi 10.1109/ITherm55368.2023.10177653

Evaluation of Thermal Performance of a Wick-free Vapor Chamber in Power Electronics Cooling

Authors: Arani Mukhopadhyay, Anish Pal, Congbo Bao, Mohamad Jafari Gukeh, Sudip K. Mazumder, Constantine M. Megaridis

Abstract: Efficient thermal management in high-power electronics cooling can be achieved using phase-change heat transfer devices, such as vapor chambers. Traditional vapor chambers use wicks to transport condensate for efficient thermal exchange and to prevent "dry-out" of the evaporator. However, wicks in vapor chambers present significant design challenges arising out of large pressure drops across the wicking material, which slows down condensate transport rates and increases the chances for dry-out. Thicker wicks add to overall thermal resistance, while deterring the development of thinner devices by limiting the total thickness of the vapor chamber. Wickless vapor chambers eliminate the use of metal wicks entirely, by incorporating complementary wettability-patterned flat plates on both the evaporator and the condenser side. Such surface modifications enhance fluid transport on the evaporator side, while allowing the chambers to be virtually as thin as imaginable, thereby permitting design of thermally efficient thin electronic cooling devices. While wick-free vapor chambers have been studied and efficient design strategies have been suggested, we delve into real-life applications of wick-free vapor chambers in forced air cooling of high-power electronics. An experimental setup is developed wherein two Si-based MOSFETs of TO-247-3 packaging having high conduction resistance are connected in parallel and switched at 100 kHz, to emulate high frequency power electronics operations. A rectangular copper wick-free vapor chamber spreads heat laterally over a surface 13 times larger than the heating area. This chamber is cooled externally by a fan that circulates air at room temperature. The present experimental setup extends our previous work on wick-free vapor chambers, while demonstrating the effectiveness of low-cost air cooling in vapor-chamber enhanced high-power electronics applications.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Presented at IEEE ITherm (Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems) 2023, Orlando, FL.

arXiv:2403.04072 [pdf, other]

Forecasting and Mitigating Disruptions in Public Bus Transit Services

Authors: Chaeeun Han, Jose Paolo Talusan, Dan Freudberg, Ayan Mukhopadhyay, Abhishek Dubey, Aron Laszka

Abstract: Public transportation systems often suffer from unexpected fluctuations in demand and disruptions, such as mechanical failures and medical emergencies. These fluctuations and disruptions lead to delays and overcrowding, which are detrimental to the passengers' experience and to the overall performance of the transit service. To proactively mitigate such events, many transit agencies station substitute (reserve) vehicles throughout their service areas, which they can dispatch to augment or replace vehicles on routes that suffer overcrowding or disruption. However, determining the optimal locations where substitute vehicles should be stationed is a challenging problem due to the inherent randomness of disruptions and due to the combinatorial nature of selecting locations across a city. In collaboration with the transit agency of Nashville, TN, we address this problem by introducing data-driven statistical and machine-learning models for forecasting disruptions and an effective randomized local-search algorithm for selecting locations where substitute vehicles are to be stationed. Our research demonstrates promising results in proactive disruption management, offering a practical and easily implementable solution for transit agencies to enhance the reliability of their services. Our results resonate beyond mere operational efficiency: by advancing proactive strategies, our approach fosters more resilient and accessible public transportation, contributing to equitable urban mobility and ultimately benefiting the communities that rely on public transportation the most.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.03339 [pdf, other]

An Online Approach to Solving Public Transit Stationing and Dispatch Problem

Authors: Jose Paolo Talusan, Chaeeun Han, Ayan Mukhopadhyay, Aron Laszka, Dan Freudberg, Abhishek Dubey

Abstract: Public bus transit systems provide critical transportation services for large sections of modern communities. On-time performance and maintaining the reliable quality of service is therefore very important. Unfortunately, disruptions caused by overcrowding, vehicular failures, and road accidents often lead to service performance degradation. Though transit agencies keep a limited number of vehicles in reserve and dispatch them to relieve the affected routes during disruptions, the procedure is often ad-hoc and has to rely on human experience and intuition to allocate resources (vehicles) to affected trips under uncertainty. In this paper, we describe a principled approach using non-myopic sequential decision procedures to solve the problem and decide (a) if it is advantageous to anticipate problems and proactively station transit buses near areas with high-likelihood of disruptions and (b) if and which vehicle to dispatch to a particular problem. Our approach was developed in partnership with the Metropolitan Transportation Authority for a mid-sized city in the USA and models the system as a semi-Markov decision problem (solved as a Monte-Carlo tree search procedure) and shows that it is possible to obtain an answer to these two coupled decision problems in a way that maximizes the overall reward (number of people served). We sample many possible futures from generative models; each is assigned to a tree and processed using root parallelization. We validate our approach using 3 years of data from our partner agency. Our experiments show that the proposed framework serves 2% more passengers while reducing deadhead miles by 40%.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.00017 [pdf, other]

Deploying ADVISER: Impact and Lessons from Using Artificial Intelligence for Child Vaccination Uptake in Nigeria

Authors: Opadele Kehinde, Ruth Abdul, Bose Afolabi, Parminder Vir, Corinne Namblard, Ayan Mukhopadhyay, Abiodun Adereni

Abstract: More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in underdeveloped countries with low vaccination uptake. One of the United Nations' sustainable development goals (SDG 3) aims to end preventable deaths of newborns and children under five years of age. We focus on Nigeria, where the rate of infant mortality is appalling. In particular, low vaccination uptake in Nigeria is a major driver of more than 2,000 daily deaths of children under the age of five years. In this paper, we describe our collaboration with government partners in Nigeria to deploy ADVISER: AI-Driven Vaccination Intervention Optimiser. The framework, based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination, is the first successful deployment of an AI-enabled toolchain for optimizing the allocation of health interventions in Nigeria. In this paper, we provide a background of the ADVISER framework and present results, lessons, and success stories of deploying ADVISER to more than 13,000 families in the state of Oyo, Nigeria.

Submitted 30 December, 2023; originally announced February 2024.

Comments: Accepted for publication at the AAAI Conference on Artificial Intelligence (AAAI-24)

arXiv:2401.06291 [pdf, other]

Frequency-Time Diffusion with Neural Cellular Automata

Authors: John Kalkhof, Arlene Kühn, Yannik Frisch, Anirban Mukhopadhyay

Abstract: Despite considerable success, large Denoising Diffusion Models (DDMs) with UNet backbone pose practical challenges, particularly on limited hardware and in processing gigapixel images. To address these limitations, we introduce two Neural Cellular Automata (NCA)-based DDMs: Diff-NCA and FourierDiff-NCA. Capitalizing on the local communication capabilities of NCA, Diff-NCA significantly reduces the parameter counts of NCA-based DDMs. Integrating Fourier-based diffusion enables global communication early in the diffusion process. This feature is particularly valuable in synthesizing complex images with important global features, such as the CelebA dataset. We demonstrate that even a 331k parameter Diff-NCA can generate 512x512 pathology slices, while FourierDiff-NCA (1.1m parameters) reaches a three times lower FID score of 43.86, compared to the four times bigger UNet (3.94m parameters) with a score of 128.2. Additionally, FourierDiff-NCA can perform diverse tasks such as super-resolution, out-of-distribution image synthesis, and inpainting without explicit training.

Submitted 13 May, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.03197 [pdf, other]

Decision Making in Non-Stationary Environments with Policy-Augmented Search

Authors: Ava Pettet, Yunuo Zhang, Baiting Luo, Kyle Wray, Hendrik Baier, Aron Laszka, Abhishek Dubey, Ayan Mukhopadhyay

Abstract: Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). While the former learns a policy by interacting with the environment (typically done before execution), the latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings -- on the one hand, policies learned before execution become stale when the environment changes and relearning takes both time and computational effort. Online search, on the other hand, can return sub-optimal actions when there are limitations on allowed runtime. In this paper, we introduce Policy-Augmented Monte Carlo tree search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q Learning on several OpenAI Gym environments. Through extensive experiments, we show that under non-stationary settings with limited time constraints, PA-MCTS outperforms these baselines.
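
The idea of combining stale action-value estimates with fresh search estimates can be sketched as follows. The abstract does not state the combination rule, so the convex weighting with a parameter `alpha`, the function name `select_action`, and the example values are all assumptions made for illustration, not the authors' specification.

```python
def select_action(q_stale: dict, g_search: dict, alpha: float) -> str:
    """Pick the action maximizing a weighted mix of two value estimates.

    q_stale:  action -> value from a policy learned before the environment changed
    g_search: action -> value from online search on an up-to-date model
    alpha:    assumed mixing weight in [0, 1]; 1 trusts the stale policy only,
              0 trusts the online search only
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    combined = {
        a: alpha * q_stale[a] + (1.0 - alpha) * g_search[a] for a in q_stale
    }
    return max(combined, key=combined.get)


if __name__ == "__main__":
    # Hypothetical estimates that disagree after an environment change.
    q = {"left": 1.0, "right": 0.0}
    g = {"left": 0.0, "right": 1.0}
    for alpha in (0.0, 0.25, 1.0):
        print(alpha, select_action(q, g, alpha))
```

In this toy setting, a small `alpha` lets the up-to-date search override the stale policy, while a large `alpha` falls back on the learned values when search time is too limited to be reliable.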

Submitted 20 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

Comments: Extended Abstract accepted for presentation at AAMAS 2024

arXiv:2401.01841 [pdf, other]

Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes

Authors: Baiting Luo, Yunuo Zhang, Abhishek Dubey, Ayan Mukhopadhyay

Abstract: A fundamental (and largely open) challenge in sequential decision-making is dealing with non-stationary environments, where exogenous environmental conditions change over time. Such problems are traditionally modeled as non-stationary Markov decision processes (NSMDP). However, existing approaches for decision-making in NSMDPs have two major shortcomings: first, they assume that the updated environmental dynamics at the current time are known (although future dynamics can change); and second, planning is largely pessimistic, i.e., the agent acts "safely" to account for the non-stationary evolution of the environment. We argue that both these assumptions are invalid in practice -- updated environmental conditions are rarely known, and as the agent interacts with the environment, it can learn about the updated dynamics and avoid being pessimistic, at least in states whose dynamics it is confident about. We present a heuristic search algorithm called Adaptive Monte Carlo Tree Search (ADA-MCTS) that addresses these challenges. We show that the agent can learn the updated dynamics of the environment over time and then act as it learns, i.e., if the agent is in a region of the state space about which it has updated knowledge, it can avoid being pessimistic. To quantify "updated knowledge," we disintegrate the aleatoric and epistemic uncertainty in the agent's updated belief and show how the agent can use these estimates for decision-making. We compare the proposed approach with multiple state-of-the-art approaches in decision-making across multiple well-established open-source problems and empirically show that our approach is faster and highly adaptive without sacrificing safety.

Submitted 21 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: Accepted for publication at the International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2024

arXiv:2401.00928 [pdf, other]

OSINT Research Studios: A Flexible Crowdsourcing Framework to Scale Up Open Source Intelligence Investigations

Authors: Anirban Mukhopadhyay, Sukrit Venkatagiri, Kurt Luther

Abstract: Open Source Intelligence (OSINT) investigations, which rely entirely on publicly available data such as social media, play an increasingly important role in solving crimes and holding governments accountable. The growing volume of data and complex nature of tasks, however, means there is a pressing need to scale and speed up OSINT investigations. Expert-led crowdsourcing approaches show promise but tend to either focus on narrow tasks or domains or require resource-intense, long-term relationships between expert investigators and crowds. We address this gap by providing a flexible framework that enables investigators across domains to enlist crowdsourced support for the discovery and verification of OSINT. We use a design-based research (DBR) approach to develop OSINT Research Studios (ORS), a sociotechnical system in which novice crowds are trained to support professional investigators with complex OSINT investigations. Through our qualitative evaluation, we found that ORS facilitates ethical and effective OSINT investigations across multiple domains. We also discuss broader implications of expert-crowd collaboration and opportunities for future work.

Submitted 1 January, 2024; originally announced January 2024.

Comments: To be published in CSCW 2024

arXiv:2312.10497 [pdf, other]

Diffusion Approximations of Speed-Aware Join-the-Shortest-Queue Scheme: Transient and Stationary Analysis

Authors: Sanidhay Bhambay, Burak Büke, Arpan Mukhopadhyay

Abstract: The Join-the-Shortest-Queue (JSQ) load balancing scheme is widely acknowledged for its effectiveness in minimizing the average response time for jobs in systems with identical servers. However, when applied to a heterogeneous server system with servers of different processing speeds, the JSQ scheme exhibits suboptimal performance. Recently, a variation of JSQ called the Speed-Aware-Join-the-Shortest-Queue (SA-JSQ) scheme has been shown to attain fluid limit optimality for systems with heterogeneous servers. In this paper, we examine the SA-JSQ scheme for heterogeneous server systems under the Halfin-Whitt regime. Our analysis begins by establishing that the scaled and centered version of the system state weakly converges to a diffusion process characterized by stochastic integral equations. Furthermore, we prove that the diffusion process is positive recurrent and the sequence of stationary measures for the scaled and centered queue length processes converges to the stationary measure for the limiting diffusion process. To achieve this result, we employ Stein's method with a generator expansion approach.

Submitted 16 December, 2023; originally announced December 2023.

MSC Class: 60K25 (Primary) 60F05; 68M20 (Secondary)

arXiv:2311.00548 [pdf, other]

Continual atlas-based segmentation of prostate MRI

Authors: Amin Ranem, Camila González, Daniel Pinto dos Santos, Andreas M. Bucher, Ahmed E. Othman, Anirban Mukhopadhyay

Abstract: Continual learning (CL) methods designed for natural image classification often fail to reach basic quality standards for medical image segmentation. Atlas-based segmentation, a well-established approach in medical imaging, incorporates domain knowledge on the region of interest, leading to semantically coherent predictions. This is especially promising for CL, as it allows us to leverage structural information and strike an optimal balance between model rigidity and plasticity over time. When combined with privacy-preserving prototypes, this process offers the advantages of rehearsal-based CL without compromising patient privacy. We propose Atlas Replay, an atlas-based segmentation approach that uses prototypes to generate high-quality segmentation masks through image registration that maintain consistency even as the training distribution changes. We explore how our proposed method performs compared to state-of-the-art CL methods in terms of knowledge transferability across seven publicly available prostate segmentation datasets. Prostate segmentation plays a vital role in diagnosing prostate cancer; however, it poses challenges due to substantial anatomical variations, benign structural differences in older age groups, and fluctuating acquisition parameters. Our results show that Atlas Replay is both robust and generalizes well to yet-unseen domains while being able to maintain knowledge, unlike end-to-end segmentation methods. Our code base is available under https://github.com/MECLabTUDA/Atlas-Replay.

Submitted 6 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.16695 [pdf, other]

From Pointwise to Powerhouse: Initialising Neural Networks with Generative Models

Authors: Christian Harder, Moritz Fuchs, Yuri Tolkach, Anirban Mukhopadhyay

Abstract: Traditional initialisation methods, e.g. He and Xavier, have been effective in avoiding the problem of vanishing or exploding gradients in neural networks. However, they only use simple pointwise distributions, which model one-dimensional variables. Moreover, they ignore most information about the architecture and disregard past training experiences. These limitations can be overcome by employing generative models for initialisation. In this paper, we introduce two groups of new initialisation methods. First, we locally initialise weight groups by employing variational autoencoders. Secondly, we globally initialise full weight sets by employing graph hypernetworks. We thoroughly evaluate the impact of the employed generative models on state-of-the-art neural networks in terms of accuracy, convergence speed and ensembling. Our results show that global initialisations result in higher accuracy and faster initial convergence speed. However, the implementation through graph hypernetworks leads to diminished ensemble performance on out of distribution data. To counteract, we propose a modification called noise graph hypernetwork, which encourages diversity in the produced ensemble members. Furthermore, our approach might be able to transfer learned knowledge to different image distributions. Our work provides insights into the potential, the trade-offs and possible modifications of these new initialisation methods.

Submitted 25 October, 2023; originally announced October 2023.

ACM Class: J.3; I.5.1; I.5.4

arXiv:2310.16241 [pdf, other]

Task Grouping for Automated Multi-Task Machine Learning via Task Affinity Prediction

Authors: Afiya Ayman, Ayan Mukhopadhyay, Aron Laszka

Abstract: When a number of similar tasks have to be learned simultaneously, multi-task learning (MTL) models can attain significantly higher accuracy than single-task learning (STL) models. However, the advantage of MTL depends on various factors, such as the similarity of the tasks, the sizes of the datasets, and so on; in fact, some tasks might not benefit from MTL and may even incur a loss of accuracy compared to STL. Hence, the question arises: which tasks should be learned together? Domain experts can attempt to group tasks together following intuition, experience, and best practices, but manual grouping can be labor-intensive and far from optimal. In this paper, we propose a novel automated approach for task grouping. First, we study the affinity of tasks for MTL using four benchmark datasets that have been used extensively in the MTL literature, focusing on neural network-based MTL models. We identify inherent task features and STL characteristics that can help us to predict whether a group of tasks should be learned together using MTL or if they should be learned independently using STL. Building on this predictor, we introduce a randomized search algorithm, which employs the predictor to minimize the number of MTL trainings performed during the search for task groups. We demonstrate on the four benchmark datasets that our predictor-driven search approach can find better task groupings than existing baseline approaches.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.00504 [pdf, other]

Exploring SAM Ablations for Enhancing Medical Segmentation in Radiology and Pathology

Authors: Amin Ranem, Niklas Babendererde, Moritz Fuchs, Anirban Mukhopadhyay

Abstract: Medical imaging plays a critical role in the diagnosis and treatment planning of various medical conditions, with radiology and pathology heavily reliant on precise image segmentation. The Segment Anything Model (SAM) has emerged as a promising framework for addressing segmentation challenges across different domains. In this white paper, we delve into SAM, breaking down its fundamental components and uncovering the intricate interactions between them. We also explore the fine-tuning of SAM and assess its profound impact on the accuracy and reliability of segmentation results, focusing on applications in radiology (specifically, brain tumor segmentation) and pathology (specifically, breast cancer segmentation). Through a series of carefully designed experiments, we analyze SAM's potential application in the field of medical imaging. We aim to bridge the gap between advanced segmentation techniques and the demanding requirements of healthcare, shedding light on SAM's transformative capabilities.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.02954 [pdf, other]

M3D-NCA: Robust 3D Segmentation with Built-in Quality Control

Authors: John Kalkhof, Anirban Mukhopadhyay

Abstract: Medical image segmentation relies heavily on large-scale deep learning models, such as UNet-based architectures. However, the real-world utility of such models is limited by their high computational requirements, which makes them impractical for resource-constrained environments such as primary care facilities and conflict zones. Furthermore, shifts in the imaging domain can render these models ineffective and even compromise patient safety if such errors go undetected. To address these challenges, we propose M3D-NCA, a novel methodology that leverages Neural Cellular Automata (NCA) segmentation for 3D medical images using n-level patchification. Moreover, we exploit the variance in M3D-NCA to develop a novel quality metric which can automatically detect errors in the segmentation process of NCAs. M3D-NCA outperforms UNet models that are two orders of magnitude larger in hippocampus and prostate segmentation by 2% Dice and can be run on a Raspberry Pi 4 Model B (2GB RAM). This highlights the potential of M3D-NCA as an effective and efficient alternative for medical image segmentation in resource-constrained environments.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2309.00688 [pdf, other]

Jointly Exploring Client Drift and Catastrophic Forgetting in Dynamic Learning

Authors: Niklas Babendererde, Moritz Fuchs, Camila Gonzalez, Yuri Tolkach, Anirban Mukhopadhyay

Abstract: Federated and Continual Learning have emerged as potential paradigms for the robust and privacy-aware use of Deep Learning in dynamic environments. However, Client Drift and Catastrophic Forgetting are fundamental obstacles to guaranteeing consistent performance. Existing work only addresses these problems separately, which neglects the fact that the root cause behind both forms of performance deterioration is connected. We propose a unified analysis framework for building a controlled test environment for Client Drift -- by perturbing a defined ratio of clients -- and Catastrophic Forgetting -- by shifting all clients with a particular strength. Our framework further leverages this new combined analysis by generating a 3D landscape of the combined performance impact from both. We demonstrate that the performance drop through Client Drift, caused by a certain share of shifted clients, is correlated with the drop from Catastrophic Forgetting resulting from a corresponding shift strength. Correlation tests between both problems for Computer Vision (CelebA) and Medical Imaging (PESO) support this new perspective, with an average Pearson rank correlation coefficient of over 0.94. Our framework's novel ability of combined spatio-temporal shift analysis allows us to investigate how both forms of distribution shift behave in mixed scenarios, opening a new pathway for better generalization. We show that a combination of moderate Client Drift and Catastrophic Forgetting can even improve the performance of the resulting model (causing a "Generalization Bump") compared to when only one of the shifts occurs individually. We apply a simple and commonly used method from Continual Learning in the federated setting and observe this phenomenon to be recurring, leveraging the ability of our framework to analyze existing and novel methods for Federated and Continual Learning.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2309.00060 [pdf, ps, other]

On the Performance of Large Loss Systems with Adaptive Multiserver Jobs

Authors: Samira Ghanbarian, Arpan Mukhopadhyay, Fabrice M. Guillemin, Ravi R. Mazumdar

Abstract: In this paper, we study systems where each job or request can be split into a flexible number of sub-jobs up to a maximum limit. The number of sub-jobs a job is split into depends on the number of available servers found upon its arrival. All sub-jobs of a job are then processed in parallel at different servers leading to a linear speed-up of the job. We refer to such jobs as {\em adaptive multi-server jobs}. We study the problem of optimal assignment of such jobs when each server can process at most one sub-job at any given instant and there is no waiting room in the system. We assume that, upon arrival, a job can only access a randomly sampled subset of $k(n)$ servers from a total of $n$ servers, and the number of sub-jobs is determined based on the number of idle servers within the sampled subset. We analyze the steady-state performance of the system when system load varies according to $\lambda(n) = 1 - \beta n^{-\alpha}$ for $\alpha \in [0,1)$ and $\beta \geq 0$. Our interest is to find how large the subset $k(n)$ should be in order to have zero blocking and maximum speed-up in the limit as $n \to \infty$. We first characterize the system's performance when the jobs have access to the full system, i.e., $k(n)=n$. In this setting, we show that the blocking probability approaches zero at the rate $O(1/\sqrt{n})$ and the mean response time of accepted jobs approaches its minimum achievable value at rate $O(1/n)$. We then consider the case where the jobs only have access to a subset of servers, i.e., $k(n) < n$. We show that as long as $k(n)=\omega(n^{\alpha})$, the same asymptotic performance can be achieved as in the case with full system access. In particular, for $k(n)=\Theta(n^{\alpha}\log n)$, we show that both the blocking probability and the mean response time approach their desired limits at rate $O(n^{-(1-\alpha)/2})$.

Submitted 31 August, 2023; originally announced September 2023.

MSC Class: 60K25; 68M20

arXiv:2308.02587 [pdf, other]

Synthesising Rare Cataract Surgery Samples with Guided Diffusion Models

Authors: Yannik Frisch, Moritz Fuchs, Antoine Sanner, Felix Anton Ucar, Marius Frenzel, Joana Wasielica-Poslednik, Adrian Gericke, Felix Mathias Wagner, Thomas Dratsch, Anirban Mukhopadhyay

Abstract: Cataract surgery is a frequently performed procedure that demands automation and advanced assistance systems. However, gathering and annotating data for training such systems is resource intensive. The publicly available data also comprises severe imbalances inherent to the surgical process. Motivated by this, we analyse cataract surgery video data for the worst-performing phases of a pre-trained downstream tool classifier. The analysis demonstrates that imbalances deteriorate the classifier's performance on underrepresented cases. To address this challenge, we utilise a conditional generative model based on Denoising Diffusion Implicit Models (DDIM) and Classifier-Free Guidance (CFG). Our model can synthesise diverse, high-quality examples based on complex multi-class multi-label conditions, such as surgical phases and combinations of surgical tools. We affirm that the synthesised samples display tools that the classifier recognises. These samples are hard to differentiate from real images, even for clinical experts with more than five years of experience. Further, our synthetically extended data can improve the data sparsity problem for the downstream task of tool classification. The evaluations demonstrate that the model can generate valuable unseen examples, allowing the tool classifier to improve by up to 10% for rare cases. Overall, our approach can facilitate the development of automated assistance systems for cataract surgery by providing a reliable source of realistic synthetic data, which we make available for everyone.

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2306.10068 [pdf, ps, other]

Artificial Intelligence for Emergency Response

Authors: Ayan Mukhopadhyay

Abstract: Emergency response management (ERM) is a challenge faced by communities across the globe. First responders must respond to various incidents, such as fires, traffic accidents, and medical emergencies. They must respond quickly to incidents to minimize the risk to human life. Consequently, considerable attention has been devoted to studying emergency incidents and response in the last several decades. In particular, data-driven models help reduce human and financial loss and improve design codes, traffic regulations, and safety measures. This tutorial paper explores four sub-problems within emergency response: incident prediction, incident detection, resource allocation, and resource dispatch. We aim to present mathematical formulations for these problems and broad frameworks for each problem. We also share open-source (synthetic) data from a large metropolitan area in the USA for future work on data-driven emergency response.

Submitted 15 June, 2023; originally announced June 2023.

Comments: This is a pre-print for a book chapter to appear in Vorobeychik, Yevgeniy., and Mukhopadhyay, Ayan., (Eds.). (2023). \textit{Artificial Intelligence and Society}. ACM Press

arXiv:2305.12357 [pdf, other]

doi 10.1145/3563657.3595997

CoSINT: Designing a Collaborative Capture the Flag Competition to Investigate Misinformation

Authors: Sukrit Venkatagiri, Anirban Mukhopadhyay, David Hicks, Aaron Brantly, Kurt Luther

Abstract: Crowdsourced investigations shore up democratic institutions by debunking misinformation and uncovering human rights abuses. However, current crowdsourcing approaches rely on simplistic collaborative or competitive models and lack technological support, limiting their collective impact. Prior research has shown that blending elements of competition and collaboration can lead to greater performance and creativity, but crowdsourced investigations pose unique analytical and ethical challenges. In this paper, we employed a four-month-long Research through Design process to design and evaluate a novel interaction style called collaborative capture the flag competitions (CoCTFs). We instantiated this interaction style through CoSINT, a platform that enables a trained crowd to work with professional investigators to identify and investigate social media misinformation. Our mixed-methods evaluation showed that CoSINT leverages the complementary strengths of competition and collaboration, allowing a crowd to quickly identify and debunk misinformation. We also highlight tensions between competition and collaboration and discuss implications for the design of crowdsourced investigations.

Submitted 21 May, 2023; originally announced May 2023.

Comments: To appear in ACM Designing Interactive Systems 2023 (DIS 2023). To cite this paper please use the official citation available here: https://doi.org/10.1145/3563657.3595997

Journal ref: Designing Interactive Systems Conference 2023

arXiv:2304.12381 [pdf, other]

Recognizing and generating unswitchable graphs

Authors: Asish Mukhopadhyay, Daniel John, Srivatsan Vasudevan

Abstract: In this paper, we show that unswitchable graphs are a proper subclass of split graphs, and exploit this fact to propose efficient algorithms for their recognition and generation.

Submitted 12 April, 2023; originally announced April 2023.

Comments: 13 pages, 14 figures

arXiv:2303.00869 [pdf, ps, other]

The Power of Two Choices with Load Comparison Errors

Authors: Sanidhay Bhambay, Arpan Mukhopadhyay, Thirupathaiah Vasantam

Abstract: In this paper, we analyze the effects of erroneous load comparisons on the performance of the Po2 scheme. Specifically, we consider load-dependent and load-independent errors. In the load-dependent error model, an incoming job is sent to the server with the larger queue length among the two sampled servers with probability $\epsilon$ if the difference in the queue lengths of the two sampled servers is less than or equal to a constant $g$; no error is made if the queue-length difference is higher than $g$. For this type of error, we show that the benefits of the Po2 scheme are retained as long as the system size is sufficiently large and $\lambda$ is sufficiently close to $1$. Furthermore, we show that, unlike the standard Po2 scheme, the performance of the Po2 scheme under this type of error can be worse than the random scheme if $\epsilon > 1/2$ and $\lambda$ is sufficiently small. In the load-independent error model, the incoming job is sent to the sampled server with the {\em maximum load} with an error probability of $\epsilon$ independent of the loads of the sampled servers. For this model, we show that the performance benefits of the Po2 scheme are retained only if $\epsilon \leq 1/2$; for $\epsilon > 1/2$ we show that the stability region of the system reduces and the system performs poorly in comparison to the {\em random scheme}.

Submitted 14 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.
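
The load-independent error model described in the abstract above can be illustrated with a small sketch. This is not the authors' code: the function name `po2_choose`, the use of sampling without replacement, and the tie-breaking rule are all assumptions made for illustration. With probability `eps` the job goes to the more loaded of the two sampled servers, otherwise to the less loaded one.

```python
import random


def po2_choose(queues, eps, rng):
    """Return the index of the server that receives the next job.

    Hypothetical sketch of Po2 with load-independent comparison errors:
    two distinct servers are sampled uniformly at random (an assumption),
    and the comparison is wrong with probability eps regardless of load.
    """
    i, j = rng.sample(range(len(queues)), 2)
    # Ties are broken in favour of the first sampled server (assumption).
    shorter, longer = (i, j) if queues[i] <= queues[j] else (j, i)
    return longer if rng.random() < eps else shorter


if __name__ == "__main__":
    rng = random.Random(0)
    queues = [0] * 10
    for _ in range(50):  # arrivals only, no departures
        queues[po2_choose(queues, eps=0.1, rng=rng)] += 1
    print(queues)
```

Setting `eps=0` recovers the standard Po2 choice, and `eps=1` always picks the longer of the two sampled queues. A full queueing simulation would also need arrivals at rate $\lambda$ and service completions, which this sketch omits.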

arXiv:2302.11137 [pdf, other]

Fairguard: Harness Logic-based Fairness Rules in Smart Cities

Authors: Yiqi Zhao, Ziyan An, Xuqing Gao, Ayan Mukhopadhyay, Meiyi Ma

Abstract: Smart cities operate on computational predictive frameworks that collect, aggregate, and utilize data from large-scale sensor networks. However, these frameworks are prone to multiple sources of data and algorithmic bias, which often lead to unfair prediction results. In this work, we first demonstrate that bias persists at a micro-level both temporally and spatially by studying real city data from Chattanooga, TN. To alleviate the issue of such bias, we introduce Fairguard, a micro-level temporal logic-based approach for fair smart city policy adjustment and generation in complex temporal-spatial domains. The Fairguard framework consists of two phases: first, we develop a static generator that is able to reduce data bias based on temporal logic conditions by minimizing correlations between selected attributes. Then, to ensure fairness in predictive algorithms, we design a dynamic component to regulate prediction results and generate future fair predictions by harnessing logic rules. Evaluations show that logic-enabled static Fairguard can effectively reduce the biased correlations while dynamic Fairguard can guarantee fairness on protected groups at run-time with minimal impact on overall performance.

Submitted 8 September, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

arXiv:2302.09750 [pdf, other]

Dynamic Simplex: Balancing Safety and Performance in Autonomous Cyber Physical Systems

Authors: Baiting Luo, Shreyas Ramakrishna, Ava Pettet, Christopher Kuhn, Gabor Karsai, Ayan Mukhopadhyay

Abstract: Learning Enabled Components (LEC) have greatly assisted cyber-physical systems in achieving higher levels of autonomy. However, LEC's susceptibility to dynamic and uncertain operating conditions is a critical challenge for the safety of these systems. Redundant controller architectures have been widely adopted for safety assurance in such contexts. These architectures augment LEC "performant" controllers that are difficult to verify with "safety" controllers and the decision logic to switch between them. While these architectures ensure safety, we point out two limitations. First, they are trained offline to learn a conservative policy of always selecting a controller that maintains the system's safety, which limits the system's adaptability to dynamic and non-stationary environments. Second, they do not support reverse switching from the safety controller to the performant controller, even when the threat to safety is no longer present. To address these limitations, we propose a dynamic simplex strategy with an online controller switching logic that allows two-way switching. We consider switching as a sequential decision-making problem and model it as a semi-Markov decision process. We leverage a combination of a myopic selector using surrogate models (for the forward switch) and a non-myopic planner (for the reverse switch) to balance safety and performance. We evaluate this approach using an autonomous vehicle case study in the CARLA simulator using different driving conditions, locations, and component failures. We show that the proposed approach results in fewer collisions and higher performance than state-of-the-art alternatives.

Submitted 19 February, 2023; originally announced February 2023.

arXiv:2302.08344 [pdf, ps, other]

Biased Consensus Dynamics on Regular Expander Graphs

Authors: Oindrila Deb, Arpan Mukhopadhyay

Abstract: Consensus protocols play an important role in the study of distributed algorithms. In this paper, we study the effect of bias on two popular consensus protocols, namely, the {\em voter rule} and the {\em 2-choices rule} with binary opinions. We assume that agents with opinion $1$ update their opinion with a probability $q_1$ strictly less than the probability $q_0$ with which update occurs for agents with opinion $0$. We call opinion $1$ the superior opinion and our interest is to study the conditions under which the network reaches consensus on this opinion. We assume that the agents are located on the vertices of a regular expander graph with $n$ vertices. We show that for the voter rule, consensus is achieved on the superior opinion in $O(\log n)$ time with high probability even if the system starts with only $\Omega(\log n)$ agents having the superior opinion. This is in sharp contrast to the classical voter rule where consensus is achieved in $O(n)$ time and the probability of achieving consensus on any particular opinion is directly proportional to the initial number of agents with that opinion. For the 2-choices rule, we show that consensus is achieved on the superior opinion in $O(\log n)$ time with high probability when the initial proportion of agents with the superior opinion is above a certain threshold. We explicitly characterise this threshold as a function of the strength of the bias and the spectral properties of the graph. We show that for the biased version of the 2-choices rule this threshold can be significantly less than that for the unbiased version of the same rule. Our techniques involve using sharp probabilistic bounds on the drift to characterise the Markovian dynamics of the system.

Submitted 16 February, 2023; originally announced February 2023.
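
As a rough illustration of the biased voter rule summarised above, the following sketch simulates asynchronous updates on an arbitrary graph given as adjacency lists. It is an assumption-laden toy, not the paper's model: the function names, the choice of picking one uniformly random agent per step, and the example graph (a complete graph instead of a regular expander) are all hypothetical. An agent holding opinion 1 updates with probability `q1`, an agent holding opinion 0 with probability `q0`, and an updating agent copies a uniformly random neighbour.

```python
import random


def biased_voter_step(opinions, neighbors, q0, q1, rng):
    """Apply one asynchronous biased voter update in place (sketch)."""
    v = rng.randrange(len(opinions))
    # The update probability depends only on the agent's current opinion.
    update_prob = q1 if opinions[v] == 1 else q0
    if rng.random() < update_prob:
        opinions[v] = opinions[rng.choice(neighbors[v])]


def run_until_consensus(opinions, neighbors, q0, q1, rng, max_steps=100_000):
    """Return the number of steps taken to reach consensus, or None."""
    for step in range(max_steps):
        if len(set(opinions)) == 1:
            return step
        biased_voter_step(opinions, neighbors, q0, q1, rng)
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    n = 50
    # Complete graph used only for simplicity (assumption, not an expander
    # construction from the paper).
    neighbors = [[u for u in range(n) if u != v] for v in range(n)]
    opinions = [1] * 5 + [0] * (n - 5)
    steps = run_until_consensus(opinions, neighbors, q0=1.0, q1=0.5, rng=rng)
    print(steps, sum(opinions))
```

Step counts here are individual agent updates, so they are not directly comparable to the $O(\log n)$ time bounds quoted in the abstract, which depend on the paper's own time scaling.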

arXiv:2302.03473 [pdf, other]

Med-NCA: Robust and Lightweight Segmentation with Neural Cellular Automata

Authors: John Kalkhof, Camila González, Anirban Mukhopadhyay

Abstract: Access to the proper infrastructure is critical when performing medical image segmentation with Deep Learning. This requirement makes it difficult to run state-of-the-art segmentation models in resource-constrained scenarios like primary care facilities in rural areas and during crises. The recently emerging field of Neural Cellular Automata (NCA) has shown that locally interacting one-cell models can achieve competitive results in tasks such as image generation or segmentations in low-resolution inputs. However, they are constrained by high VRAM requirements and the difficulty of reaching convergence for high-resolution images. To counteract these limitations we propose Med-NCA, an end-to-end NCA training pipeline for high-resolution image segmentation. Our method follows a two-step process. Global knowledge is first communicated between cells across the downscaled image. Following that, patch-based segmentation is performed. Our proposed Med-NCA outperforms the classic UNet by 2% and 3% Dice for hippocampus and prostate segmentation, respectively, while also being 500 times smaller. We also show that Med-NCA is by design invariant with respect to image scale, shape and translation, experiencing only slight performance degradation even with strong shifts; and is robust against MRI acquisition artefacts. Med-NCA enables high-resolution medical image segmentation even on a Raspberry Pi B+, arguably the smallest device able to run PyTorch and that can be powered by a standard power bank.

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2302.02353 [pdf, other]

Towards Precision in Appearance-based Gaze Estimation in the Wild

Authors: Murthy L. R. D., Abhishek Mukhopadhyay, Shambhavi Aggarwal, Ketan Anand, Pradipta Biswas

Abstract: Appearance-based gaze estimation systems have shown great progress recently, yet the performance of these techniques depends on the datasets used for training. Most of the existing gaze estimation datasets set up in interactive settings were recorded in laboratory conditions, and those recorded in the wild display limited head pose and illumination variations. Further, we observed little attention so far towards precision evaluations of existing gaze estimation approaches. In this work, we present a large gaze estimation dataset, PARKS-Gaze, with wider head pose and illumination variation and with multiple samples for a single Point of Gaze (PoG). The dataset contains 974 minutes of data from 28 participants with a head pose range of 60 degrees in both yaw and pitch directions. Our within-dataset and cross-dataset evaluations and precision evaluations indicate that the proposed dataset is more challenging and enables models to generalize on unseen participants better than the existing in-the-wild datasets. The project page can be accessed here: https://github.com/lrdmurthy/PARKS-Gaze

Submitted 13 February, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

arXiv:2301.03281 [pdf, other]

The state-of-the-art 3D anisotropic intracranial hemorrhage segmentation on non-contrast head CT: The INSTANCE challenge

Authors: Xiangyu Li, Gongning Luo, Kuanquan Wang, Hongyu Wang, Jun Liu, Xinjie Liang, Jie Jiang, Zhenghao Song, Chunyue Zheng, Haokai Chi, Mingwang Xu, Yingte He, Xinghua Ma, **gwen Guo, Yifan Liu, Chuanpu Li, Zeli Chen, Md Mahfuzur Rahman Siddiquee, Andriy Myronenko, Antoine P. Sanner, Anirban Mukhopadhyay, Ahmed E. Othman, Xingyu Zhao, Wei** Liu, **huang Zhang , et al. (9 additional authors not shown)

Abstract: Automatic intracranial hemorrhage segmentation in 3D non-contrast head CT (NCCT) scans is significant in clinical practice. Existing hemorrhage segmentation methods usually ignore the anisotropic nature of the NCCT, and are evaluated on different in-house datasets with distinct metrics, making it highly challenging to improve segmentation performance and perform objective comparisons among different methods. The INSTANCE 2022 was a grand challenge held in conjunction with the 2022 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). It is intended to resolve the above-mentioned problems and promote the development of both intracranial hemorrhage segmentation and anisotropic data processing. The INSTANCE released a training set of 100 cases with ground-truth and a validation set with 30 cases without ground-truth labels that were available to the participants. A held-out testing set with 70 cases is utilized for the final evaluation and ranking. The methods from different participants are ranked based on four metrics, including Dice Similarity Coefficient (DSC), Hausdorff Distance (HD), Relative Volume Difference (RVD) and Normalized Surface Dice (NSD). A total of 13 teams submitted distinct solutions to resolve the challenges, making several baseline models, pre-processing strategies and anisotropic data processing techniques available to future researchers. The winning method achieved an average DSC of 0.6925, demonstrating a significant improvement over our proposed baseline method. To the best of our knowledge, the proposed INSTANCE challenge releases the first intracranial hemorrhage segmentation benchmark, and is also the first challenge intended to resolve the anisotropic problem in 3D medical image segmentation, which provides new alternatives in these research fields.

Submitted 12 January, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

Comments: Summarized paper for the MICCAI INSTANCE 2022 Challenge

arXiv:2212.12007 [pdf, other]

Designing Equitable Transit Networks

Authors: Sophie Pavia, J. Carlos Martinez Mori, Aryaman Sharma, Philip Pugliese, Abhishek Dubey, Samitha Samaranayake, Ayan Mukhopadhyay

Abstract: Public transit is an essential infrastructure enabling access to employment, healthcare, education, and recreational facilities. While accessibility to transit is important in general, some sections of the population depend critically on transit. However, existing public transit is often not designed equitably, and often, equity is only considered as an additional objective post hoc, which hampers systemic changes. We present a formulation for transit network design that considers different notions of equity and welfare explicitly. We study the interaction between network design and various concepts of equity and present trade-offs and results based on real-world data from a large metropolitan area in the United States of America.

Submitted 7 August, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

Comments: Accepted in the non-archival track at the ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), 2023

arXiv:2211.06295 [pdf]

A novel approach to preventing SARS-CoV-2 transmission in classrooms: An OpenFOAM based CFD Study

Authors: Anish Pal, Riddhideep Biswas, Ritam Pal, Sourav Sarkar, Achintya Mukhopadhyay

Abstract: The education sector has suffered a catastrophic setback due to the ongoing COVID pandemic, with classrooms being closed indefinitely. The current study aims to solve the existing dilemma by examining COVID transmission inside a classroom and providing long-term sustainable solutions. In this work, a standard 5m x 3m x 5m classroom is considered where 24 students are seated, accompanied by a teacher. A computational fluid dynamics simulation based on OpenFOAM is performed using an Eulerian-Lagrangian framework. Based on the stochastic dose response framework, we have evaluated the infection risk in the classroom for two distinct cases: (i) certain students are infected (ii) the teacher is infected. If the teacher is infected, the probability of infection could reach 100% for certain students. When certain students are infected, the maximum infection risk for a susceptible person reaches 30%. The commonly used cloth mask proves to be ineffective in providing protection against infection transmission, reducing the maximum infection probability by only approximately 26%. Another commonly used solution, shields installed on desks, has also failed to provide adequate protection against infection, reducing the infection risk by only 50%. Furthermore, the shields serve as a source of fomite mode of infection. Screens suspended from the ceiling, which entrap droplets, have been proposed as a novel solution that reduces the infection risk by 90% and 95% compared to the no screen scenario besides being completely devoid of fomite infection mode. As a result of the screens, the class-time can be extended by 55 minutes.

Submitted 12 October, 2022; originally announced November 2022.

arXiv:2210.04989 [pdf, other]

On Designing Day Ahead and Same Day Ridership Level Prediction Models for City-Scale Transit Networks Using Noisy APC Data

Authors: Jose Paolo Talusan, Ayan Mukhopadhyay, Dan Freudberg, Abhishek Dubey

Abstract: The ability to accurately predict public transit ridership demand benefits passengers and transit agencies. Agencies will be able to reallocate buses to handle under or over-utilized bus routes, improving resource utilization, and passengers will be able to adjust and plan their schedules to avoid overcrowded buses and maintain a certain level of comfort. However, accurately predicting occupancy is a non-trivial task. Various reasons such as heterogeneity, evolving ridership patterns, exogenous events like weather, and other stochastic variables, make the task much more challenging. With the progress of big data, transit authorities now have access to real-time passenger occupancy information for their vehicles. The amount of data generated is staggering. While there is no shortage in data, it must still be cleaned, processed, augmented, and merged before any useful information can be generated. In this paper, we propose the use and fusion of data from multiple sources, cleaned, processed, and merged together, for use in training machine learning models to predict transit ridership. We use data that spans a 2-year period (2020-2022) incorporating transit, weather, traffic, and calendar data. The resulting data, which equates to 17 million observations, is used to train separate models for the trip and stop level prediction. We evaluate our approach on real-world transit data provided by the public transit agency of Nashville, TN. We demonstrate that the trip level model based on Xgboost and the stop level model based on LSTM outperform the baseline statistical model across the entire transit service day.

Submitted 10 October, 2022; originally announced October 2022.

Comments: 9 pages, 11 figures

arXiv:2210.00520 [pdf, other]

doi 10.1103/PhysRevE.107.064405

Periodic orbits in deterministic discrete-time evolutionary game dynamics: An information-theoretic perspective

Authors: Sayak Bhattacharjee, Vikash Kumar Dubey, Archan Mukhopadhyay, Sagar Chakraborty

Abstract: Even though existence of non-convergent evolution of the states of populations in ecological and evolutionary contexts is an undeniable fact, insightful game-theoretic interpretations of such outcomes are scarce in the literature of evolutionary game theory. As a proof-of-concept, we tap into the information-theoretic concept of relative entropy in order to construct a game-theoretic interpretation for periodic orbits in a wide class of deterministic discrete-time evolutionary game dynamics, primarily investigating the two-player two-strategy case. Effectively, we present a consistent generalization of the evolutionarily stable strategy -- the cornerstone of the evolutionary game theory -- and aptly term the generalized concept: information stable orbit. The information stable orbit captures the essence of the evolutionarily stable strategy in that it compares the total payoff obtained against an evolving mutant with the total payoff that the mutant gets while playing against itself. Furthermore, we discuss the connection of the information stable orbit with the dynamical stability of the corresponding periodic orbit.

Submitted 20 May, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

Comments: 12 pages, 3 figures

arXiv:2209.14849 [pdf, other]

doi 10.1007/978-3-031-16434-7_2

Federated Stain Normalization for Computational Pathology

Authors: Nicolas Wagner, Moritz Fuchs, Yuri Tolkach, Anirban Mukhopadhyay

Abstract: Although deep federated learning has received much attention in recent years, progress has been made mainly in the context of natural images and barely for computational pathology. However, deep federated learning is an opportunity to create datasets that reflect the data diversity of many laboratories. Further, the effort of dataset construction can be divided among many. Unfortunately, existing algorithms cannot be easily applied to computational pathology since previous work presupposes that data distributions of laboratories must be similar. This is an unlikely assumption, mainly since different laboratories have different staining styles. As a solution, we propose BottleGAN, a generative model that can computationally align the staining styles of many laboratories and can be trained in a privacy-preserving manner to foster federated learning in computational pathology. We construct a heterogenic multi-institutional dataset based on the PESO segmentation dataset and improve the IOU by 42% compared to existing federated learning algorithms. An implementation of BottleGAN is available at https://github.com/MECLabTUDA/BottleGAN

Submitted 29 September, 2022; originally announced September 2022.

Comments: Accepted for Poster at MICCAI2022

arXiv:2209.09678 [pdf, other]

Detecting respiratory motion artefacts for cardiovascular MRIs to ensure high-quality segmentation

Authors: Amin Ranem, John Kalkhof, Caner Özer, Anirban Mukhopadhyay, Ilkay Oksuz

Abstract: While machine learning approaches perform well on their training domain, they generally tend to fail in a real-world application. In cardiovascular magnetic resonance imaging (CMR), respiratory motion represents a major challenge in terms of acquisition quality and therefore subsequent analysis and final diagnosis. We present a workflow which predicts a severity score for respiratory motion in CMR for the CMRxMotion challenge 2022. This is an important tool for technicians to immediately provide feedback on the CMR quality during acquisition, as poor-quality images can directly be re-acquired while the patient is still available in the vicinity. Thus, our method ensures that the acquired CMR holds up to a specific quality standard before it is used for further diagnosis. Therefore, it enables an efficient base for proper diagnosis without having time and cost-intensive re-acquisitions in cases of severe motion artefacts. Combined with our segmentation model, this can help cardiologists and technicians in their daily routine by providing a complete pipeline to guarantee proper quality assessment and genuine segmentations for cardiovascular scans. The code base is available at https://github.com/MECLabTUDA/QA_med_data/tree/dev_QA_CMRxMotion.

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2208.10270 [pdf, other]

To show or not to show: Redacting sensitive text from videos of electronic displays

Authors: Abhishek Mukhopadhyay, Shubham Agarwal, Patrick Dylan Zwick, Pradipta Biswas

Abstract: With the increasing prevalence of video recordings there is a growing need for tools that can maintain the privacy of those recorded. In this paper, we define an approach for redacting personally identifiable text from videos using a combination of optical character recognition (OCR) and natural language processing (NLP) techniques. We examine the relative performance of this approach when used with different OCR models, specifically Tesseract and the OCR system from Google Cloud Vision (GCV). For the proposed approach the performance of GCV, in both accuracy and speed, is significantly higher than Tesseract. Finally, we explore the advantages and disadvantages of both models in real-world applications.

Submitted 19 August, 2022; originally announced August 2022.

arXiv:2208.03217 [pdf, other]

Distance-based detection of out-of-distribution silent failures for Covid-19 lung lesion segmentation

Authors: Camila Gonzalez, Karol Gotkowski, Moritz Fuchs, Andreas Bucher, Armin Dadras, Ricarda Fischbach, Isabel Kaltenborn, Anirban Mukhopadhyay

Abstract: Automatic segmentation of ground glass opacities and consolidations in chest computer tomography (CT) scans can potentially ease the burden of radiologists during times of high resource utilisation. However, deep learning models are not trusted in the clinical routine due to failing silently on out-of-distribution (OOD) data. We propose a lightweight OOD detection method that leverages the Mahalanobis distance in the feature space and seamlessly integrates into state-of-the-art segmentation pipelines. The simple approach can even augment pre-trained models with clinically relevant uncertainty quantification. We validate our method across four chest CT distribution shifts and two magnetic resonance imaging applications, namely segmentation of the hippocampus and the prostate. Our results show that the proposed method effectively detects far- and near-OOD samples across all explored scenarios.

Submitted 5 August, 2022; originally announced August 2022.

arXiv:2208.03206 [pdf, other]

Task-agnostic Continual Hippocampus Segmentation for Smooth Population Shifts

Authors: Camila Gonzalez, Amin Ranem, Ahmed Othman, Anirban Mukhopadhyay

Abstract: Most continual learning methods are validated in settings where task boundaries are clearly defined and task identity information is available during training and testing. We explore how such methods perform in a task-agnostic setting that more closely resembles dynamic clinical environments with gradual population shifts. We propose ODEx, a holistic solution that combines out-of-distribution detection with continual learning techniques. Validation on two scenarios of hippocampus segmentation shows that our proposed method reliably maintains performance on earlier tasks without losing plasticity.

Submitted 5 August, 2022; originally announced August 2022.

arXiv:2208.01871 [pdf, other]

A Deep Learning Approach to Detect Lean Blowout in Combustion Systems

Authors: Tryambak Gangopadhyay, Somnath De, Qisai Liu, Achintya Mukhopadhyay, Swarnendu Sen, Soumik Sarkar

Abstract: Lean combustion is environment friendly with low NOx emissions and also provides better fuel efficiency in a combustion system. However, approaching towards lean combustion can make engines more susceptible to lean blowout. Lean blowout (LBO) is an undesirable phenomenon that can cause sudden flame extinction leading to sudden loss of power. During the design stage, it is quite challenging for the scientists to accurately determine the optimal operating limits to avoid sudden LBO occurrence. Therefore, it is crucial to develop accurate and computationally tractable frameworks for online LBO detection in low NOx emission engines. To the best of our knowledge, for the first time, we propose a deep learning approach to detect lean blowout in combustion systems. In this work, we utilize a laboratory-scale combustor to collect data for different protocols. We start far from LBO for each protocol and gradually move towards the LBO regime, capturing a quasi-static time series dataset at each condition. Using one of the protocols in our dataset as the reference protocol and with conditions annotated by domain experts, we find a transition state metric for our trained deep learning model to detect LBO in the other test protocols. We find that our proposed approach is more accurate and computationally faster than other baseline models to detect the transitions to LBO. Therefore, we recommend this method for real-time performance monitoring in lean combustion engines.

Submitted 3 August, 2022; originally announced August 2022.

arXiv:2208.00963 [pdf, other]

FrOoDo: Framework for Out-of-Distribution Detection

Authors: Jonathan Stieber, Moritz Fuchs, Anirban Mukhopadhyay

Abstract: FrOoDo is an easy-to-use and flexible framework for Out-of-Distribution detection tasks in digital pathology. It can be used with PyTorch classification and segmentation models, and its modular design allows for easy extension. The goal is to automate the task of OoD Evaluation such that research can focus on the main goal of either designing new models, new methods or evaluating a new dataset. The code can be found at https://github.com/MECLabTUDA/FrOoDo.

Submitted 15 February, 2024; v1 submitted 1 August, 2022; originally announced August 2022.

arXiv:2206.14597 [pdf, other]

Generative Anomaly Detection for Time Series Datasets

Authors: Zhuangwei Kang, Ayan Mukhopadhyay, Aniruddha Gokhale, Shijie Wen, Abhishek Dubey

Abstract: Traffic congestion anomaly detection is of paramount importance in intelligent traffic systems. The goals of transportation agencies are two-fold: to monitor the general traffic conditions in the area of interest and to locate road segments under abnormal congestion states. Modeling congestion patterns can achieve these goals for citywide roadways, which amounts to learning the distribution of multivariate time series (MTS). However, existing works are either not scalable or unable to capture the spatial-temporal information in MTS simultaneously. To this end, we propose a principled and comprehensive framework consisting of a data-driven generative approach that can perform tractable density estimation for detecting traffic anomalies. Our approach first clusters segments in the feature space and then uses conditional normalizing flow to identify anomalous temporal snapshots at the cluster level in an unsupervised setting. Then, we identify anomalies at the segment level by using a kernel density estimator on the anomalous cluster. Extensive experiments on synthetic datasets show that our approach significantly outperforms several state-of-the-art congestion anomaly detection and diagnosis methods in terms of Recall and F1-Score. We also use the generative model to sample labeled data, which can train classifiers in a supervised setting, alleviating the lack of labeled data for anomaly detection in sparse settings.

Submitted 28 June, 2022; originally announced June 2022.

Comments: A shorter version of the paper was accepted at the ITSC 2022

arXiv:2204.13663 [pdf, other]

ADVISER: AI-Driven Vaccination Intervention Optimiser for Increasing Vaccine Uptake in Nigeria

Authors: Vineet Nair, Kritika Prakash, Michael Wilbur, Aparna Taneja, Corinne Namblard, Oyindamola Adeyemo, Abhishek Dubey, Abiodun Adereni, Milind Tambe, Ayan Mukhopadhyay

Abstract: More than 5 million children under five years die from largely preventable or treatable medical conditions every year, with an overwhelmingly large proportion of deaths occurring in under-developed countries with low vaccination uptake. One of the United Nations' sustainable development goals (SDG 3) aims to end preventable deaths of newborns and children under five years of age. We focus on Nigeria, where the rate of infant mortality is appalling. We collaborate with HelpMum, a large non-profit organization in Nigeria to design and optimize the allocation of heterogeneous health interventions under uncertainty to increase vaccination uptake, the first such collaboration in Nigeria. Our framework, ADVISER: AI-Driven Vaccination Intervention Optimiser, is based on an integer linear program that seeks to maximize the cumulative probability of successful vaccination. Our optimization formulation is intractable in practice. We present a heuristic approach that enables us to solve the problem for real-world use-cases. We also present theoretical bounds for the heuristic method. Finally, we show that the proposed approach outperforms baseline methods in terms of vaccination uptake through experimental evaluation. HelpMum is currently planning a pilot program based on our approach to be deployed in the largest city of Nigeria, which would be the first deployment of an AI-driven vaccination uptake program in the country and hopefully, pave the way for other data-driven programs to improve health outcomes in Nigeria.

Submitted 5 July, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: Accepted for publication at International Joint Conference on Artificial Intelligence 2022, AI for Good Track (IJCAI-22)

arXiv:2204.11992 [pdf, ps, other]

Offline Vehicle Routing Problem with Online Bookings: A Novel Problem Formulation with Applications to Paratransit

Authors: Amutheezan Sivagnanam, Salah Uddin Kadir, Ayan Mukhopadhyay, Philip Pugliese, Abhishek Dubey, Samitha Samaranayake, Aron Laszka

Abstract: Vehicle routing problems (VRPs) can be divided into two major categories: offline VRPs, which consider a given set of trip requests to be served, and online VRPs, which consider requests as they arrive in real-time. Based on discussions with public transit agencies, we identify a real-world problem that is not addressed by existing formulations: booking trips with flexible pickup windows (e.g., 3 hours) in advance (e.g., the day before) and confirming tight pickup windows (e.g., 30 minutes) at the time of booking. Such a service model is often required in paratransit service settings, where passengers typically book trips for the next day over the phone. To address this gap between offline and online problems, we introduce a novel formulation, the offline vehicle routing problem with online bookings. This problem is very challenging computationally since it faces the complexity of considering large sets of requests -- similar to offline VRPs -- but must abide by strict constraints on running time -- similar to online VRPs. To solve this problem, we propose a novel computational approach, which combines an anytime algorithm with a learning-based policy for real-time decisions. Based on a paratransit dataset obtained from the public transit agency of Chattanooga, TN, we demonstrate that our novel formulation and computational approach lead to significantly better outcomes in this setting than existing algorithms.

Submitted 5 May, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

arXiv:2204.08043 [pdf, other]

Continual Hippocampus Segmentation with Transformers

Authors: Amin Ranem, Camila González, Anirban Mukhopadhyay

Abstract: In clinical settings, where acquisition conditions and patient populations change over time, continual learning is key for ensuring the safe use of deep neural networks. Yet most existing work focuses on convolutional architectures and image classification. Instead, radiologists prefer to work with segmentation models that outline specific regions-of-interest, for which Transformer-based architectures are gaining traction. The self-attention mechanism of Transformers could potentially mitigate catastrophic forgetting, opening the way for more robust medical image segmentation. In this work, we explore how recently-proposed Transformer mechanisms for semantic segmentation behave in sequential learning scenarios, and analyse how best to adapt continual learning strategies for this setting. Our evaluation on hippocampus segmentation shows that Transformer mechanisms mitigate catastrophic forgetting for medical image segmentation compared to purely convolutional architectures, and demonstrates that regularising ViT modules should be done with caution.

Submitted 17 April, 2022; originally announced April 2022.

arXiv:2204.02179 [pdf, other]

Towards Robust and Accurate Myoelectric Controller Design based on Multi-objective Optimization using Evolutionary Computation

Authors: Ahmed Aqeel Shaikh, Anand Kumar Mukhopadhyay, Soumyajit Poddar, Suman Samui

Abstract: Myoelectric pattern recognition is one of the important aspects in the design of the control strategy for various applications including upper-limb prostheses and bio-robotic hand movement systems. The current work proposes an approach to design an energy-efficient EMG-based controller by considering a kernelized SVM classifier for decoding the information of surface electromyography (sEMG) signals to infer the underlying muscle movements. In order to achieve the optimized performance of the EMG-based controller, our main strategy of classifier design is to reduce the false movements of the overall system (when the EMG-based controller is at the 'Rest' position). To this end, we have formulated the training algorithm of the proposed supervised learning system as a general constrained multi-objective optimization problem. An elitist multi-objective evolutionary algorithm, the non-dominated sorting genetic algorithm II (NSGA-II), has been used to tune the hyperparameters of the SVM. We present experimental results on a dataset consisting of the sEMG signals collected from eleven subjects at five different upper limb positions. Furthermore, the performance of the trained models based on the two objective metrics, namely classification accuracy and false negatives, has been evaluated on two different test sets to examine the generalization capability of the proposed training approach while implementing limb-position invariant EMG classification. It is evident from the presented results that the proposed approach provides much more flexibility to the designer in selecting the parameters of the classifier to optimize the energy efficiency of the EMG-based controller.

Submitted 22 May, 2023; v1 submitted 2 April, 2022; originally announced April 2022.

Comments: This is the updated paper

arXiv:2203.15127

An Online Approach to Solve the Dynamic Vehicle Routing Problem with Stochastic Trip Requests for Paratransit Services

Authors: Michael Wilbur, Salah Uddin Kadir, Youngseo Kim, Geoffrey Pettet, Ayan Mukhopadhyay, Philip Pugliese, Samitha Samaranayake, Aron Laszka, Abhishek Dubey

Abstract: Many transit agencies operating paratransit and microtransit services have to respond to trip requests that arrive in real-time, which entails solving hard combinatorial and sequential decision-making problems under uncertainty. To avoid decisions that lead to significant inefficiency in the long term, vehicles should be allocated to requests by optimizing a non-myopic utility function or by batching requests together and optimizing a myopic utility function. While the former approach is typically offline, the latter can be performed online. We point out two major issues with such approaches when applied to paratransit services in practice. First, it is difficult to batch paratransit requests together as they are temporally sparse. Second, the environment in which transit agencies operate changes dynamically (e.g., traffic conditions), causing estimates that are learned offline to become stale. To address these challenges, we propose a fully online approach to solve the dynamic vehicle routing problem (DVRP) with time windows and stochastic trip requests that is robust to changing environmental dynamics by construction. We focus on scenarios where requests are relatively sparse - our problem is motivated by applications to paratransit services. We formulate DVRP as a Markov decision process and use Monte Carlo tree search to evaluate actions for any given state. Accounting for stochastic requests while optimizing a non-myopic utility function is computationally challenging; indeed, the action space for such a problem is intractably large in practice. To tackle the large action space, we leverage the structure of the problem to design heuristics that can sample promising actions for the tree search. Our experiments using real-world data from our partner agency show that the proposed approach outperforms existing state-of-the-art approaches both in terms of performance and robustness.

Submitted 31 March, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

Comments: Accepted for publication at ICCPS 2022

arXiv:2203.08614

A Model of Job Parallelism for Latency Reduction in Large-Scale Systems

Authors: Ayalvadi Ganesh, Arpan Mukhopadhyay

Abstract: Processing computation-intensive jobs at multiple processing cores in parallel is essential in many real-world applications. In this paper, we consider an idealised model for job parallelism in which a job can be served simultaneously by $d$ distinct servers. The job is considered complete when the total amount of work done on it by the $d$ servers equals its size. We study the effect of parallelism on the average delay of jobs. Specifically, we analyze a system consisting of $n$ parallel processor sharing servers in which jobs arrive according to a Poisson process of rate $n λ$ ($λ<1$) and each job brings an exponentially distributed amount of work with unit mean. Upon arrival, a job selects $d$ servers uniformly at random and joins all the chosen servers simultaneously. We show by a mean-field analysis that, for fixed $d \geq 2$ and large $n$, the average occupancy of servers is $O(\log (1/(1-λ)))$ as $λ\to 1$ in comparison to $O(1/(1-λ))$ average occupancy for $d=1$. Thus, we obtain an exponential reduction in the response time of jobs through parallelism. We make significant progress towards rigorously justifying the mean-field analysis.

Submitted 20 July, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

MSC Class: 60K25; 60J28
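
Note: the gap between the two growth orders quoted in the abstract above can be made concrete with a small numerical comparison. The sketch below is illustrative only. It evaluates the bare orders $1/(1-λ)$ and $\log(1/(1-λ))$ with the hidden constants assumed to be 1, which the abstract does not state, and the function names are hypothetical. It does not reproduce the paper's mean-field analysis.

```python
import math


def occupancy_order_d1(lam: float) -> float:
    """Growth order 1/(1 - lam) quoted for d = 1 (constant assumed to be 1)."""
    return 1.0 / (1.0 - lam)


def occupancy_order_d2(lam: float) -> float:
    """Growth order log(1/(1 - lam)) quoted for fixed d >= 2 (constant assumed to be 1)."""
    return math.log(1.0 / (1.0 - lam))


if __name__ == "__main__":
    # Compare the two orders as the load lam approaches 1.
    for lam in (0.9, 0.99, 0.999):
        d1 = occupancy_order_d1(lam)
        d2 = occupancy_order_d2(lam)
        print(f"lambda={lam}: d=1 order {d1:.1f}, d>=2 order {d2:.2f}")
```

Under these assumptions the first quantity grows tenfold each time $1-λ$ shrinks by a factor of ten, while the second grows only by an additive $\log 10$, which is the sense in which the reduction is exponential.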

arXiv:2203.01721

Asymptotic Optimality of Speed-Aware JSQ for Heterogeneous Systems

Authors: Sanidhay Bhambay, Arpan Mukhopadhyay

Abstract: The Join-the-Shortest-Queue (JSQ) load-balancing scheme is known to minimise the average delay of jobs in homogeneous systems consisting of identical servers. However, it performs poorly in heterogeneous systems where servers have different processing rates. Finding a delay optimal scheme remains an open problem for heterogeneous systems. In this paper, we consider a speed-aware version of the JSQ scheme for heterogeneous systems and show that it achieves delay optimality in the fluid limit. One of the key issues in establishing this optimality result for heterogeneous systems is to show that the sequence of steady-state distributions indexed by the system size is tight in an appropriately defined space. The usual technique for showing tightness by coupling with a suitably defined dominant system does not work for heterogeneous systems. To prove tightness, we devise a new technique that uses the drift of exponential Lyapunov functions. Using the non-negativity of the drift, we show that the stationary queue length distribution has an exponentially decaying tail - a fact we use to prove tightness. Another technical difficulty arises due to the complexity of the underlying state-space and the separation of two time-scales in the fluid limit. Due to these factors, the fluid limit turns out to be a function of the invariant distribution of a multi-dimensional Markov chain which is hard to characterise. By using some properties of this invariant distribution and using the monotonicity of the system, we show that the fluid limit has a unique and globally attractive fixed point.

Submitted 1 October, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: 36 pages, 3 figures

MSC Class: 60K25 (Primary); 60K30; 68M20 (Secondary)

Showing 1–50 of 122 results for author: Mukhopadhyay, A