-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag
, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on uncertainty quantification in medical image segmentation, taking into account the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions (2D vs. 3D). A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of ensemble models, as well as the need for further research to develop efficient methods for uncertainty quantification in 3D segmentation tasks.
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
-
A novel data generation scheme for surrogate modelling with deep operator networks
Authors:
Shivam Choubey,
Birupaksha Pal,
Manish Agrawal
Abstract:
Operator-based neural network architectures such as DeepONets have emerged as a promising tool for the surrogate modeling of physical systems. In operator surrogate modeling, the training data is generally generated by solving the PDEs using techniques such as the Finite Element Method (FEM). The computationally intensive nature of data generation is one of the biggest bottlenecks in deploying these surrogate models for practical applications. In this study, we propose a novel methodology to alleviate the computational burden associated with training data generation for DeepONets. Unlike existing literature, the proposed framework for data generation does not use any partial differential equation integration strategy, thereby significantly reducing the computational cost of generating the training dataset for a DeepONet. In the proposed strategy, the output field is first generated randomly, satisfying the boundary conditions, using Gaussian Process Regression (GPR). From the output field, the input source field can be calculated easily using finite difference techniques. The proposed methodology can be extended to other operator learning methods, making the approach widely applicable. To validate the proposed approach, we employ the heat equation as the model problem and develop the surrogate model for numerous boundary value problems.
Submitted 24 February, 2024;
originally announced February 2024.
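The data generation idea in the abstract above (draw the output field first, then recover the source by finite differences, with no PDE solve) can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes a 1D steady-state heat (Poisson) problem -u'' = f on [0, 1] with zero Dirichlet boundary conditions, and it swaps the paper's GPR sampling for a random sine series purely to stay dependency-free. The names `random_field`, `source_from_field`, and `make_pair` are hypothetical.

```python
import math
import random


def random_field(n_points, n_modes=4, seed=0):
    """Random smooth field on [0, 1] with u(0) = u(1) = 0.

    Assumption: a random sine series stands in for the GPR sampling
    described in the abstract; every sine mode vanishes at both ends,
    so the zero boundary conditions hold by construction.
    """
    rng = random.Random(seed)
    coeffs = [rng.gauss(0.0, 1.0) / k for k in range(1, n_modes + 1)]
    xs = [i / (n_points - 1) for i in range(n_points)]
    us = [
        sum(c * math.sin(k * math.pi * x) for k, c in enumerate(coeffs, start=1))
        for x in xs
    ]
    return xs, us


def source_from_field(us, h):
    """Source f = -u'' at interior points via central differences."""
    return [
        -(us[i - 1] - 2.0 * us[i] + us[i + 1]) / (h * h)
        for i in range(1, len(us) - 1)
    ]


def make_pair(n_points=101, seed=0):
    """One (source, solution) training pair, obtained without solving a PDE."""
    xs, us = random_field(n_points, seed=seed)
    h = xs[1] - xs[0]
    return source_from_field(us, h), us
```

Calling `make_pair` with different seeds would yield as many (f, u) pairs as needed; the cost per pair is one pass over the grid rather than a linear solve.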
-
SmartMME: Implementation of Base Station Switching Off Strategy in ns-3
Authors:
Argha Sen,
Bhupendra Pal,
Seemant Achari,
Sandip Chakraborty
Abstract:
In the landscape of next-generation cellular networks, a projected surge of over 12 billion subscriptions foreshadows a considerable upswing in the network's overall energy consumption. The proliferation of User Equipment (UE) drives this energy demand, urging 5G deployments to seek more energy-efficient methodologies. In this work, we propose SmartMME, a solution aimed at optimizing Base Station (BS) energy usage. By analyzing critical network states, such as UE connections, data traffic at individual UEs, and other pertinent metrics, our methodology orchestrates the BS's power states, making informed decisions on when to activate or deactivate the BS. This approach significantly curtails the network's overall energy consumption. To validate its efficiency, we integrated our module into Network Simulator-3 (ns-3) and conducted extensive testing to demonstrate that it effectively manages and reduces net energy consumption. We have open-sourced this module on GitHub and invite the engagement and feedback of the wider research community.
Submitted 12 January, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
A novel framework for generalization of deep hidden physics models
Authors:
Vijay Kag,
Birupaksha Pal
Abstract:
Modelling of systems where the full system information is unknown is an oft encountered problem for various engineering and industrial applications, as it's either impossible to consider all the complex physics involved or simpler models are considered to keep within the limits of the available resources. Recent advances in greybox modelling like the deep hidden physics models address this space by combining data and physics. However, for most real-life applications, model generalizability is a key issue, as retraining a model for every small change in system inputs and parameters or modification in domain configuration can render the model economically unviable. In this work we present a novel enhancement to the idea of hidden physics models which can generalize for changes in system inputs, parameters and domains. We also show that this approach holds promise in system discovery as well and helps learn the hidden physics for the changed system inputs, parameters and domain configuration.
Submitted 9 January, 2024;
originally announced January 2024.
-
Gaussian Harmony: Attaining Fairness in Diffusion-based Face Generation Models
Authors:
Basudha Pal,
Arunkumar Kannan,
Ram Prabhakar Kathirvel,
Alice J. O'Toole,
Rama Chellappa
Abstract:
Diffusion models have achieved great progress in face generation. However, these models amplify the bias in the generation process, leading to an imbalance in the distribution of sensitive attributes such as age, gender and race. This paper proposes a novel solution to this problem by balancing the facial attributes of the generated images. We mitigate the bias by localizing the means of the facial attributes in the latent space of the diffusion model using Gaussian mixture models (GMMs). Our motivation for choosing GMMs over other clustering frameworks comes from the flexible latent structure of diffusion models. Since each sampling step in diffusion models follows a Gaussian distribution, we show that fitting a GMM helps us to localize the subspace responsible for generating a specific attribute. Furthermore, our method does not require retraining; instead, we localize the subspace on-the-fly and mitigate the bias to generate a fair dataset. We evaluate our approach on multiple face attribute datasets to demonstrate its effectiveness. Our results show that our approach leads to fairer data generation in terms of representational fairness while preserving the quality of generated samples.
Submitted 21 December, 2023;
originally announced December 2023.
-
A Systematic Study on Object Recognition Using Millimeter-wave Radar
Authors:
Maloy Kumar Devnath,
Avijoy Chakma,
Mohammad Saeid Anwar,
Emon Dey,
Zahid Hasan,
Marc Conn,
Biplab Pal,
Nirmalya Roy
Abstract:
Due to its light- and weather-independent sensing, millimeter-wave (MMW) radar is essential in smart environments. Intelligent vehicle systems and industry-grade MMW radars have integrated such capabilities. Industry-grade MMW radars are expensive and hard to obtain for community-purpose smart environment applications. However, commercially available MMW radars have hidden underlying challenges that need to be investigated for tasks like recognizing objects and activities, real-time person tracking, object localization, etc. Image and video data are straightforward to gather, understand, and annotate for such tasks. However, image and video data are light- and weather-dependent, susceptible to the occlusion effect, and present privacy problems. To eliminate this dependence and ensure privacy, commercial MMW radars should be tested. MMW radar's practicality and performance in varied operating settings must be addressed before promoting it. To address these problems, we collected a dataset using Texas Instruments' Automotive mmWave Radar (AWR2944) and reported the best experimental settings for object recognition performance using different deep learning algorithms. Our extensive data gathering technique allows us to systematically explore and identify problems in the object identification task under cross-ambience conditions. We investigated several solutions and published detailed experimental data.
Submitted 3 May, 2023;
originally announced May 2023.
-
Biomedical image analysis competitions: The state of current participation practice
Authors:
Matthias Eisenmann,
Annika Reinke,
Vivienn Weru,
Minu Dietlinde Tizabi,
Fabian Isensee,
Tim J. Adler,
Patrick Godau,
Veronika Cheplygina,
Michal Kozubek,
Sharib Ali,
Anubha Gupta,
Jan Kybic,
Alison Noble,
Carlos Ortiz de Solórzano,
Samiksha Pachade,
Caroline Petitjean,
Daniel Sage,
Donglai Wei,
Elizabeth Wilden,
Deepak Alapatt,
Vincent Andrearczyk,
Ujjwal Baid,
Spyridon Bakas,
Niranjan Balu,
Sophia Bano
, et al. (331 additional authors not shown)
Abstract:
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Submitted 12 September, 2023; v1 submitted 16 December, 2022;
originally announced December 2022.
-
Might I Get Pwned: A Second Generation Compromised Credential Checking Service
Authors:
Bijeeta Pal,
Mazharul Islam,
Marina Sanusi,
Nick Sullivan,
Luke Valenta,
Tara Whalen,
Christopher Wood,
Thomas Ristenpart,
Rahul Chatterjee
Abstract:
Credential stuffing attacks use stolen passwords to log into victim accounts. To defend against these attacks, recently deployed compromised credential checking (C3) services provide APIs that help users and companies check whether a username, password pair is exposed. These services however only check if the exact password is leaked, and therefore do not mitigate credential tweaking attacks - attempts to compromise a user account with variants of a user's leaked passwords. Recent work has shown credential tweaking attacks can compromise accounts quite effectively even when the credential stuffing countermeasures are in place. We initiate work on C3 services that protect users from credential tweaking attacks. The core underlying challenge is how to identify passwords that are similar to their leaked passwords while preserving honest clients' privacy and also preventing malicious clients from extracting breach data from the service. We formalize the problem and explore ways to measure password similarity that balance efficacy, performance, and security. Based on this study, we design "Might I Get Pwned" (MIGP), a new kind of breach alerting service. Our simulations show that MIGP reduces the efficacy of state-of-the-art 1000-guess credential tweaking attacks by 94%. MIGP preserves user privacy and limits potential exposure of sensitive breach entries. We show that the protocol is fast, with response time close to existing C3 services. We worked with Cloudflare to deploy MIGP in practice.
Submitted 18 March, 2022; v1 submitted 29 September, 2021;
originally announced September 2021.
-
A Social Distancing-Based Facility Location Approach for Combating COVID-19
Authors:
Suman Banerjee,
Bithika Pal,
Maheswar Singhamahapatra
Abstract:
In this paper, we introduce and study the problem of facility location along with the notion of \emph{`social distancing'}. The input to the problem is the road network of a city, where the nodes are the residential zones and the edges are the road segments connecting the zones, along with their respective distances. We also have information about the population of each zone, the different types of facilities to be opened and how many of each, and their respective demands in each zone. The goal of the problem is to locate the facilities such that the people can be served and, at the same time, the total social distancing is maximized. We formally call this problem the \textsc{Social Distancing-Based Facility Location Problem}. We mathematically quantify social distancing for a given allocation of facilities and propose an optimization model. As the problem is \textsf{NP-Hard}, we propose a simulation-based approach and a heuristic approach for solving it. We provide a detailed analysis of both methods. We perform an extensive set of experiments with synthetic datasets. From the results, we observe that the proposed heuristic approach leads to a better allocation compared to the simulation-based approach.
Submitted 10 May, 2021;
originally announced May 2021.
-
Budgeted Influence and Earned Benefit Maximization with Tags in Social Networks
Authors:
Suman Banerjee,
Bithika Pal
Abstract:
Given a social network where each user is associated with a selection cost, the problem of \textsc{Budgeted Influence Maximization} (\emph{BIM Problem} in short) asks to choose a subset of users (known as seed users) within an allocated budget whose initial activation leads to the maximum number of influenced nodes. Existing studies on this problem do not consider tag-specific influence probability. However, in reality, the influence probability between two users always depends upon the context (e.g., sports, politics, etc.). To address this issue, in this paper we introduce the \textsc{Tag\mbox{-}Based Budgeted Influence Maximization problem} (\emph{TBIM Problem} in short), where, along with the other inputs, a tag set is given (each tag is also associated with a selection cost), each edge of the network has a tag-specific influence probability, and the goal is to select influential users as well as influential tags within the allocated budget to maximize the influence. Considering the fact that real-world campaigns are targeted in nature, we also study the \textsc{Earned Benefit Maximization} Problem in the tag-specific influence probability setting, which we formally call the \textsc{Tag\mbox{-}Based Earned Benefit Maximization problem} (\emph{TEBM Problem} in short). For this problem, along with the inputs of the TBIM Problem, we are given a subset of the nodes as target users, and each one of them is associated with a benefit value that can be earned by influencing them. Considering the fact that different tags have different popularity across the communities of the same network, we propose three methodologies that work based on \emph{effective marginal influence gain computation}. The proposed methodologies have been analyzed for their time and space requirements.
Submitted 17 April, 2021;
originally announced April 2021.
-
An Efficient Updation Approach for Enumerating Maximal $(Δ, γ)$\mbox{-}Cliques of a Temporal Network
Authors:
Suman Banerjee,
Bithika Pal
Abstract:
Given a temporal network $\mathcal{G}(\mathcal{V}, \mathcal{E}, \mathcal{T})$, $(\mathcal{X},[t_a,t_b])$ (where $\mathcal{X} \subseteq \mathcal{V}(\mathcal{G})$ and $[t_a,t_b] \subseteq \mathcal{T}$) is said to be a $(Δ, γ)$\mbox{-}clique of $\mathcal{G}$ if, for every pair of vertices in $\mathcal{X}$, there exist at least $γ$ links in each $Δ$ duration within the time interval $[t_a,t_b]$. Enumerating such maximal cliques is an important problem in temporal network analysis, as it reveals contact patterns among the nodes of $\mathcal{G}$. In this paper, we study the maximal $(Δ, γ)$\mbox{-}clique enumeration problem in the online setting; i.e., the entire link set of the network is not known in advance, and the links arrive in batches in an iterative manner. Suppose the link set till time stamp $T_{1}$ (i.e., $\mathcal{E}^{T_{1}}$) and its corresponding $(Δ, γ)$-clique set are known. In the next batch (till time $T_{2}$), a new set of links (denoted as $\mathcal{E}^{(T_1,T_2]}$) arrives.
Submitted 8 July, 2020;
originally announced July 2020.
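The $(Δ, γ)$-clique definition in the abstract above can be made concrete with a small membership check. This is a sketch under stated assumptions, not the enumeration algorithm of the paper: timestamps are assumed to be integers, links are treated as undirected, and a "Δ duration" is read as a closed window [s, s + Δ] lying inside [t_a, t_b]. The function name `is_delta_gamma_clique` is hypothetical.

```python
from itertools import combinations


def is_delta_gamma_clique(links, vertices, t_a, t_b, delta, gamma):
    """Check whether (vertices, [t_a, t_b]) satisfies the (delta, gamma) condition.

    links: iterable of (u, v, t) triplets with integer timestamps t.
    Assumption: each window is the closed interval [s, s + delta] with
    t_a <= s and s + delta <= t_b. If the interval is shorter than delta
    there are no windows and the check passes vacuously.
    """
    # Collect the timestamps of each undirected vertex pair inside [t_a, t_b].
    times = {}
    for u, v, t in links:
        if t_a <= t <= t_b:
            times.setdefault(frozenset((u, v)), []).append(t)

    for u, v in combinations(sorted(vertices), 2):
        stamps = times.get(frozenset((u, v)), [])
        # Every window of length delta must contain at least gamma links.
        for s in range(t_a, t_b - delta + 1):
            if sum(1 for t in stamps if s <= t <= s + delta) < gamma:
                return False
    return True
```

For example, if two vertices are linked at times 1, 2, 3 and 4, the pair passes the check on [1, 4] with Δ = 1 and γ = 2, since every window of length 1 contains two links, but fails with γ = 3.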
-
Deceiving computers in Reverse Turing Test through Deep Learning
Authors:
Jimut Bahan Pal
Abstract:
It is becoming increasingly difficult for human beings to go about their day-to-day lives without going through a reverse Turing test, in which computers test whether users are human. Almost every website and service provider today checks whether its site is being crawled by automated bots that could extract valuable information. Meanwhile, bots are becoming more intelligent, using deep learning techniques to decipher those tests, gain unwanted automated access to data, and create a nuisance by posting spam. Humans spend a considerable amount of time almost every day trying to decipher CAPTCHAs. The aim of this investigation is to check whether a commonly used subset of CAPTCHAs, known as text CAPTCHAs, is a reliable means of verifying human customers. We mainly focused on the preprocessing step, which converts each CAPTCHA to binary intensity and removes as much of the confusion as possible, and developed various models to correctly label as many CAPTCHAs as possible. We also suggested some ways to improve the verification process so that existing CAPTCHAs become easier for humans to solve and harder for bots.
Submitted 1 June, 2020;
originally announced June 2020.
-
First Stretch then Shrink and Bulk: A Two Phase Approach for Enumeration of Maximal $(Δ, γ)$\mbox{-}Cliques of a Temporal Network
Authors:
Suman Banerjee,
Bithika Pal
Abstract:
A \emph{Temporal Network} (also known as \emph{Link Stream} or \emph{Time-Varying Graph}) is often used to model a time-varying relationship among a group of agents. It is typically represented as a collection of triplets of the form $(u,v,t)$ that denotes the interaction between the agents $u$ and $v$ at time $t$. For analyzing the contact patterns of the agents forming a temporal network, recently the notion of classical \textit{clique} of a \textit{static graph} has been generalized as \textit{$Δ$\mbox{-}Clique} of a Temporal Network. In the same direction, one of our previous studies introduces the notion of \textit{$(Δ, γ)$\mbox{-}Clique}, which is basically a \textit{vertex set}, \textit{time interval} pair, in which every pair of the clique vertices are linked at least $γ$ times in every $Δ$ duration of the time interval. In this paper, we propose a different methodology for enumerating all the maximal $(Δ, γ)$\mbox{-}Cliques of a given temporal network. The proposed methodology is broadly divided into two phases. In the first phase, each temporal link is processed for constructing $(Δ, γ)$\mbox{-}Clique(s) with maximum duration. In the second phase, these initial cliques are expanded by vertex addition to form the maximal cliques. From the experimentation carried out on $5$ real\mbox{-}world temporal network datasets, we observe that the proposed methodology enumerates all the maximal $(Δ,γ)$\mbox{-}Cliques efficiently, particularly when the dataset is sparse. As a special case ($γ=1$), the proposed methodology is also able to enumerate $(Δ,1) \equiv Δ$\mbox{-}cliques with much less time compared to the existing methods.
Submitted 9 April, 2020;
originally announced April 2020.
-
DySky: Dynamic Skyline Queries on Uncertain Graphs
Authors:
Suman Banerjee,
Bithika Pal
Abstract:
Given a graph and a set of query vertices (a subset of the vertices), the dynamic skyline query problem returns the subset of data vertices (vertices other than the query vertices) that are not dominated by other data vertices under a certain distance measure. In this paper, we study the dynamic skyline query problem on uncertain graphs (DySky). The input to this problem is an uncertain graph and a subset of its nodes as query vertices, and the goal is to return all the data vertices that are not dominated by others. We employ two distance measures in uncertain graphs, namely, \emph{Majority Distance} and \emph{Expected Distance}. Our approach is broadly divided into three steps: \emph{Pruning}, \emph{Distance Computation}, and \emph{Skyline Vertex Set Generation}. We implement the proposed methodology on three publicly available datasets and observe that it can find the skyline vertex set quickly, even for million-sized graphs, when the expected distance is used. In particular, the pruning strategy reduces the computational time significantly.
Submitted 6 April, 2020;
originally announced April 2020.
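The final step named in the abstract above, skyline vertex set generation, can be illustrated once the distances are known. The sketch below assumes the distances from each data vertex to each query vertex have already been computed (by whichever measure, e.g. expected distance) and uses the standard skyline notion of dominance: one vertex dominates another if it is no farther from every query vertex and strictly closer to at least one. The abstract does not spell out this definition, so treat it as an assumption; the names `dominates` and `skyline` are hypothetical, and the pruning and distance computation steps are not shown.

```python
def dominates(a, b):
    """True if distance vector a dominates b (standard skyline dominance)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(dist):
    """Return the data vertices not dominated by any other data vertex.

    dist maps each data vertex to a tuple of distances, one entry per
    query vertex (assumed precomputed). This is the quadratic brute-force
    comparison, without any pruning.
    """
    return sorted(
        v
        for v, d in dist.items()
        if not any(dominates(other, d) for u, other in dist.items() if u != v)
    )
```

With two query vertices and distances p = (1, 4), q = (2, 2), r = (3, 3), s = (4, 1), vertex r is dominated by q, so the skyline is p, q and s.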
-
How to cluster nearest unique nodes from different classes using JJCluster in Wisp application?
Authors:
Jimut Bahan Pal
Abstract:
Finding the best place according to user preferences is a tedious task. It requires manual research and a lot of intuition, drawing on prior knowledge about the place. The task mainly involves accessing publicly available spatial data, applying a simple algorithm to summarize the data according to given preferences, and visualizing the result on a map. We introduced JJCluster to remove the need for laborious research about a place and to visualize the location in real time. The algorithm successfully finds the heart of a city when used in the Wisp application. The main purpose of the Wisp application is to find the best location for a trip to an unknown place, namely the one nearest to a set of preferences. We also discussed various optimization algorithms that pioneered today's dynamic programming, and the need for visualization to find patterns when the data is cluttered. This general clustering algorithm can also be used in other areas where every possible preference can be explored to maximize utility.
Submitted 17 February, 2020; v1 submitted 14 February, 2020;
originally announced February 2020.
-
A Deeper Look into Hybrid Images
Authors:
Jimut Bahan Pal
Abstract:
Hybrid images were first introduced by Oliva et al.; they are static images with two interpretations, such that the perceived image changes as a function of viewing distance. Hybrid images are built by studying human processing of multiscale images and are motivated by masking studies in visual perception. The original work showed that two images can be blended together with a high-pass filter and a low-pass filter in such a way that, when the blended image is viewed from a distance, the high-pass component fades away and the low-pass component becomes prominent. Our main aim here is to study and review the original paper by changing and tweaking certain parameters to see how they affect the quality of the blended image produced. We have tested a wide range of images and filters to see how they function and whether this technique can be used in a real-time system.
△ Less
Submitted 10 February, 2020; v1 submitted 30 January, 2020;
originally announced January 2020.
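The blending idea in the abstract above can be illustrated with a minimal sketch. It assumes grayscale images stored as equally sized lists of rows of floats, a Gaussian low-pass filter, and a high-pass filter defined as the image minus its own blur. The function names and the two sigma parameters are hypothetical and are not taken from the paper.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel with radius of about 3*sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + j - radius, 0), width - 1)]
             for j, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(col) for col in zip(*image)]

def low_pass(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def high_pass(image, sigma):
    """High-pass residual (assumed definition): image minus its blurred copy."""
    blurred = low_pass(image, sigma)
    return [[p - b for p, b in zip(row, brow)]
            for row, brow in zip(image, blurred)]

def hybrid_image(far_image, near_image, sigma_low, sigma_high):
    """Sum the low frequencies of far_image and the high frequencies of near_image."""
    low = low_pass(far_image, sigma_low)
    high = high_pass(near_image, sigma_high)
    return [[l + h for l, h in zip(lrow, hrow)]
            for lrow, hrow in zip(low, high)]
```

In this sketch `far_image` supplies the content that dominates at a distance and `near_image` supplies the fine detail visible up close; the two sigma values are the kind of parameters one would vary to study the quality of the blend.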
-
To Transfer or Not to Transfer: Misclassification Attacks Against Transfer Learned Text Classifiers
Authors:
Bijeeta Pal,
Shruti Tople
Abstract:
Transfer learning, the transfer of learned knowledge, has brought a paradigm shift in the way models are trained. The benefits of improved accuracy and reduced training time have shown promise in training models with constrained computational resources and fewer training samples. Specifically, publicly available text-based models such as GloVe and BERT, which are trained on large corpora, have seen ubiquitous adoption in practice. In this paper, we ask, "can transfer learning in text prediction models be exploited to perform misclassification attacks?" As our main contribution, we present novel attack techniques that utilize unintended features learnt in the teacher (public) model to generate adversarial examples for student (downstream) models. To the best of our knowledge, ours is the first work to show that transfer learning from state-of-the-art word-based and sentence-based teacher models increases the susceptibility of student models to misclassification attacks. First, we propose a novel word-score based attack algorithm for generating adversarial examples against student models trained using a context-free word-level embedding model. On binary classification tasks trained using the GloVe teacher model, we achieve an average attack accuracy of 97% for the IMDB Movie Reviews and 80% for the Fake News Detection. For multi-class tasks, we divide the Newsgroup dataset into 6 and 20 classes and achieve an average attack accuracy of 75% and 41% respectively. Next, we present length-based and sentence-based misclassification attacks for the Fake News Detection task trained using a context-aware BERT model and achieve 78% and 39% attack accuracy respectively. Thus, our results motivate the need for designing training techniques that are robust to unintended feature learning, specifically for transfer learned models.
Submitted 8 January, 2020;
originally announced January 2020.
-
Protocols for Checking Compromised Credentials
Authors:
Lucy Li,
Bijeeta Pal,
Junade Ali,
Nick Sullivan,
Rahul Chatterjee,
Thomas Ristenpart
Abstract:
To prevent credential stuffing attacks, industry best practice now proactively checks if user credentials are present in known data breaches. Recently, some web services, such as HaveIBeenPwned (HIBP) and Google Password Checkup (GPC), have started providing APIs to check for breached passwords. We refer to such services as compromised credential checking (C3) services. We give the first formal description of C3 services, detailing different settings and operational requirements, and we give relevant threat models.
One key security requirement is the secrecy of a user's passwords that are being checked. Current widely deployed C3 services have the user share a small prefix of a hash computed over the user's password. We provide a framework for empirically analyzing the leakage of such protocols, showing that in some contexts knowing the hash prefixes leads to a 12x increase in the efficacy of remote guessing attacks. We propose two new protocols that provide stronger protection for users' passwords, implement them, and show experimentally that they remain practical to deploy.
Submitted 4 September, 2019; v1 submitted 31 May, 2019;
originally announced May 2019.
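The hash-prefix style of check mentioned in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the protocol of any particular service or of the paper: the choice of SHA-1, the 5-character hex prefix, and the names `ToyBreachServer` and `is_breached` are all hypothetical.

```python
import hashlib

PREFIX_LEN = 5  # hex characters revealed to the server (assumed value)

def sha1_hex(password):
    """Hex digest of the password; SHA-1 is an assumption for this sketch."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()

class ToyBreachServer:
    """Hypothetical server that buckets breached-password hashes by prefix."""

    def __init__(self, breached_passwords):
        self.buckets = {}
        for pw in breached_passwords:
            digest = sha1_hex(pw)
            prefix, suffix = digest[:PREFIX_LEN], digest[PREFIX_LEN:]
            self.buckets.setdefault(prefix, set()).add(suffix)

    def query(self, prefix):
        # The server sees only the prefix and returns every matching suffix.
        return self.buckets.get(prefix, set())

def is_breached(password, server):
    """Client side: send the hash prefix, then compare suffixes locally."""
    digest = sha1_hex(password)
    prefix, suffix = digest[:PREFIX_LEN], digest[PREFIX_LEN:]
    return suffix in server.query(prefix)
```

The server never receives the full hash, but it does learn the prefix, and that is the kind of leakage the abstract says can make remote guessing attacks more effective.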
-
Threshold-Based Heuristics for Trust Inference in a Social Network
Authors:
Bithika Pal,
Suman Banerjee,
Mamata Jenamani
Abstract:
Trust among the users of a social network plays a pivotal role in item recommendation, particularly for cold-start users. Due to the sparse nature of these networks, trust information between any two users may not always be available. To infer the missing trust values, one well-known approach is path-based trust estimation, which suggests that a user believe all of its neighbors in the network. In this context, we propose two threshold-based heuristics to overcome the computational limitation of path-based trust inference. The approach uses the propagation phenomenon of trust and decides a threshold value to select a subset of users for trust propagation. While the first heuristic creates the inferred network considering only the subset of users, the second one is able to preserve the density of the inferred network obtained when all users are selected. We implement the heuristics and analyze the inferred networks with two real-world datasets. We observe that the proposed threshold-based heuristic can recover up to 70% of the paths in much less time than its deterministic counterpart. We also show that the heuristic-based inferred trust is capable of preserving the recommendation accuracy.
Submitted 11 September, 2018;
originally announced September 2018.
-
On the Enumeration of Maximal $(Δ, γ)$-Cliques of a Temporal Network
Authors:
Suman Banerjee,
Bithika Pal
Abstract:
A temporal network is a mathematical way of precisely representing a time-varying relationship among a group of agents. In this paper, we introduce the notion of $(Δ, γ)$-Cliques of a temporal network, where every pair of vertices present in the clique communicates at least $γ$ times in each $Δ$ period within a given time duration. We present an algorithm for enumerating all such maximal cliques present in the network. We also implement the proposed algorithm with three human contact network data sets. Based on the obtained results, we analyze the data sets for multiple values of $Δ$ and $γ$, which helps in finding contact groups with different frequencies.
Submitted 29 April, 2018;
originally announced April 2018.
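The $(Δ, γ)$-clique condition stated in the abstract above can be checked with a short sketch. It assumes integer timestamps, contacts given as `(u, v, t)` triples, and closed windows `[s, s + delta]` that slide in unit steps inside `[t_start, t_end]`; the paper's exact window convention may differ. The function only tests the condition for a given vertex set and is not the enumeration algorithm; its name is hypothetical.

```python
from itertools import combinations

def is_delta_gamma_clique(vertices, contacts, t_start, t_end, delta, gamma):
    """Check that every vertex pair has at least `gamma` contacts in every
    window of length `delta` inside [t_start, t_end] (assumed convention)."""
    if t_end - t_start < delta:
        raise ValueError("time interval is shorter than delta")

    # Group contact timestamps by unordered vertex pair.
    times = {}
    for u, v, t in contacts:
        times.setdefault(frozenset((u, v)), []).append(t)

    for u, v in combinations(sorted(vertices), 2):
        pair_times = times.get(frozenset((u, v)), [])
        # Slide a closed window [s, s + delta] across the interval.
        for s in range(t_start, t_end - delta + 1):
            count = sum(1 for t in pair_times if s <= t <= s + delta)
            if count < gamma:
                return False
    return True
```

For example, three agents who all meet pairwise at every integer time step satisfy the condition for small $γ$, while removing the contacts of a single pair makes the check fail.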
-
Analysis of Various Symbol Detection Techniques in Multiple-Input Multiple-Output System (MIMO)
Authors:
Shrikrishan Yadav,
Shuchi Jani,
B. L. Pal
Abstract:
Wireless communication is the fastest growing area of the communication industry. To keep pace with the ever-growing demands and expectations of customers, and with the market competition among companies for the services offered, there is a need for higher data rates along with reliable communication at low cost so that the applications can reach all. Many technical challenges remain in designing robust and fast wireless systems that deliver the performance necessary to support emerging applications, because wireless channels are frequency selective, power-limited, and susceptible to noise and interference. The demand for high data rates and the increasing number of applications offered by a wireless device call for an effective method. Because the available bandwidth is limited, it must be exploited in a way that yields maximum advantage. A Multiple-Input Multiple-Output system does exactly this by multiplying the data rate without any expansion in the bandwidth. This system utilizes the spatial diversity property of the multi-channel system. Reliable transmission requires symbols to be effectively recovered at the receiving end. The V-BLAST detection technique is employed for this purpose. This paper describes the advantages of using multiple antennas by exploiting the signal diversity offered by the multipath effect, and shows that the system offers high spectral efficiency.
Submitted 26 April, 2012;
originally announced April 2012.