-
Machine Unlearning for Recommendation Systems: An Insight
Authors:
Bhavika Sachdeva,
Harshita Rathee,
Sristi,
Arun Sharma,
Witold Wydmański
Abstract:
This review explores machine unlearning (MUL) in recommendation systems, addressing adaptability, personalization, privacy, and bias challenges. Unlike traditional models, MUL dynamically adjusts system knowledge based on shifts in user preferences and ethical considerations. The paper critically examines MUL's basics, real-world applications, and challenges like algorithmic transparency. It sifts through literature, offering insights into how MUL could transform recommendations, discussing user trust, and suggesting paths for future research in responsible and user-focused artificial intelligence (AI). The document guides researchers through challenges involving the trade-off between personalization and privacy, encouraging contributions to meet practical demands for targeted data removal. Emphasizing MUL's role in secure and adaptive machine learning, the paper proposes ways to push its boundaries. The novelty of this paper lies in its exploration of the limitations of the methods, which highlights exciting prospects for advancing the field.
Submitted 17 January, 2024;
originally announced January 2024.
-
Contextual Feature Selection with Conditional Stochastic Gates
Authors:
Ram Dyuthi Sristi,
Ofir Lindenbaum,
Shira Lifshitz,
Maria Lavzin,
Jackie Schiller,
Gal Mishne,
Hadas Benisty
Abstract:
Feature selection is a crucial tool in machine learning and is widely applied across various scientific disciplines. Traditional supervised methods generally identify a universal set of informative features for the entire population. However, feature relevance often varies with context, while the context itself may not directly affect the outcome variable. Here, we propose a novel architecture for contextual feature selection where the subset of selected features is conditioned on the value of context variables. Our new approach, Conditional Stochastic Gates (c-STG), models the importance of features using conditional Bernoulli variables whose parameters are predicted based on contextual variables. We introduce a hypernetwork that maps context variables to feature selection parameters to learn the context-dependent gates along with a prediction model. We further present a theoretical analysis of our model, indicating that it can improve performance and flexibility over population-level methods in complex feature selection settings. Finally, we conduct an extensive benchmark using simulated and real-world datasets across multiple domains demonstrating that c-STG can lead to improved feature selection capabilities while enhancing prediction accuracy and interpretability.
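The idea of context-conditioned gates can be illustrated with a minimal sketch. It is not the authors' implementation: the linear "hypernetwork", the hand-picked `WEIGHTS` and `BIAS`, and all function names are hypothetical, and the gates are sampled as plain Bernoulli variables rather than through whatever relaxation the paper uses for training.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gate_probabilities(context, weights, bias):
    """Hypothetical linear hypernetwork: context -> one Bernoulli parameter per feature."""
    return [
        sigmoid(sum(w * c for w, c in zip(row, context)) + b)
        for row, b in zip(weights, bias)
    ]


def sample_gates(probs, rng):
    """Draw one 0/1 gate per feature from its context-conditional Bernoulli distribution."""
    return [1 if rng.random() < p else 0 for p in probs]


def selected_features(probs, threshold=0.5):
    """Deterministic selection: indices of features whose gate probability exceeds the threshold."""
    return [i for i, p in enumerate(probs) if p > threshold]


# Illustrative, hand-picked parameters (assumption, not learned):
# two features, one context variable.
WEIGHTS = [[4.0], [-4.0]]
BIAS = [0.0, 0.0]

if __name__ == "__main__":
    rng = random.Random(0)
    for context in ([2.0], [-2.0]):
        probs = gate_probabilities(context, WEIGHTS, BIAS)
        gates = sample_gates(probs, rng)
        print(context, [round(p, 3) for p in probs], gates, selected_features(probs))
```

With these toy parameters the selected feature flips as the context changes sign, which is the behaviour a population-level selector cannot express.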
Submitted 7 June, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
DiSC: Differential Spectral Clustering of Features
Authors:
Ram Dyuthi Sristi,
Gal Mishne,
Ariel Jaffe
Abstract:
Selecting subsets of features that differentiate between two conditions is a key task in a broad range of scientific domains. In many applications, the features of interest form clusters with similar effects on the data at hand. To recover such clusters we develop DiSC, a data-driven approach for detecting groups of features that differentiate between conditions. For each condition, we construct a graph whose nodes correspond to the features and whose weights are functions of the similarity between them for that condition. We then apply a spectral approach to compute subsets of nodes whose connectivity differs significantly between the condition-specific feature graphs. On the theoretical front, we analyze our approach with a toy example based on the stochastic block model. We evaluate DiSC on a variety of datasets, including MNIST, hyperspectral imaging, simulated scRNA-seq and task fMRI, and demonstrate that DiSC uncovers features that better differentiate between conditions compared to competing methods.
Submitted 9 November, 2022;
originally announced November 2022.
-
Core-set Selection Using Metrics-based Explanations (CSUME) for multiclass ECG
Authors:
Sagnik Dakshit,
Barbara Mukami Maweu,
Sristi Dakshit,
Balakrishnan Prabhakaran
Abstract:
The adoption of deep learning-based healthcare decision support systems, such as the detection of irregular cardiac rhythm, is hindered by challenges such as lack of access to quality data and the high costs associated with the collection and annotation of data. The collection and processing of large volumes of healthcare data is a continuous process. The performance of data-hungry deep learning (DL) models is highly dependent on the quantity and quality of the data. While the need for data quantity has been adequately established through research, we show how a selection of good quality data improves deep learning model performance. In this work, we take Electrocardiogram (ECG) data as a case study and propose a model performance improvement methodology for algorithm developers that selects the most informative data samples from incoming streams of multi-class ECG data. Our Core-Set selection methodology uses metrics-based explanations to select the most informative ECG data samples. This also provides an understanding (for algorithm developers) as to why a sample was selected as more informative over others for the improvement of deep learning model performance. Our experimental results show a 9.67% and 8.69% precision and recall improvement with a significant training data volume reduction of 50%. Additionally, our proposed methodology asserts the quality and annotation of ECG samples from incoming data streams. It allows automatic detection of individual data samples that do not contribute to model learning, thus minimizing possible negative effects on model performance. We further discuss the potential generalizability of our approach by experimenting with a different dataset and deep learning architecture.
Submitted 28 May, 2022;
originally announced May 2022.
-
Looking for ancillary signals around GW150914
Authors:
Rahul Maroju,
Sristi Ram Dyuthi,
Anumandla Sukrutha,
Shantanu Desai
Abstract:
We replicated the procedure in Liu and Jackson (arXiv:1609.08346), who had found evidence for a low amplitude signal in the vicinity of GW150914. This was based upon the large correlation between the time integral of the Pearson cross-correlation coefficient in the off-source region of GW150914, and the Pearson cross-correlation in a narrow window around GW150914, for the same time lag between the two LIGO detectors as the gravitational wave signal. Our results mostly agree with those in arXiv:1609.08346. We find the statistical significance of the observed cross-correlation to be about 2.5$\sigma$. We also used the cross-correlation method to search for short duration signals at all other physical values of the time lag, within this 4096 second time interval, but did not find evidence for any statistically significant events in the off-source region.
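The basic quantity in this analysis, a Pearson cross-correlation coefficient between two detector time series at a fixed time lag, can be sketched as follows. This is a minimal illustration only: the function names, the windowing convention (`start`, `width`, `lag` in samples), and the synthetic data are assumptions, and the paper's filtering, integration, and significance estimation are not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def lagged_pearson(h, l, lag, start, width):
    """Correlate h[start:start+width] with l shifted by `lag` samples.

    Hypothetical convention: a positive lag means the second series
    records the same feature `lag` samples later than the first.
    """
    lo, hi = start + lag, start + lag + width
    if start < 0 or start + width > len(h) or lo < 0 or hi > len(l):
        raise ValueError("window falls outside the data")
    return pearson(h[start:start + width], l[lo:hi])


if __name__ == "__main__":
    # Synthetic stand-in data (not detector data): l is h delayed by 3 samples.
    h = [math.sin(0.3 * i) + 0.5 * math.sin(0.11 * i) for i in range(200)]
    l = [0.0, 0.0, 0.0] + h[:-3]
    for lag in range(0, 6):
        print(lag, round(lagged_pearson(h, l, lag, start=10, width=50), 4))
```

Scanning `lag` over the physically allowed range and repeating the calculation over many windows gives the kind of lag-dependent correlation series on which the off-source search is based.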
Submitted 26 March, 2019; v1 submitted 7 March, 2019;
originally announced March 2019.
-
Signal Jamming Attacks Against Communication-Based Train Control: Attack Impact and Countermeasure
Authors:
Subhash Lakshminarayana,
Jabir Shabbir Karachiwala,
Sang-Yoon Chang,
Girish Revadigar,
Sristi Lakshmi Sravana Kumar,
David K. Y. Yau,
Yih-Chun Hu
Abstract:
We study the impact of signal jamming attacks against communication-based train control (CBTC) systems and develop countermeasures to limit the attacks' impact. CBTC supports train operation automation and moving-block signaling, which improves transport efficiency. We consider an attacker jamming the wireless communication between trains or between a train and a wayside access point, which can disable CBTC and its corresponding benefits. In contrast to prior work studying jamming only at the physical or link layer, we study the real impact of such attacks on end users, namely train journey time and passenger congestion. Our analysis employs a detailed model of the leaky medium-based communication system (leaky waveguide or leaky feeder/coaxial cable) popularly used in CBTC systems. To counteract the jamming attacks, we develop a mitigation approach based on frequency hopping spread spectrum (FHSS), taking into account the domain-specific structure of leaky-medium CBTC systems. Specifically, compared with existing implementations of FHSS, we apply FHSS not only between the transmitter-receiver pair but also at the track-side repeaters. To demonstrate the feasibility of implementing this technology in CBTC systems, we develop an FHSS repeater prototype using software-defined radios on both leaky-medium and open-air (free-wave) channels. We perform extensive simulations driven by realistic running profiles of trains and real-world passenger data to provide insights into the jamming attack's impact and the effectiveness of the proposed countermeasure.
Submitted 5 August, 2018;
originally announced August 2018.
-
Multimodel Response Assessment for Monthly Rainfall Distribution in Some Selected Indian Cities Using Best Fit Probability as a Tool
Authors:
Anumandla Sukrutha,
Sristi Ram Dyuthi,
Shantanu Desai
Abstract:
We carry out a study of the statistical distribution of rainfall precipitation data for 20 cities in India. We have determined the best-fit probability distribution for these cities from the monthly precipitation data spanning 100 years of observations from 1901 to 2002. To fit the observed data, we considered 10 different distributions. The efficacy of the fits for these distributions was evaluated using four empirical non-parametric goodness-of-fit tests namely Kolmogorov-Smirnov, Anderson-Darling, Chi-Square, Akaike information criterion, and Bayesian Information criterion. Finally, the best-fit distribution using each of these tests was reported, by combining the results from the model comparison tests. We then find that for most of the cities, Generalized Extreme-Value Distribution or Inverse Gaussian Distribution most adequately fits the observed data.
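For reference, the two information criteria named above have the following standard definitions. The symbols are introduced here for illustration: $k$ is the number of free parameters of a candidate distribution, $n$ the number of data points, and $\hat{L}$ the maximized likelihood. How the paper combines these with the other tests is not specified in the abstract.

```latex
\begin{align}
  \mathrm{AIC} &= 2k - 2\ln\hat{L}, \\
  \mathrm{BIC} &= k\ln n - 2\ln\hat{L}.
\end{align}
```

In both cases the candidate distribution with the smaller value is preferred; BIC penalizes additional parameters more heavily than AIC once $\ln n > 2$.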
Submitted 9 August, 2018; v1 submitted 10 August, 2017;
originally announced August 2017.