-
Algorithms and Improved bounds for online learning under finite hypothesis class
Authors:
Ankit Sharma,
Late C. A. Murthy
Abstract:
Online learning is the process of answering a sequence of questions based on the correct answers to the previous questions. It is studied in many research areas, such as game theory, information theory and machine learning. The online learning framework has two main components: first, the learning algorithm, also known as the learner, and second, the hypothesis class, which is essentially a set of functions that the learner uses to predict answers to the questions. Sometimes this class contains functions that can provide correct answers to the entire sequence of questions. This is called the realizable case. When the hypothesis class contains no such function, it is called the unrealizable case. The goal of the learner, in both cases, is to make as few mistakes as would have been made by the most powerful functions in the hypothesis class over the entire sequence of questions. The performance of learners is analysed through theoretical bounds on the number of mistakes they make. This paper proposes three algorithms to improve the mistake bound in the unrealizable case. The proposed algorithms perform considerably better than the existing ones in the long run when most of the input sequences presented to the learner are likely to be realizable.
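To make the realizable case concrete, the following is a minimal sketch of the classical Halving algorithm for a finite hypothesis class. It is not one of the three algorithms proposed in the paper, whose details the abstract does not give. The threshold hypotheses, the target and the question stream are hypothetical choices made only for illustration. In the realizable case the Halving algorithm makes at most log2 of the class size mistakes.

```python
from typing import Callable, Iterable, List, Tuple

Hypothesis = Callable[[int], int]  # maps a question to a 0/1 answer


def halving_learner(hypotheses: List[Hypothesis],
                    stream: Iterable[Tuple[int, int]]) -> int:
    """Run the classical Halving algorithm and return the number of mistakes."""
    version_space = list(hypotheses)  # hypotheses consistent with all answers so far
    mistakes = 0
    for question, correct_answer in stream:
        # Predict by majority vote over the surviving hypotheses.
        votes_for_one = sum(h(question) for h in version_space)
        prediction = 1 if 2 * votes_for_one >= len(version_space) else 0
        if prediction != correct_answer:
            mistakes += 1
        # Discard every hypothesis that answered this question incorrectly.
        version_space = [h for h in version_space if h(question) == correct_answer]
    return mistakes


def make_threshold(t: int) -> Hypothesis:
    """Hypothetical hypothesis: answer 1 when the question is at least t."""
    return lambda x: 1 if x >= t else 0


# Hypothetical finite class of 8 thresholds; the target lies inside the class,
# so the sequence below is realizable.
hypotheses = [make_threshold(t) for t in range(8)]
target = make_threshold(5)
stream = [(x, target(x)) for x in [0, 7, 3, 4, 5, 6, 2]]
mistakes = halving_learner(hypotheses, stream)
print("mistakes:", mistakes)
```

Each mistake removes at least half of the surviving hypotheses, which is where the logarithmic bound comes from. In the unrealizable case the version space can become empty, so this sketch does not apply there without modification.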
Submitted 24 March, 2019;
originally announced March 2019.
-
RelDenClu: A Relative Density based Biclustering Method for identifying non-linear feature relations
Authors:
Namita Jain,
Susmita Ghosh,
C. A. Murthy
Abstract:
The existing biclustering algorithms for finding feature-relation-based biclusters often depend on assumptions like monotonicity or linearity. Though a few algorithms overcome this problem by using density-based methods, they tend to miss many biclusters because they use global criteria for identifying dense regions. The proposed method, RelDenClu, uses the local variations in marginal and joint densities for each pair of features to find the subset of observations that forms the basis of the relation between them. It then finds the set of features connected by a common set of observations, resulting in a bicluster.
To show the effectiveness of the proposed methodology, experimentation has been carried out on fifteen types of simulated datasets. Further, it has been applied to six real-life datasets. For three of these real-life datasets, the proposed method is used for unsupervised learning, while for the other three real-life datasets it is used as an aid to supervised learning. For all the datasets, the performance of the proposed method is compared with that of seven different state-of-the-art algorithms, and the proposed algorithm is seen to produce better results. The efficacy of the proposed algorithm is also seen by its use on a COVID-19 dataset for identifying some features (genetic, demographic and others) that are likely to affect the spread of COVID-19.
Submitted 11 May, 2021; v1 submitted 12 November, 2018;
originally announced November 2018.
-
Sparsity Measure of a Network Graph: Gini Index
Authors:
Swati Goswami,
C. A. Murthy,
Asit K. Das
Abstract:
This article examines the application of a popular measure of sparsity, the Gini Index, to network graphs. A wide variety of network graphs happen to be sparse. However, the index with which sparsity is commonly measured in network graphs is edge density, which is the ratio of the sum of the degrees of all nodes in the graph to the total possible degrees in the corresponding fully connected graph. Edge density is thus a simple ratio and carries limitations, primarily in terms of the amount of information it takes into account in its definition. In this paper, we provide a formulation for defining the sparsity of a network graph by generalizing the concept of the Gini Index, and call it the sparsity index. A majority of the six properties (viz., Robin Hood, Scaling, Rising Tide, Cloning, Bill Gates and Babies) with which sparsity measures are commonly compared are seen to be satisfied by the proposed index. A comparison between edge density and the sparsity index has been drawn with appropriate examples to highlight the efficacy of the proposed index. It has also been shown theoretically that the two measures follow a similar trend for a changing graph, i.e., as the edge density of a graph increases, its sparsity index decreases. Additionally, the paper draws an analytical relationship between the sparsity index and the exponent term of a power law distribution, a distribution which is known to approximate the degree distribution of a wide variety of network graphs. Finally, the article highlights how the proposed index together with Gini index can reveal important properties of a network graph.
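The two quantities contrasted above can be illustrated with a short sketch. Edge density follows the definition in the abstract (sum of degrees over total possible degrees). The Gini computation below is the standard Gini Index applied to the degree sequence, which is an assumption made for illustration; it is not the paper's generalized sparsity index, whose formula the abstract does not state. The example graphs and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def degrees_from_edges(n: int, edges: Sequence[Edge]) -> List[int]:
    """Degree of each node 0..n-1 in a simple undirected graph."""
    degrees = [0] * n
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


def edge_density(n: int, edges: Sequence[Edge]) -> float:
    """Sum of degrees divided by the sum of degrees of the complete graph."""
    return sum(degrees_from_edges(n, edges)) / (n * (n - 1))


def gini_index(values: Sequence[float]) -> float:
    """Standard Gini Index of a non-negative sequence (0 means perfectly even)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(values)
    # G = 1 - 2 * sum_k (c_(k) / total) * ((n - k + 1/2) / n), k = 1..n
    weighted = sum((v / total) * ((n - k + 0.5) / n)
                   for k, v in enumerate(ordered, start=1))
    return 1.0 - 2.0 * weighted


# Hypothetical examples: a star on 5 nodes and a complete graph on 4 nodes.
star_edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
complete_edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]

star_density = edge_density(5, star_edges)
star_gini = gini_index(degrees_from_edges(5, star_edges))
complete_density = edge_density(4, complete_edges)
complete_gini = gini_index(degrees_from_edges(4, complete_edges))
print(star_density, star_gini, complete_density, complete_gini)
```

Edge density depends only on the number of edges, whereas the Gini value also reflects how unevenly the degrees are spread across nodes, which is the kind of extra information the abstract argues a sparsity measure should capture.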
Submitted 21 December, 2016;
originally announced December 2016.
-
A new estimate of mutual information based measure of dependence between two variables: properties and fast implementation
Authors:
Namita Jain,
C. A. Murthy
Abstract:
This article proposes a new method to estimate an existing mutual information based dependence measure using histogram density estimates. Finding a suitable bin length for a histogram is an open problem. We propose a new way of computing the bin length for a histogram using a function of the maximum separation between points. The chosen bin length leads to consistent density estimates for the histogram method. The density values thus obtained are used to calculate an estimate of an existing dependence measure. The proposed estimate is named the Mutual Information Based Dependence Index (MIDI). Some important properties of MIDI have also been stated. The performance of the proposed method has been compared to generally accepted measures like Distance Correlation (dcor) and Maximal Information Coefficient (MINE) in terms of accuracy and computational complexity, with the help of several artificial data sets with different amounts of noise. The proposed method is able to detect many types of relationships between variables without making any assumption about the functional form of the relationship. The power statistics of the proposed method illustrate its effectiveness in detecting nonlinear relationships. Thus, it is able to achieve generality without a high rate of false positives. MIDI is found to work better on a real-life data set than competing methods. The proposed method is found to overcome some of the limitations which occur with dcor and MINE. Computationally, MIDI is found to be better than dcor and MINE in terms of time and memory, making it suitable for large data sets.
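As a rough illustration of estimating mutual information from histogram density estimates, the sketch below uses the plain plug-in estimator with equal-width bins. It is not MIDI: the paper's bin length, derived from the maximum separation between points, is not given in the abstract, so the fixed `bins` parameter here is an assumption, as are the function name and the toy data.

```python
import math
from collections import Counter
from typing import List, Sequence


def histogram_mutual_information(xs: Sequence[float],
                                 ys: Sequence[float],
                                 bins: int) -> float:
    """Plug-in mutual information (in nats) from an equal-width 2D histogram."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)

    def bin_indices(values: Sequence[float]) -> List[int]:
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0] * len(values)  # constant variable: a single bin
        # The maximum value is clamped into the last bin.
        return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]

    bx, by = bin_indices(xs), bin_indices(ys)
    joint = Counter(zip(bx, by))  # joint bin counts
    px, py = Counter(bx), Counter(by)  # marginal bin counts

    # Sum over occupied cells of p(x, y) * log(p(x, y) / (p(x) * p(y))).
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


# Hypothetical toy data: a full 10 x 10 grid (independent coordinates)
# and a variable paired with itself (fully dependent).
grid_x = [i % 10 for i in range(100)]
grid_y = [i // 10 for i in range(100)]
mi_independent = histogram_mutual_information(grid_x, grid_y, bins=5)
mi_dependent = histogram_mutual_information(grid_x, grid_x, bins=5)
print(mi_independent, mi_dependent)
```

The estimate makes no assumption about the functional form of the relationship, but its value depends strongly on the bin width, which is the open problem the abstract refers to.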
Submitted 13 September, 2015; v1 submitted 28 October, 2014;
originally announced November 2014.