-
Enhancing Low Resource NER Using Assisting Language And Transfer Learning
Authors:
Maithili Sabane,
Aparna Ranade,
Onkar Litake,
Parth Patil,
Raviraj Joshi,
Dipali Kadam
Abstract:
Named Entity Recognition (NER) is a fundamental NLP task used to locate key information in text, and it is primarily applied in conversational and search systems. In commercial applications, NER or comparable slot-filling methods have been widely deployed for popular languages. NER is used in applications such as human resources, customer service, search engines, content classification, and academia. In this paper, we focus on identifying named entities in closely related low-resource Indian languages such as Hindi and Marathi. We use several BERT variants, such as base-BERT, ALBERT, and RoBERTa, to train supervised NER models. We also compare multilingual models with monolingual models and establish a baseline. In this work, we show the assisting capabilities of Hindi and Marathi for the NER task: models trained on multiple languages perform better than models trained on a single language. However, we also observe that blindly mixing all datasets does not necessarily provide improvements, and data selection methods may be required.
Submitted 10 June, 2023;
originally announced June 2023.
-
L3Cube-MahaNER: A Marathi Named Entity Recognition Dataset and BERT models
Authors:
Parth Patil,
Aparna Ranade,
Maithili Sabane,
Onkar Litake,
Raviraj Joshi
Abstract:
Named Entity Recognition (NER) is a basic NLP task with major applications in conversational and search systems. It identifies the key entities in a sentence that are used by downstream applications. NER or similar slot-filling systems for popular languages have been heavily used in commercial applications. In this work, we focus on Marathi, an Indian language spoken predominantly by the people of Maharashtra state. Marathi is a low-resource language and still lacks useful NER resources. We present L3Cube-MahaNER, the first major gold-standard named entity recognition dataset in Marathi. We also describe the manual annotation guidelines followed during the annotation process. Finally, we benchmark the dataset on different CNN, LSTM, and Transformer-based models such as mBERT, XLM-RoBERTa, IndicBERT, and MahaBERT. MahaBERT provides the best performance among all the models. The data and models are available at https://github.com/l3cube-pune/MarathiNLP.
Submitted 12 April, 2022;
originally announced April 2022.
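NER benchmarks such as the one described above are commonly scored with entity-level F1, where a predicted entity counts only if both its label and its token boundaries match the gold annotation. The following is a minimal sketch, assuming IOB2 tags (`B-`, `I-`, `O`) and illustrative labels like `PER` and `LOC`; the functions `extract_spans` and `entity_f1` are hypothetical helpers, not part of the L3Cube-MahaNER release, and the tag scheme is an assumption rather than a description of the dataset format.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (label, start token, end token exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from an IOB2 tag sequence (assumed tag scheme)."""
    spans: Set[Span] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an open span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the current entity continues
        if label is not None:
            spans.add((label, start, i))  # close the open entity
        # "B-X" opens a new entity; a stray "I-X" is also treated as an opening
        label = None if tag == "O" else tag[2:]
        start = i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 requiring exact label and boundary match."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = extract_spans(gold_tags), extract_spans(pred_tags)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(round(entity_f1(gold, pred), 4))  # 0.6667
```

For a gold sequence `B-PER I-PER O B-LOC` and a prediction that misses the location, precision is 1 and recall is 0.5, giving F1 = 2/3. Treating a stray `I-` tag as the start of a new span is one common convention; other scorers may handle it differently.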
-
Mono vs Multilingual BERT: A Case Study in Hindi and Marathi Named Entity Recognition
Authors:
Onkar Litake,
Maithili Sabane,
Parth Patil,
Aparna Ranade,
Raviraj Joshi
Abstract:
Named entity recognition (NER) is the process of recognising and classifying important information (entities) in text. Proper nouns, such as the name of a person, an organization, or a location, are examples of entities. NER is an important module in applications such as human resources, customer support, search engines, content classification, and academia. In this work, we consider NER for low-resource Indian languages such as Hindi and Marathi. Transformer-based models have been widely used for NER tasks. We consider different variants of BERT, such as base-BERT, RoBERTa, and ALBERT, and benchmark them on publicly available Hindi and Marathi NER datasets. We provide an exhaustive comparison of different monolingual and multilingual transformer-based models and establish simple baselines currently missing in the literature. We show that the monolingual MahaRoBERTa model performs best for Marathi NER, whereas the multilingual XLM-RoBERTa performs best for Hindi NER. We also perform cross-language evaluation and present mixed observations.
Submitted 24 March, 2022;
originally announced March 2022.
-
A self adjusting multirate algorithm based on the TR-BDF2 method
Authors:
Luca Bonaventura,
Francesco Casella,
Ludovica Delpopolo,
Akshay Ranade
Abstract:
We propose a self-adjusting multirate method based on the TR-BDF2 solver. The potential advantages of using TR-BDF2 as the key component of a multirate framework are highlighted. A linear stability analysis of the resulting approach is presented, and the stability features of the resulting algorithm are analysed. The analysis framework is completely general and allows one to study, along the same lines, the stability of self-adjusting multirate methods based on a generic one-step solver. A number of numerical experiments demonstrate the efficiency and accuracy of the resulting approach, also for the time discretization of hyperbolic partial differential equations.
Submitted 27 January, 2018;
originally announced January 2018.
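The single-rate TR-BDF2 step that the multirate method above builds on can be sketched on the scalar linear test equation $y' = \lambda y$, the usual setting of a linear stability analysis. This is an illustration under stated assumptions, not the authors' self-adjusting multirate algorithm: it shows only one step of the base solver, a trapezoidal stage to $t_n + 2\gamma h$ followed by a BDF2 stage to $t_n + h$, with $\gamma = 1 - \sqrt{2}/2$. The names `GAMMA2`, `GAMMA3`, `tr_bdf2_step`, and `integrate` are hypothetical labels chosen for this sketch.

```python
import math

# Coefficients of one common way of writing TR-BDF2 (names are this sketch's own).
GAMMA = 1.0 - math.sqrt(2.0) / 2.0  # the intermediate stage sits at t_n + 2*GAMMA*h
GAMMA2 = (1.0 - 2.0 * GAMMA) / (2.0 * (1.0 - GAMMA))
GAMMA3 = (1.0 - GAMMA2) / (2.0 * GAMMA)


def tr_bdf2_step(y: float, lam: float, h: float) -> float:
    """Advance y' = lam * y by one TR-BDF2 step of size h."""
    z = lam * h
    # Stage 1: trapezoidal rule from t_n to t_n + 2*GAMMA*h (solved exactly, f is linear)
    y_stage = y * (1.0 + GAMMA * z) / (1.0 - GAMMA * z)
    # Stage 2: BDF2 through y_n and y_stage, implicit in y_{n+1}
    return ((1.0 - GAMMA3) * y + GAMMA3 * y_stage) / (1.0 - GAMMA2 * z)


def integrate(y0: float, lam: float, t_end: float, n_steps: int) -> float:
    """Apply n_steps uniform TR-BDF2 steps from t = 0 to t = t_end."""
    h = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y = tr_bdf2_step(y, lam, h)
    return y


err_coarse = abs(integrate(1.0, -1.0, 1.0, 50) - math.exp(-1.0))
err_fine = abs(integrate(1.0, -1.0, 1.0, 100) - math.exp(-1.0))
print(err_coarse / err_fine)  # roughly 4, as expected for a second-order method
```

Calling `tr_bdf2_step(1.0, z, 1.0)` evaluates the stability function $R(z)$; for very large negative $z$ it tends to zero, which reflects the L-stability that makes TR-BDF2 attractive for stiff components.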
-
Component Coloring of Proper Interval Graphs and Split Graphs
Authors:
Ajit Diwan,
Soumitra Pal,
Abhiram Ranade
Abstract:
We introduce a generalization of the well-known graph (vertex) coloring problem, which we call the problem of \emph{component coloring of graphs}. Given a graph, the problem is to color the vertices using the minimum number of colors so that the size of each connected component of the subgraph induced by the vertices of the same color does not exceed $C$. We give a linear-time algorithm for the problem on proper interval graphs. We extend this algorithm to solve two weighted versions of the problem in which vertices have integer weights. In the \emph{splittable} version, the weights of vertices can be split into differently colored parts; however, the total weight of a monochromatic component cannot exceed $C$. For this problem on proper interval graphs, we give a polynomial-time algorithm. In the \emph{non-splittable} version, the vertices cannot be split. Using the algorithm for the splittable version, we give a 2-approximation algorithm for the non-splittable problem on proper interval graphs, which is NP-hard. We also prove that even the unweighted version of the problem is NP-hard for split graphs.
Submitted 3 November, 2012; v1 submitted 16 January, 2012;
originally announced January 2012.
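The constraint in the component coloring problem can be made concrete with a small checker: a coloring is feasible when every connected component of the subgraph induced by one color class has at most $C$ vertices. With $C = 1$ this reduces to ordinary proper vertex coloring. The sketch below only verifies a given coloring with a breadth-first search per monochromatic component; it is not the linear-time algorithm for proper interval graphs from the paper, and the function name `is_valid_component_coloring` and the adjacency-list input format are assumptions.

```python
from collections import deque
from typing import Dict, Hashable, List


def is_valid_component_coloring(
    adj: Dict[Hashable, List[Hashable]], color: Dict[Hashable, int], C: int
) -> bool:
    """Return True if every monochromatic connected component has at most C vertices."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search restricted to vertices with the same color as start.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen and color[w] == color[start]:
                    seen.add(w)
                    queue.append(w)
        if size > C:
            return False
    return True


# Path a-b-c-d (a proper interval graph), component bound C = 2.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(is_valid_component_coloring(path, {"a": 0, "b": 0, "c": 1, "d": 1}, 2))  # True
print(is_valid_component_coloring(path, {v: 0 for v in path}, 2))  # False
```

On the path a-b-c-d with $C = 2$, coloring {a, b} with one color and {c, d} with another is feasible with two colors, whereas using a single color produces a monochromatic component of size 4.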
-
Scheduling Light-trails in WDM Rings
Authors:
Soumitra Pal,
Abhiram Ranade
Abstract:
We consider the problem of scheduling communication on optical WDM (wavelength division multiplexing) networks using light-trails technology. We seek to design scheduling algorithms such that the given transmission requests can be scheduled using the minimum number of wavelengths (optical channels). We provide algorithms and close lower bounds for two versions of the problem on an $n$-processor linear array/ring network. In the {\em stationary} version, the given pattern of transmissions is assumed not to change over time. For this version, a simple lower bound is the congestion $c$, i.e., the maximum total traffic required to pass through any link. We give an algorithm that schedules the transmissions using $O(c+\log{n})$ wavelengths. We also show a pattern for which $\Omega(c+\log{n}/\log\log{n})$ wavelengths are needed. In the {\em on-line} version, the transmissions arrive and depart dynamically and must be scheduled without upsetting the previously scheduled transmissions. For this case, we give an on-line algorithm with competitive ratio $\Theta(\log{n})$. We show that this is optimal in the sense that every on-line algorithm must have competitive ratio $\Omega(\log{n})$. We also give an algorithm that appears to do well in simulation (for the classes of traffic we consider) but whose competitive ratio lies between $\Omega(\log^2n/\log\log{n})$ and $O(\log^2n)$. We present detailed simulations of both algorithms.
Submitted 29 December, 2011;
originally announced December 2011.
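The lower bound $c$ in the stationary version is the congestion, the maximum total traffic crossing any single link. A minimal sketch for the linear-array case, assuming each request is a triple `(s, d, w)` sent from processor `s` to a processor `d > s` with traffic `w` expressed as a fraction of one wavelength's capacity, computes it with a difference array in $O(n + m)$ time for $m$ requests. The function `congestion` and this request format are hypothetical; ring networks and the wavelength-assignment algorithms themselves are not covered.

```python
from typing import Iterable, Tuple


def congestion(n: int, requests: Iterable[Tuple[int, int, float]]) -> float:
    """Max total traffic over any link of an n-processor linear array.

    Link i joins processor i and i + 1; a request (s, d, w) with s < d
    is assumed to use links s, s + 1, ..., d - 1.
    """
    diff = [0.0] * n  # difference array over the n - 1 links (last slot unused)
    for s, d, w in requests:
        if not 0 <= s < d < n:
            raise ValueError(f"bad request {(s, d, w)}")
        diff[s] += w  # traffic enters at link s
        diff[d] -= w  # and no longer counts from link d onward
    load, best = 0.0, 0.0
    for i in range(n - 1):
        load += diff[i]  # running sum = total traffic on link i
        best = max(best, load)
    return best


print(congestion(5, [(0, 3, 0.5), (1, 2, 0.5), (2, 4, 0.25)]))  # 1.0
```

In this example link 1 carries both the `(0, 3, 0.5)` and `(1, 2, 0.5)` transmissions, so the congestion is 1.0, and no schedule can use fewer wavelengths than that.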
-
A Near-Infrared Stellar Spectral Library: III. J-Band Spectra
Authors:
Arvind C. Ranade,
N. M. Ashok,
Harinder P. Singh,
Ranjan Gupta
Abstract:
This paper is the third in a series of papers by Ranade et al. (2004 & 2007) on a near-infrared (NIR) stellar spectral library. The observations were carried out with the 1.2 m Gurushikhar Infrared Telescope (GIRT) at Mt. Abu, India, using a NICMOS3 HgCdTe $256 \times 256$ NIR array-based spectrometer. Paper I (Ranade et al. 2004) presented H-band spectra of 135 stars at a resolution of $\sim 16$ Å, and Paper II (Ranade et al. 2007) presented K-band spectra of 114 stars at a resolution of $\sim 22$ Å. The J-band library released now consists of 126 stars covering spectral types O5--M8 and luminosity classes I--V. The spectra have a moderate resolution of $\sim 12.5$ Å in the J band and have been continuum-shape corrected to their respective effective temperatures. The complete NIR library will serve as a useful database for researchers working on stellar population synthesis. The complete library in J, H & K is available online at: http://vo.iucaa.ernet.in/~voi/NIR_Header.html
Submitted 28 September, 2007;
originally announced September 2007.
-
A Near-Infrared Stellar Spectral Library: II. K-Band Spectra
Authors:
Arvind C. Ranade,
Harinder P. Singh,
Ranjan Gupta,
N. M. Ashok
Abstract:
This paper is the second in a series of papers on a near-infrared (NIR) stellar spectral library produced by reducing observations carried out with the 1.2 m Gurushikhar Infrared Telescope (GIRT) at Mt. Abu, India, using a NICMOS3 HgCdTe 256 × 256 NIR array-based spectrometer. In Paper I (Ranade et al. 2004), H-band spectra of 135 stars at a resolution of ~16 Å were presented. The K-band library released now consists of 114 stars covering spectral types O7--M7 and luminosity classes I--V. The spectra have a moderate resolution of ~22 Å in the K band and have been continuum-shape corrected to their respective effective temperatures. We hope to release the remaining J-band spectra soon. The complete H- and K-band library is available online at: http://vo.iucaa.ernet.in/~voi/NIR_Header.html
Submitted 31 May, 2007;
originally announced May 2007.