-
Magnetic effects of non-magnetic impurities in gapped short-range resonating valence bond spin liquids
Authors:
Md Zahid Ansari,
Kedar Damle
Abstract:
We study the effect of a small density $n_v$ of quenched non-magnetic impurities, i.e. vacancy disorder, in gapped short-range resonating valence bond (RVB) spin liquid states and valence bond solid (VBS) states of quantum magnets. We argue that a large class of short-range RVB liquids are stable at small $n_v$ on the kagome lattice, while the corresponding states on triangular, square, and honeycomb lattices are unstable at any nonzero $n_v$ due to the presence of emergent vacancy-induced local moments. In contrast, VBS states are argued to be generically unstable (independent of lattice geometry) at nonzero $n_v$ due to such a local-moment instability. Our arguments rely in part on an analysis of the statistical mechanics of maximally-packed dimer covers of the diluted lattice, and are fully supported by our computational results on $O(N)$ symmetric designer Hamiltonians.
Submitted 12 February, 2024; v1 submitted 13 January, 2024;
originally announced January 2024.
-
Efficient Subgraph Isomorphism Finding in Large Graphs using Eccentricity and Limiting Recursive Calls
Authors:
Zubair Ali Ansari,
Muhammad Abulaish,
Irfan Rashid Thoker,
Jahiruddin
Abstract:
The subgraph isomorphism finding problem is a well-studied problem in the field of computer science and graph theory, and it aims to enumerate all instances of a query graph in the respective data graph. In this paper, we propose an efficient method, SubISO, to find subgraph isomorphisms using an objective function, which exploits some isomorphic invariants and eccentricity of the query graph's vertices. The proposed objective function is used to determine pivot vertex, which minimizes both number and size of the candidate regions in the data graph. SubISO also limits the maximum recursive calls of the generic SubgraphSearch function to deal with straggler queries for which most of the existing algorithms show exponential behaviour. The proposed approach is evaluated over three benchmark datasets. It is also compared with three well known subgraph isomorphism finding algorithms in terms of execution time, number of identified embeddings, and ability to deal with the straggler queries, and it performs significantly better.
Submitted 21 November, 2023;
originally announced December 2023.
-
Inferring properties of dust in supernovae with neural networks
Authors:
Zoe Ansari,
Christa Gall,
Roger Wesson,
Oswin Krause
Abstract:
Context. Determining properties of dust formed in and around supernovae from observations remains challenging. This may be due to either incomplete coverage of data in wavelength or time but also due to often inconspicuous signatures of dust in the observed data. Aims. Here we address this challenge using modern machine learning methods to determine the amount, composition and temperature of dust from a large set of simulated data. We aim to determine whether such methods are suitable to infer these properties from future observations of supernovae. Methods. We calculate spectral energy distributions (SEDs) of dusty shells around supernovae. We develop a neural network consisting of eight fully connected layers and an output layer with specified activation functions that allow us to predict the dust mass, temperature and composition and their respective uncertainties from each SED. We conduct a feature importance analysis via SHapley Additive exPlanations (SHAP) to find the minimum set of JWST filters required to accurately predict these properties. Results. We find that our neural network predicts dust masses and temperatures with a root-mean-square error (RMSE) of $\sim$ 0.12 dex and $\sim$ 38 K, respectively. Moreover, our neural network can well distinguish between the different dust species included in our work, reaching a classification accuracy of up to 95\% for carbon and 99\% for silicate dust. Conclusions. Our analysis shows that the JWST filters NIRCam F070W, F140M, F356W, F480M and MIRI F560W, F770W, F1000W, F1130W, F1500W, F1800W are likely the most important needed to determine the properties of dust formed in and around supernovae from future observations. We tested this on selected optical to infrared data of SN 1987A at 615 days past explosion and find good agreement with dust masses and temperatures inferred with standard fitting methods in the literature.
Submitted 20 July, 2022;
originally announced July 2022.
-
Heterogeneous Reservoir Computing Models for Persian Speech Recognition
Authors:
Zohreh Ansari,
Farzin Pourhoseini,
Fatemeh Hadaeghi
Abstract:
Over the last decade, deep-learning methods have been gradually incorporated into conventional automatic speech recognition (ASR) frameworks to create acoustic, pronunciation, and language models. Although it led to significant improvements in ASRs' recognition accuracy, due to their hard constraints related to hardware requirements (e.g., computing power and memory usage), it is unclear if such approaches are the most computationally- and energy-efficient options for embedded ASR applications. Reservoir computing (RC) models (e.g., echo state networks (ESNs) and liquid state machines (LSMs)), on the other hand, have been proven inexpensive to train, have vastly fewer parameters, and are compatible with emergent hardware technologies. However, their performance in speech processing tasks is relatively inferior to that of the deep-learning-based models. To enhance the accuracy of the RC in ASR applications, we propose heterogeneous single and multi-layer ESNs to create non-linear transformations of the inputs that capture temporal context at different scales. To test our models, we performed a speech recognition task on the Farsdat Persian dataset. Since, to the best of our knowledge, standard RC has not yet been employed to conduct any Persian ASR tasks, we also trained conventional single-layer and deep ESNs to provide baselines for comparison. Besides, we compared the RC performance with a standard long-short-term memory (LSTM) model. Heterogeneous RC models (1) show improved performance to the standard RC models; (2) perform on par in terms of recognition accuracy with the LSTM, and (3) reduce the training time considerably.
Submitted 25 May, 2022;
originally announced May 2022.
-
Language Identification of Hindi-English tweets using code-mixed BERT
Authors:
Mohd Zeeshan Ansari,
M M Sufyan Beg,
Tanvir Ahmad,
Mohd Jazib Khan,
Ghazali Wasim
Abstract:
Language identification of social media text has been an interesting problem of study in recent years. Social media messages are predominantly code-mixed in non-English-speaking states. Prior knowledge from pre-trained contextual embeddings has shown state-of-the-art results for a range of downstream tasks. Recently, models such as BERT have shown that, using a large amount of unlabeled data, pretrained language models are even more beneficial for learning common language representations. Extensive experiments exploiting transfer learning and fine-tuning BERT models to identify language on Twitter are presented in this paper. The work utilizes a data collection of Hindi-English-Urdu code-mixed text for language pre-training and Hindi-English code-mixed text for subsequent word-level language classification. The results show that the representations pre-trained over code-mixed data produce better results than their monolingual counterparts.
Submitted 2 July, 2021;
originally announced July 2021.
-
Language Lexicons for Hindi-English Multilingual Text Processing
Authors:
Mohd Zeeshan Ansari,
Tanvir Ahmad,
Noaima Bari
Abstract:
Language identification in textual documents is the process of automatically detecting the language contained in a document based on its content. Present language identification techniques presume that a document contains text in one of a fixed set of languages; however, this presumption is incorrect when dealing with a multilingual document that includes content in more than one language. Due to the unavailability of large standard corpora for Hindi-English mixed-lingual language processing tasks, we propose language lexicons, a novel kind of lexical database that supports several multilingual language processing tasks. These lexicons are built by learning classifiers over transliterated Hindi and English vocabulary. The designed lexicons possess richer quantitative characteristics than their primary source of collection, as revealed using visualization techniques.
Submitted 29 June, 2021;
originally announced June 2021.
-
A Simple and Efficient Probabilistic Language model for Code-Mixed Text
Authors:
M Zeeshan Ansari,
Tanvir Ahmad,
M M Sufyan Beg,
Asma Ikram
Abstract:
Conventional natural language processing approaches are not suited to social media text due to its colloquial discourse and non-homogeneous characteristics. Significantly, language identification in a multilingual document is a preceding subtask in several information extraction applications such as information retrieval, named entity recognition, relation extraction, etc. The problem is often more challenging in code-mixed documents, wherein foreign-language words are drawn into the base language while framing the text. Word embeddings are powerful language modeling tools for the representation of text documents, useful in obtaining similarity between words or documents. We present a simple probabilistic approach for building efficient word embeddings for code-mixed text and exemplify it over language identification of Hindi-English short text messages scraped from Twitter. We examine its efficacy for the classification task using bidirectional LSTMs and SVMs and observe its improved scores over various existing code-mixed embeddings.
Submitted 29 June, 2021;
originally announced June 2021.
-
The Early Phases of Supernova 2020pni: Shock-Ionization of the Nitrogen-Enriched Circumstellar Material
Authors:
G. Terreran,
W. V. Jacobson-Galan,
J. H. Groh,
R. Margutti,
D. L. Coppejans,
G. Dimitriadis,
C. D. Kilpatrick,
D. J. Matthews,
M. R. Siebert,
C. R. Angus,
T. G. Brink,
A. V. Filippenko,
R. J. Foley,
D. O. Jones,
S. Tinyanont,
C. Gall,
H. Pfister,
Y. Zenati,
Z. Ansari,
K. Auchettl,
K. El-Badry,
E. A. Magnier,
W. Zheng
Abstract:
We present multiwavelength observations of the Type II SN 2020pni. Classified at $\sim 1.3$ days after explosion, the object showed narrow (FWHM $<250\,\textrm{km}\,\textrm{s}^{-1}$) recombination lines of ionized helium, nitrogen, and carbon, as typically seen in flash-spectroscopy events. Using the non-LTE radiative transfer code CMFGEN to model our first high resolution spectrum, we infer a progenitor mass-loss rate of $\dot{M}=(3.5-5.3)\times10^{-3}$ M$_{\odot}$ yr$^{-1}$ (assuming a wind velocity of $v_w=200\,\textrm{km}\,\textrm{s}^{-1}$), estimated at a radius of $R_{\rm in}=2.5\times10^{14}\,\rm{cm}$. In addition, we find that the progenitor of SN 2020pni was enriched in helium and nitrogen (relative abundances in mass fractions of 0.30$-$0.40, and $8.2\times10^{-3}$, respectively). Radio upper limits are also consistent with a dense CSM, and a mass-loss rate of $\dot M>5 \times 10^{-4}\,\rm{M_{\odot}\,yr^{-1}}$. During the first 4 days after first light, we also observe an increase in velocity of the hydrogen lines (from $\sim 250\,\textrm{km}\,\textrm{s}^{-1}$ to $\sim 1000\,\textrm{km}\,\textrm{s}^{-1}$), which suggests a complex CSM. The presence of dense and confined CSM, as well as its inhomogeneous structure, suggest a phase of enhanced mass loss of the progenitor of SN 2020pni during the last year before explosion. Finally, we compare SN 2020pni to a sample of other shock-photoionization events. We find no evidence of correlations among the physical parameters of the explosions and the characteristics of the CSM surrounding the progenitors of these events. This favors the idea that the mass-loss experienced by massive stars during their final years could be governed by stochastic phenomena, and that, at the same time, the physical mechanisms responsible for this mass-loss must be common to a variety of different progenitors.
Submitted 1 March, 2022; v1 submitted 25 May, 2021;
originally announced May 2021.
-
The Young Supernova Experiment: Survey Goals, Overview, and Operations
Authors:
D. O. Jones,
R. J. Foley,
G. Narayan,
J. Hjorth,
M. E. Huber,
P. D. Aleo,
K. D. Alexander,
C. R. Angus,
K. Auchettl,
V. F. Baldassare,
S. H. Bruun,
K. C. Chambers,
D. Chatterjee,
D. L. Coppejans,
D. A. Coulter,
L. DeMarchi,
G. Dimitriadis,
M. R. Drout,
A. Engel,
K. D. French,
A. Gagliano,
C. Gall,
T. Hung,
L. Izzo,
W. V. Jacobson-Galán
, et al. (46 additional authors not shown)
Abstract:
Time domain science has undergone a revolution over the past decade, with tens of thousands of new supernovae (SNe) discovered each year. However, several observational domains, including SNe within days or hours of explosion and faint, red transients, are just beginning to be explored. Here, we present the Young Supernova Experiment (YSE), a novel optical time-domain survey on the Pan-STARRS telescopes. Our survey is designed to obtain well-sampled $griz$ light curves for thousands of transient events up to $z \approx 0.2$. This large sample of transients with 4-band light curves will lay the foundation for the Vera C. Rubin Observatory and the Nancy Grace Roman Space Telescope, providing a critical training set in similar filters and a well-calibrated low-redshift anchor of cosmologically useful SNe Ia to benefit dark energy science. As the name suggests, YSE complements and extends other ongoing time-domain surveys by discovering fast-rising SNe within a few hours to days of explosion. YSE is the only current four-band time-domain survey and is able to discover transients as faint as $\sim$21.5 mag in $gri$ and $\sim$20.5 mag in $z$, depths that allow us to probe the earliest epochs of stellar explosions. YSE is currently observing approximately 750 square degrees of sky every three days and we plan to increase the area to 1500 square degrees in the near future. When operating at full capacity, survey simulations show that YSE will find $\sim$5000 new SNe per year and at least two SNe within three days of explosion per month. To date, YSE has discovered or observed 8.3% of the transient candidates reported to the International Astronomical Union in 2020. We present an overview of YSE, including science goals, survey characteristics and a summary of our transient discoveries to date.
Submitted 5 January, 2021; v1 submitted 19 October, 2020;
originally announced October 2020.
-
Mixture Models for Photometric Redshifts
Authors:
Z. Ansari,
A. Agnello,
C. Gall
Abstract:
Determining photometric redshifts to high accuracy is paramount to measure distances in wide-field cosmological experiments. With only photometric information at hand, photo-zs are prone to systematic uncertainties in the intervening extinction and the unknown underlying spectral-energy distribution of different astrophysical sources. Here, we aim to resolve these model degeneracies and obtain a clear separation between intrinsic physical properties of astrophysical sources and extrinsic systematics. We aim at estimates of the full photo-z probability distributions, and their uncertainties. We perform a probabilistic photo-z determination using Mixture Density Networks (MDN). The training data-set is composed of optical ($griz$) point-spread-function and model magnitudes and extinction measurements from the SDSS-DR15, and WISE midinfrared ($3.4 μ$m and $4.6 μ$m) model magnitudes. We use Infinite Gaussian Mixture models to classify the objects in our data-set as stars, galaxies or quasars, and to determine the number of MDN components to achieve optimal performance. The fraction of objects that are correctly split into the main classes is 94%. Our method improves the bias of photometric redshift estimation (i.e. the mean $Δz$ = (zp - zs)/(1 + zs)) by one order of magnitude compared to the SDSS photo-z, and decreases the fraction of $3 σ$ outliers (i.e. 3rms$(Δz) < Δz$). The relative, root-mean-square systematic uncertainty in our resulting photo-zs is down to 1.7% for low-redshift galaxies (zs $<$ 0.5). We have demonstrated the feasibility of machine-learning based methods that produce full probability distributions for photo-z estimates with a performance that is competitive with state-of-the art techniques. Our method can be applied to wide-field surveys where extinction can vary significantly across the sky and with sparse spectroscopic calibration samples.
Submitted 14 October, 2020;
originally announced October 2020.
-
Inferring Political Preferences from Twitter
Authors:
Mohd Zeeshan Ansari,
Areesha Fatima Siddiqui,
Mohammad Anas
Abstract:
Sentiment analysis is the task of automatically analyzing the opinions and emotions of users towards an entity or some aspect of that entity. Political sentiment analysis of social media helps political strategists to scrutinize the performance of a party or candidate and address their weaknesses well before the actual elections. During elections, social networks are flooded with blogs, chats, debates and discussions about the prospects of political parties and politicians. The amount of data generated is very large to study, analyze and draw inferences from using the latest techniques. Twitter, one of the most popular social media platforms, enables us to perform domain-specific data preparation. In this work, we chose to identify the inclination of political opinions present in tweets by modelling it as a text classification problem using classical machine learning. Tweets related to the Delhi Elections in 2020 are extracted and employed for the task. Among the several algorithms, we observe that Support Vector Machines show the best performance.
Submitted 21 July, 2020;
originally announced July 2020.
-
Feature Selection on Noisy Twitter Short Text Messages for Language Identification
Authors:
Mohd Zeeshan Ansari,
Tanvir Ahmad,
Ana Fatima
Abstract:
The task of written language identification typically involves detecting the languages present in a sample of text. Moreover, a sequence of text may not belong to a single language but may instead be a mixture of text written in multiple languages. This kind of text is generated in large volumes on social media platforms due to their flexible and user-friendly environment. Such text contains a very large number of features, which are essential for the development of statistical, probabilistic and other kinds of language models. This large feature set includes rich as well as irrelevant and redundant features, which have diverse effects on the performance of the learning model. Therefore, feature selection methods are significant in choosing the features that are most relevant for an efficient model. In this article, we consider the Hindi-English language identification task, as Hindi and English are two of the most widely spoken languages of India. We apply different feature selection algorithms across various learning algorithms in order to analyze the effect of the algorithm as well as the number of features on the performance of the task. The methodology focuses on word-level language identification using a novel dataset of 6903 tweets extracted from Twitter. Various n-gram profiles are examined with different feature selection algorithms over many classifiers. Finally, an exhaustive comparative analysis is put forward with respect to the overall experiments conducted for the task.
Submitted 11 July, 2020;
originally announced July 2020.
-
Dynamical mass inference of galaxy clusters with neural flows
Authors:
Doogesh Kodi Ramanah,
Radosław Wojtak,
Zoe Ansari,
Christa Gall,
Jens Hjorth
Abstract:
We present an algorithm for inferring the dynamical mass of galaxy clusters directly from their respective phase-space distributions, i.e. the observed line-of-sight velocities and projected distances of galaxies from the cluster centre. Our method employs normalizing flows, a deep neural network capable of learning arbitrary high-dimensional probability distributions, and inherently accounts, to an adequate extent, for the presence of interloper galaxies which are not bounded to a given cluster, the primary contaminant of dynamical mass measurements. We validate and showcase the performance of our neural flow approach to robustly infer the dynamical mass of clusters from a realistic mock cluster catalogue. A key aspect of our novel algorithm is that it yields the probability density function of the mass of a particular cluster, thereby providing a principled way of quantifying uncertainties, in contrast to conventional machine learning approaches. The neural network mass predictions, when applied to a contaminated catalogue with interlopers, have a mean overall logarithmic residual scatter of 0.028 dex, with a log-normal scatter of 0.126 dex, which goes down to 0.089 dex for clusters in the intermediate to high mass range. This is an improvement by nearly a factor of four relative to the classical cluster mass scaling relation with the velocity dispersion, and outperforms recently proposed machine learning approaches. We also apply our neural flow mass estimator to a compilation of galaxy observations of some well-studied clusters with robust dynamical mass estimates, further substantiating the efficacy of our algorithm.
Submitted 8 September, 2020; v1 submitted 12 March, 2020;
originally announced March 2020.
-
Context based Analysis of Lexical Semantics for Hindi Language
Authors:
Mohd Zeeshan Ansari,
Lubna Khan
Abstract:
A word having multiple senses in a text introduces the lexical semantic task to find out which particular sense is appropriate for the given context. One such task is Word sense disambiguation which refers to the identification of the most appropriate meaning of the polysemous word in a given context using computational algorithms. The language processing research in Hindi, the official language of India, and other Indian languages is restricted by unavailability of the standard corpus. For Hindi word sense disambiguation also, the large corpus is not available. In this work, we prepared the text containing new senses of certain words leading to the enrichment of the sense-tagged Hindi corpus of sixty polysemous words. Furthermore, we analyzed two novel lexical associations for Hindi word sense disambiguation based on the contextual features of the polysemous word. The evaluation of these methods is carried out over learning algorithms and favorable results are achieved.
Submitted 23 January, 2019;
originally announced January 2019.
-
Cross Script Hindi English NER Corpus from Wikipedia
Authors:
Mohd Zeeshan Ansari,
Tanvir Ahmad,
Md Arshad Ali
Abstract:
The text generated on social media platforms is essentially mixed-lingual text. The mixing of languages in any form produces a considerable amount of difficulty for language processing systems. Moreover, advancements in language processing research depend upon the availability of standard corpora. The development of mixed-lingual Indian Named Entity Recognition (NER) systems is facing obstacles due to the unavailability of standard evaluation corpora. Such corpora may be of a mixed-lingual nature, in which text is written in multiple languages, predominantly using a single script only. The motivation of our work is to emphasize the automatic generation of such corpora in order to encourage mixed-lingual Indian NER. The paper presents the preparation of a Cross Script Hindi-English corpus from Wikipedia category pages. The corpus is successfully annotated using the standard CoNLL-2003 categories of PER, LOC, ORG, and MISC. Its evaluation is carried out on a variety of machine learning algorithms and favorable results are achieved.
Submitted 8 October, 2018;
originally announced October 2018.
-
Red Giant evolution in Modified Gravity
Authors:
Sh. Najafi,
M. T. Mirtorabi,
Z. Ansari,
D. F. Mota
Abstract:
In this paper, we study the chameleon profile in inhomogeneous density distributions and find that the fifth force in the thin shell near the surface is weaker than expected for homogeneous density distributions. We also check the validity of the quasi-static approximation for the chameleon scalar field on astrophysical time scales. We have investigated the rolling-down behaviour of the scalar field on its effective potential inside a one-solar-mass red giant star using the MESA code. We have found that the scalar field is fast enough to follow the minimum of the potential. This adiabatic behaviour reduces the fifth force and extends the screened regions to lower densities, where the field has a smaller mass and was expected to be unscreened. As a consequence, the star's evolution is similar to what is expected from standard general relativity. In addition, considering the stability of the star, an approximate constraint on the coupling constant $β$ is found.
Submitted 4 February, 2019; v1 submitted 12 February, 2018;
originally announced February 2018.
-
The Relativistic Point Charge Revisited : Novel Features
Authors:
Anarya Ray,
Parthasarathi Majumdar,
Zahid Ansari
Abstract:
A fully relativistically covariant formulation of the classical Maxwell electrodynamics of an arbitrarily-moving point charge is presented, purely in terms of gauge invariant potentials without entailing any gauge fixing. A new, relativistically covariant energy-momentum tensor for the radiation fields is derived and yields results for the angular power distribution, in full agreement with results derived in a frame-dependent manner in standard texts of classical electrodynamics. This is then used to present a full derivation, not available in standard texts, of the energy-momentum and orbital angular momentum of a relativistic point charge. Radiation backreaction is turned on and the system reanalyzed Lorentz-covariantly, including effects of mass renormalization. This leads us to reiterate earlier conclusions regarding the inherent inadequacy of classical Maxwell electrodynamics. The Abraham-Lorentz equation is derived en passant in appropriate limits without requiring any extraneous structural artifact, and the Landau-Lifschitz proposal for modification of the theory is also critically reviewed.
Submitted 20 May, 2019; v1 submitted 23 January, 2018;
originally announced January 2018.
-
New Fuzzy LBP Features for Face Recognition
Authors:
Abdullah Gubbi,
Mohammed Fazle Azeem,
Zahid Ansari
Abstract:
There are many local texture features, each varying in the way it is implemented, and each algorithm tries to improve performance. An attempt is made in this paper to present a theoretically very simple and computationally effective approach for face recognition. In our implementation the face image is divided into 3x3 sub-regions from which the features are extracted using the Local Binary Pattern (LBP) over a window, a fuzzy membership function, and the central pixel. The LBP features possess a texture-discriminative property and their computational cost is very low. By utilising the information from LBP, the membership function, and the central pixel, the limitations of traditional LBP are eliminated. Benchmark databases such as the ORL and Sheffield databases are used to evaluate the proposed features with an SVM classifier. For the proposed approach, K-fold and ROC curves are obtained and the results are compared.
Submitted 23 September, 2015;
originally announced September 2015.
-
A Fuzzy Clustering Based Approach for Mining Usage Profiles from Web Log Data
Authors:
Zahid Ansari,
Mohammad Fazle Azeem,
A. Vinaya Babu,
Waseem Ahmed
Abstract:
The World Wide Web continues to grow at an amazing rate in both the size and complexity of Web sites and is well on its way to being the main reservoir of information and data. Due to this increase in the growth and complexity of the WWW, web site publishers are facing increasing difficulty in attracting and retaining users. To design popular and attractive websites, publishers must understand their users' needs. Therefore, analyzing users' behaviour is an important part of web page design. Web Usage Mining (WUM) is the application of data mining techniques to web usage log repositories in order to discover usage patterns that can be used to analyze users' navigational behavior. WUM contains three main steps: preprocessing, knowledge extraction and results analysis. The goal of the preprocessing stage in Web usage mining is to transform the raw web log data into a set of user profiles. Each such profile captures a sequence or a set of URLs representing a user session.
Submitted 1 September, 2015;
originally announced September 2015.
-
Discovery of Web Usage Profiles Using Various Clustering Techniques
Authors:
Zahid Ansari,
Waseem Ahmed,
M. F. Azeem,
A. Vinaya Babu
Abstract:
The explosive growth of the World Wide Web (WWW) has necessitated the development of Web personalization systems that understand user preferences in order to dynamically serve customized content to individual users. To reveal information about user preferences from Web usage data, Web Usage Mining (WUM) techniques are being applied extensively to Web log data. Clustering techniques are widely used in WUM to capture similar interests and trends among users accessing a Web site. Clustering aims to divide a data set into groups or clusters where inter-cluster similarities are minimized while intra-cluster similarities are maximized. This paper reviews four popular clustering techniques: k-Means, k-Medoids, Leader and DBSCAN. These techniques are implemented and tested against Web user navigational data. Performance and validity results of each technique are presented and compared.
Submitted 1 September, 2015;
originally announced September 2015.
-
A Fuzzy Approach for Feature Evaluation and Dimensionality Reduction to Improve the Quality of Web Usage Mining Results
Authors:
Zahid Ansari,
M. F. Azeem,
A. Vinaya Babu,
Waseem Ahmed
Abstract:
Web Usage Mining is the application of data mining techniques to web usage log repositories in order to discover usage patterns that can be used to analyze users' navigational behavior. During the preprocessing stage, raw web log data is transformed into a set of user profiles. Each user profile captures a set of URLs representing a user session. Clustering can be applied to this sessionized data in order to capture similar interests and trends among users' navigational patterns. Since the sessionized data may contain thousands of user sessions and each user session may consist of hundreds of URL accesses, dimensionality reduction is achieved by eliminating the low-support URLs. Very small sessions are also removed in order to filter out noise from the data. But direct elimination of low-support URLs and small sessions may result in the loss of a significant amount of information, especially when the count of low-support URLs and small sessions is large. We propose a fuzzy solution to this problem that assigns weights to URLs and user sessions based on a fuzzy membership function. After assigning the weights, we apply a Fuzzy c-Means clustering algorithm to discover the clusters of user profiles. In this paper, we describe our fuzzy set theoretic approach to feature selection (or dimensionality reduction) and session weight assignment. Finally, we compare our soft-computing-based approach to dimensionality reduction with the traditional approach of directly eliminating small sessions and low-support URLs. Our results show that fuzzy feature evaluation and dimensionality reduction result in better performance and validity indices for the discovered clusters.
Submitted 1 September, 2015;
originally announced September 2015.
-
Quantitative Evaluation of Performance and Validity Indices for Clustering the Web Navigational Sessions
Authors:
Zahid Ansari,
M. F. Azeem,
Waseem Ahmed,
A. Vinaya Babu
Abstract:
Clustering techniques are widely used in Web Usage Mining to capture similar interests and trends among users accessing a Web site. For this purpose, web access logs generated at a particular web site are preprocessed to discover the user navigational sessions. Clustering techniques are then applied to group the user session data into user session clusters, where inter-cluster similarities are minimized while intra-cluster similarities are maximized. Since different clustering algorithms generally produce different sets of clusters, it is important to evaluate the performance of these methods in terms of the accuracy and validity of the clusters, and also the time required to generate them, using appropriate performance measures. This paper describes various validity and accuracy measures including Dunn's Index, Davies-Bouldin Index, C Index, Rand Index, Jaccard Index, Silhouette Index, Fowlkes-Mallows and Sum of the Squared Error (SSE). We conducted a performance evaluation of the following clustering techniques: k-Means, k-Medoids, Leader, Single Link Agglomerative Hierarchical and DBSCAN. These techniques are implemented and tested against Web user navigational data. Finally, their performance results are presented and compared.
Submitted 13 July, 2015;
originally announced July 2015.