-
On $r$-primitive $k$-normal polynomials with two prescribed coefficients
Authors:
Avnish K. Sharma,
Mamta Rani,
Sharwan K. Tiwari,
Anupama Panigrahi
Abstract:
This article investigates the existence of an $r$-primitive $k$-normal polynomial, defined as the minimal polynomial of an $r$-primitive $k$-normal element in $\mathbb{F}_{q^n}$, with a specified degree $n$ and two given coefficients over the finite field $\mathbb{F}_{q}$. Here, $q$ represents an odd prime power, and $n$ is an integer. The article establishes a sufficient condition to ensure the existence of such a polynomial. Using this condition, it is demonstrated that a $2$-primitive $2$-normal polynomial of degree $n$ always exists over $\mathbb{F}_{q}$ when both $q\geq 11$ and $n\geq 15$. However, for the range $10\leq n\leq 14$, uncertainty remains regarding the existence of such a polynomial for $71$ specific pairs of $(q,n)$. Moreover, when $q<11$, the number of uncertain pairs reduces to $16$. Furthermore, for the case of $n=9$, extensive computational power is employed using SageMath software, and it is found that the count of such uncertain pairs is reduced to $3988$.
Submitted 31 May, 2024;
originally announced May 2024.
-
Signatures of electronic ordering in transport in graphene flat bands
Authors:
Archisman Panigrahi,
Leonid Levitov
Abstract:
Recently, a wide family of electronic orders was unveiled in graphene flat bands, such as spin- and valley-polarized phases as well as nematic momentum-polarized phases, stabilized by exchange interactions via a generalized Stoner mechanism. Momentum polarization involves orbital degrees of freedom and is therefore expected to impact resistivity in a way which is uniquely sensitive to the ordering type. Under pocket polarization, carrier distribution shifts in k space and samples the band mass in regions defined by the displaced momentum distribution. This makes transport coefficients sensitive to pocket polarization, resulting in the ohmic resistivity decreasing with temperature. In addition, it leads to current switching and hysteresis under strong E field. This behavior remains robust in the presence of electron-phonon scattering and is therefore expected to be generic.
Submitted 27 March, 2024;
originally announced March 2024.
-
Spin-orbit proximity in MoS$_2$/bilayer graphene heterostructures
Authors:
M. Masseroni,
M. Gull,
A. Panigrahi,
N. Jacobsen,
F. Fischer,
C. Tong,
J. D. Gerber,
M. Niese,
T. Taniguchi,
K. Watanabe,
L. Levitov,
T. Ihn,
K. Ensslin,
H. Duprez
Abstract:
Van der Waals heterostructures provide a versatile platform for tailoring electronic properties through the integration of two-dimensional materials. Among these combinations, the interaction between bilayer graphene and transition metal dichalcogenides (TMDs) stands out due to its potential for inducing spin-orbit coupling (SOC) in graphene. Future device concepts require understanding the precise nature of SOC in TMD/bilayer graphene heterostructures and its influence on electronic transport phenomena. Here, we experimentally confirm the presence of two distinct types of SOC, Ising (1.55 meV) and Rashba (2.5 meV), in bilayer graphene when interfaced with molybdenum disulphide, recognized as one of the most stable TMDs. Furthermore, we reveal a non-monotonic trend in conductivity with respect to the electric displacement field at charge neutrality. This phenomenon is ascribed to the existence of single-particle gaps induced by the Ising SOC, which can be closed by a critical displacement field. Remarkably, our findings also unveil sharp peaks in the magnetoconductivity around the critical displacement field, challenging existing theoretical models.
Submitted 25 March, 2024;
originally announced March 2024.
-
Efficient Stagewise Pretraining via Progressive Subnetworks
Authors:
Abhishek Panigrahi,
Nikunj Saunshi,
Kaifeng Lyu,
Sobhan Miryoosefi,
Sashank Reddi,
Satyen Kale,
Sanjiv Kumar
Abstract:
Recent developments in large language models have sparked interest in efficient pretraining methods. A recent effective paradigm is to perform stage-wise training, where the size of the model is gradually increased over the course of training (e.g. gradual stacking (Reddi et al., 2023)). While the resource and wall-time savings are appealing, it has limitations, particularly the inability to evaluate the full model during earlier stages, and degradation in model quality due to smaller model capacity in the initial stages. In this work, we propose an alternative framework, progressive subnetwork training, that maintains the full model throughout training, but only trains subnetworks within the model in each step. We focus on a simple instantiation of this framework, Random Path Training (RaPTr) that only trains a sub-path of layers in each step, progressively increasing the path lengths in stages. RaPTr achieves better pre-training loss for BERT and UL2 language models while requiring 20-33% fewer FLOPs compared to standard training, and is competitive or better than other efficient training methods. Furthermore, RaPTr shows better downstream performance on UL2, improving QA tasks and SuperGLUE by 1-5% compared to standard training and stacking. Finally, we provide a theoretical basis for RaPTr to justify (a) the increasing complexity of subnetworks in stages, and (b) the stability in loss across stage transitions due to residual connections and layer norm.
Submitted 8 February, 2024;
originally announced February 2024.
-
Hate Speech and Offensive Content Detection in Indo-Aryan Languages: A Battle of LSTM and Transformers
Authors:
Nikhil Narayan,
Mrutyunjay Biswal,
Pramod Goyal,
Abhranta Panigrahi
Abstract:
Social media platforms serve as accessible outlets for individuals to express their thoughts and experiences, resulting in an influx of user-generated data spanning all age groups. While these platforms enable free expression, they also present significant challenges, including the proliferation of hate speech and offensive content. Such objectionable language disrupts objective discourse and can lead to radicalization of debates, ultimately threatening democratic values. Consequently, organizations have taken steps to monitor and curb abusive behavior, necessitating automated methods for identifying suspicious posts. This paper contributes to the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages (HASOC) 2023 shared tasks track. We, team Z-AGI Labs, conduct a comprehensive comparative analysis of hate speech classification across five distinct languages: Bengali, Assamese, Bodo, Sinhala, and Gujarati. Our study encompasses a wide range of pre-trained models, including BERT variants, XLM-R, and LSTM models, to assess their performance in identifying hate speech across these languages. Results reveal intriguing variations in model performance. Notably, BERT Base Multilingual Cased emerges as a strong performer across languages, achieving an F1 score of 0.67027 for Bengali and 0.70525 for Assamese. At the same time, it significantly outperforms other models with an impressive F1 score of 0.83009 for Bodo. In Sinhala, XLM-R stands out with an F1 score of 0.83493, whereas for Gujarati, a custom LSTM-based model outshone the others with an F1 score of 0.76601. This study offers valuable insights into the suitability of various pre-trained models for hate speech detection in multilingual settings. By considering the nuances of each, our research contributes to an informed model selection for building robust hate speech detection systems.
Submitted 9 December, 2023;
originally announced December 2023.
-
Trainable Transformer in Transformer
Authors:
Abhishek Panigrahi,
Sadhika Malladi,
Mengzhou Xia,
Sanjeev Arora
Abstract:
Recent works attribute the capability of in-context learning (ICL) in large pre-trained language models to implicitly simulating and fine-tuning an internal model (e.g., linear or 2-layer MLP) during inference. However, such constructions require large memory overhead, which makes simulation of more sophisticated internal models intractable. In this work, we propose an efficient construction, Transformer in Transformer (in short, TinT), that allows a transformer to simulate and fine-tune complex models internally during inference (e.g., pre-trained language models). In particular, we introduce innovative approximation techniques that allow a TinT model with fewer than 2 billion parameters to simulate and fine-tune a 125 million parameter transformer model within a single forward pass. TinT accommodates many common transformer variants and its design ideas also improve the efficiency of past instantiations of simple models inside transformers. We conduct end-to-end experiments to validate the internal fine-tuning procedure of TinT on various language modeling and downstream tasks. For example, even with a limited one-step budget, we observe TinT for an OPT-125M model improves performance by 4-16% absolute on average compared to OPT-125M. These findings suggest that large pre-trained language models are capable of performing intricate subroutines. To facilitate further work, a modular and extensible codebase for TinT is included.
Submitted 8 February, 2024; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Do Transformers Parse while Predicting the Masked Word?
Authors:
Haoyu Zhao,
Abhishek Panigrahi,
Rong Ge,
Sanjeev Arora
Abstract:
Pre-trained language models have been shown to encode linguistic structures, e.g. dependency and constituency parse trees, in their embeddings while being trained on unsupervised loss functions like masked language modeling. Some doubts have been raised whether the models actually are doing parsing or only some computation weakly correlated with it. We study two questions: (a) Is it possible to explicitly describe transformers with realistic embedding dimension, number of heads, etc. that are capable of doing parsing -- or even approximate parsing? (b) Why do pre-trained models capture parsing structure? This paper takes a step toward answering these questions in the context of generative modeling with PCFGs. We show that masked language models like BERT or RoBERTa of moderate sizes can approximately execute the Inside-Outside algorithm for the English PCFG [Marcus et al., 1993]. We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data. We also give a construction of transformers with $50$ layers, $15$ attention heads, and $1275$-dimensional embeddings on average such that using its embeddings it is possible to do constituency parsing with $>70\%$ F1 score on the PTB dataset. We conduct probing experiments on models pre-trained on PCFG-generated data to show that this not only allows recovery of an approximate parse tree, but also recovers marginal span probabilities computed by the Inside-Outside algorithm, which suggests an implicit bias of masked language modeling towards this algorithm.
Submitted 15 October, 2023; v1 submitted 14 March, 2023;
originally announced March 2023.
-
Analytic calculation of the vison gap in the Kitaev spin liquid
Authors:
Aaditya Panigrahi,
Piers Coleman,
Alexei Tsvelik
Abstract:
Although the ground-state energy of the Kitaev spin liquid can be calculated exactly, the associated vison gap energy has to date only been calculated numerically from finite size diagonalization. Here we show that the phase shift for scattering Majorana fermions off a single bond-flip can be calculated analytically, leading to a closed-form expression for the vison gap energy $Δ= 0.2633J$. Generalizations of our approach can be applied to Kitaev spin liquids on more complex lattices such as the three dimensional hyper-octagonal lattice.
Submitted 28 February, 2023;
originally announced March 2023.
-
Task-Specific Skill Localization in Fine-tuned Language Models
Authors:
Abhishek Panigrahi,
Nikunj Saunshi,
Haoyu Zhao,
Sanjeev Arora
Abstract:
Pre-trained language models can be fine-tuned to solve diverse NLP tasks, including in few-shot settings. Thus fine-tuning allows the model to quickly pick up task-specific ``skills,'' but there has been limited study of where these newly-learnt skills reside inside the massive model. This paper introduces the term skill localization for this problem and proposes a solution. Given the downstream task and a model fine-tuned on that task, a simple optimization is used to identify a very small subset of parameters ($\sim0.01$% of model parameters) responsible for ($>95$%) of the model's performance, in the sense that grafting the fine-tuned values for just this tiny subset onto the pre-trained model gives performance almost as good as that of the fine-tuned model. While reminiscent of recent works on parameter-efficient fine-tuning, the novel aspects here are that: (i) No further re-training is needed on the subset (unlike, say, with lottery tickets). (ii) Notable improvements are seen over vanilla fine-tuning with respect to calibration of predictions in-distribution ($40$-$90$% error reduction) as well as the quality of predictions out-of-distribution (OOD). In models trained on multiple tasks, a stronger notion of skill localization is observed, where the sparse regions corresponding to different tasks are almost disjoint, and their overlap (when it happens) is a proxy for task similarity. Experiments suggest that localization via grafting can assist certain forms of continual learning.
Submitted 1 July, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Smart Contract Assisted Blockchain based PKI System
Authors:
Amrutanshu Panigrahi,
Ajit Kumar Nayak,
Rourab Paul
Abstract:
The proposed smart contract can prevent seven cyber attacks: Denial of Service (DoS), Man in the Middle (MITM), Distributed Denial of Service (DDoS), 51% attack, injection attacks, routing attack, and eclipse attack. The Delegated Proof of Stake (DPoS) consensus algorithm used in this model reduces the number of validators for each transaction, which makes it suitable for lightweight applications. The timing complexity of the key/certificate validation and signature/certificate revocation processes does not depend on the number of transactions. The comparisons of various timing parameters with existing solutions show that the proposed PKI is competitively better.
Submitted 19 July, 2022;
originally announced July 2022.
-
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Authors:
Sadhika Malladi,
Kaifeng Lyu,
Abhishek Panigrahi,
Sanjeev Arora
Abstract:
Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD. Analogous study of adaptive gradient methods, such as RMSprop and Adam, has been challenging because there were no rigorously proven SDE approximations for these methods. This paper derives the SDE approximations for RMSprop and Adam, giving theoretical guarantees of their correctness as well as experimental validation of their applicability to common large-scale vision and language settings. A key practical result is the derivation of a $\textit{square root scaling rule}$ to adjust the optimization hyperparameters of RMSprop and Adam when changing batch size, and its empirical validation in deep learning settings.
Submitted 13 February, 2023; v1 submitted 20 May, 2022;
originally announced May 2022.
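The square root scaling rule named in the abstract above can be illustrated with a small sketch. The abstract does not spell the rule out, so this assumes only that the learning rate is multiplied by the square root of the batch-size ratio; any adjustments to other RMSprop or Adam hyperparameters are left to the paper. The function name `sqrt_scaled_lr` and its parameters are hypothetical.

```python
import math


def sqrt_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale a learning rate by sqrt(new_batch / base_batch).

    Assumption: only the learning rate is adjusted here; the abstract does
    not list how other optimizer hyperparameters should change.
    """
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    kappa = new_batch / base_batch  # batch-size ratio
    return base_lr * math.sqrt(kappa)


# Quadrupling the batch size doubles the learning rate under this assumption.
scaled = sqrt_scaled_lr(1e-3, base_batch=256, new_batch=1024)
```

Contrast this with a linear rule, under which quadrupling the batch size would quadruple the learning rate.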
-
Understanding Gradient Descent on Edge of Stability in Deep Learning
Authors:
Sanjeev Arora,
Zhiyuan Li,
Abhishek Panigrahi
Abstract:
Deep learning experiments by Cohen et al. [2021] using deterministic Gradient Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and sharpness (i.e., the largest eigenvalue of Hessian) no longer behave as in traditional optimization. Sharpness stabilizes around $2/$LR and loss goes up and down across iterations, yet still with an overall downward trend. The current paper mathematically analyzes a new mechanism of implicit regularization in the EoS phase, whereby GD updates due to non-smooth loss landscape turn out to evolve along some deterministic flow on the manifold of minimum loss. This is in contrast to many previous results about implicit bias either relying on infinitesimal updates or noise in gradient. Formally, for any smooth function $L$ with certain regularity condition, this effect is demonstrated for (1) Normalized GD, i.e., GD with a varying LR $η_t = \frac{η}{\| \nabla L(x(t)) \|}$ and loss $L$; (2) GD with constant LR and loss $\sqrt{L- \min_x L(x)}$. Both provably enter the Edge of Stability, with the associated flow on the manifold minimizing $λ_{1}(\nabla^2 L)$. The above theoretical results have been corroborated by an experimental study.
Submitted 28 October, 2022; v1 submitted 19 May, 2022;
originally announced May 2022.
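The Normalized GD update in the abstract above, GD with the varying learning rate $η_t = η / \| \nabla L(x(t)) \|$, can be sketched in a few lines. This is a minimal illustration on a toy quadratic, not the authors' experimental setup; the function names, the toy loss, and the `eps` guard against a vanishing gradient are all assumptions made for the example.

```python
import math


def normalized_gd(grad, x0, eta, steps, eps=1e-12):
    """Iterate x <- x - (eta / ||grad L(x)||) * grad L(x).

    `grad` maps a list of floats to the gradient at that point.
    `eps` is a hypothetical guard: stop if the gradient norm vanishes.
    """
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm < eps:
            break
        # Every update has Euclidean length exactly eta.
        x = [xi - (eta / norm) * gi for xi, gi in zip(x, g)]
    return x


def toy_grad(x):
    """Gradient of the toy loss L(x) = 0.5 * (x[0]**2 + 4 * x[1]**2)."""
    return [x[0], 4.0 * x[1]]


x_after_one_step = normalized_gd(toy_grad, x0=[1.0, 1.0], eta=0.01, steps=1)
```

Because each step has fixed length η regardless of the gradient size, the iterates cannot settle at a minimizer and instead keep moving at scale η near it, which is the regime the abstract analyzes.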
-
A solvable 3D Kondo lattice exhibiting odd-frequency pairing and order fractionalization
Authors:
Piers Coleman,
Aaditya Panigrahi,
Alexei Tsvelik
Abstract:
The Kondo lattice model plays a key role in our understanding of quantum materials, but a lack of small parameters has posed a long-standing problem. We present a three-dimensional S = 1/2 Kondo lattice model describing a spin liquid within an electron sea. Strong correlations in the spin liquid are treated exactly, enabling a controlled analytical approach. Like a Peierls or BCS phase, a logarithmically divergent susceptibility leads to an instability into a new phase at arbitrarily small Kondo coupling. Our solution captures a plethora of emergent phenomena, including odd-frequency pairing, pair density wave formation and order fractionalization. The ground state is a pair density wave with a fractionalized charge e, S = 1/2 order parameter, formed between electrons and Majorana fermions.
Submitted 20 July, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.
-
Inverses of $r$-primitive $k$-normal elements over finite fields
Authors:
Mamta Rani,
Avnish K. Sharma,
Sharwan K. Tiwari,
Anupama Panigrahi
Abstract:
Let $r$, $n$ be positive integers, $k$ be a non-negative integer and $q$ be any prime power such that $r\mid q^n-1.$ An element $α$ of the finite field $\mathbb{F}_{q^n}$ is called an {\it $r$-primitive} element, if its multiplicative order is $(q^n-1)/r$, and it is called a {\it $k$-normal} element over $\mathbb{F}_q$, if the greatest common divisor of the polynomials $m_α(x)=\sum_{i=1}^{n} α^{q^{i-1}}x^{n-i}$ and $x^n-1$ is of degree $k.$ In this article, we define the characteristic function for the set of $k$-normal elements, and with the help of this, we establish a sufficient condition for the existence of an element $α$ in $\mathbb{F}_{q^n}$, such that $α$ and $α^{-1}$ both are simultaneously $r$-primitive and $k$-normal over $\mathbb{F}_q$. Moreover, for $n>6k$, we show that there always exists an $r$-primitive and $k$-normal element $α$ such that $α^{-1}$ is also $r$-primitive and $k$-normal in all but finitely many fields $\mathbb{F}_{q^n}$ over $\mathbb{F}_q$, where $q$ and $n$ are such that $r\mid q^n-1$ and there exists a $k$-degree polynomial $g(x)\mid x^n-1$ over $\mathbb{F}_q$. In particular, we discuss the existence of an element $α$ in $\mathbb{F}_{q^n}$ such that $α$ and $α^{-1}$ both are simultaneously $1$-primitive and $1$-normal over $\mathbb{F}_q$.
Submitted 27 January, 2022;
originally announced January 2022.
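The definition of an $r$-primitive element in the abstract above (multiplicative order equal to $(q^n-1)/r$) can be checked directly in the simplest case. The sketch below assumes $n=1$ and $q=p$ prime, so that field elements are integers modulo $p$; it does not cover extension fields or the $k$-normal condition. The function names are hypothetical.

```python
def multiplicative_order(a: int, p: int) -> int:
    """Smallest t >= 1 with a**t == 1 (mod p), for prime p and p not dividing a."""
    if a % p == 0:
        raise ValueError("a must be nonzero modulo p")
    t, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        t += 1
    return t


def is_r_primitive(a: int, p: int, r: int) -> bool:
    """True if a has multiplicative order (p - 1) / r in the prime field F_p.

    Assumption: n = 1 and q = p is prime, the simplest instance of the
    definition; r must divide p - 1.
    """
    if (p - 1) % r != 0:
        return False
    return multiplicative_order(a, p) == (p - 1) // r


# In F_7: 3 has order 6 (1-primitive, i.e. primitive); 2 has order 3 (2-primitive).
examples = (is_r_primitive(3, 7, 1), is_r_primitive(2, 7, 2))
```

With $r=1$ the check reduces to the usual notion of a primitive element, which is why the abstract treats $1$-primitive elements as a special case.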
-
Projected Topological Branes
Authors:
Archisman Panigrahi,
Vladimir Juricic,
Bitan Roy
Abstract:
Nature harbors crystals of dimensionality ($d$) only up to three. Here we introduce the notion of \emph{projected topological branes} (PTBs): Lower-dimensional branes embedded in higher-dimensional parent topological crystals, constructed via a geometric cut-and-project procedure on the Hilbert space of the parent lattice Hamiltonian. When such a brane is inclined at a rational or an irrational slope, either a new lattice periodicity or a quasicrystal emerges. The latter gives birth to topoquasicrystals within the landscape of PTBs. As such PTBs are shown to inherit the hallmarks, such as the bulk-boundary, bulk-dislocation correspondences and topological invariant, of the parent topological crystals. We exemplify these outcomes by focusing on two-dimensional parent Chern insulators, which leave their signatures on projected one-dimensional (1D) topological branes in terms of localized endpoint, dislocation modes and the local Chern number. Finally, by stacking 1D projected Chern insulators, we showcase the imprints of three-dimensional Weyl semimetals in $d=2$, namely the Fermi arc surface states and bulk chiral zeroth Landau level, responsible for the chiral anomaly. Altogether, the proposed PTBs open a realistic avenue to harness higher-dimensional ($d>3$) topological phases in the laboratory.
Submitted 16 September, 2022; v1 submitted 13 December, 2021;
originally announced December 2021.
-
Energy magnetization and transport in systems with a non-zero Berry curvature in a magnetic field
Authors:
Archisman Panigrahi,
Subroto Mukerjee
Abstract:
We demonstrate that the well-known expression for the charge magnetization of a sample with a non-zero Berry curvature can be obtained by demanding that the Einstein relation holds for the electric transport current. We extend this formalism to the transport energy current and show that the energy magnetization must satisfy a particular condition. We provide a physical interpretation of this condition, and relate the energy magnetization to circulating energy currents in Chern insulators due to chiral edge states. We further recover the expression for the energy magnetization with this alternative formalism. We also solve the Boltzmann Transport Equation for the non-equilibrium distribution function in 2D for systems with a non-zero Berry curvature in a magnetic field. This distribution function can be used to obtain the regular Hall response in time-reversal invariant samples with a non-zero Berry curvature, for which there is no anomalous Hall response.
Submitted 31 July, 2023; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Learning and Generalization in RNNs
Authors:
Abhishek Panigrahi,
Navin Goyal
Abstract:
Simple recurrent neural networks (RNNs) and their more advanced cousins LSTMs etc. have been very successful in sequence modeling. Their theoretical understanding, however, is lacking and has not kept pace with the progress for feedforward networks, where a reasonably complete understanding in the special case of highly overparametrized one-hidden-layer networks has emerged. In this paper, we make progress towards remedying this situation by proving that RNNs can learn functions of sequences. In contrast to the previous work that could only deal with functions of sequences that are sums of functions of individual tokens in the sequence, we allow general functions. Conceptually and technically, we introduce new ideas which enable us to extract information from the hidden state of the RNN in our proofs -- addressing a crucial weakness in previous work. We illustrate our results on some regular language recognition problems.
Submitted 31 May, 2021;
originally announced June 2021.
-
Non-Hermitian dislocation modes: Stability and melting across exceptional points
Authors:
Archisman Panigrahi,
Roderich Moessner,
Bitan Roy
Abstract:
The traditional bulk-boundary correspondence assuring robust gapless modes at the edges and surfaces of insulating and nodal topological materials gets masked in non-Hermitian (NH) systems by the skin effect, manifesting an accumulation of a macroscopic number of states near such interfaces. Here we show that dislocation lattice defects are immune to such skin effect or at most display a \emph{weak} skin effect (depending on its relative orientation with the Burgers vector), and as such they support robust topological modes in the bulk of a NH system, specifically when the parent Hermitian phase features band inversion at a finite momentum. However, the dislocation modes gradually lose their support at their core when the system approaches an exceptional point, and finally melt into the boundary of the system across the NH band gap closing. We explicitly demonstrate these findings for a two-dimensional NH Chern insulator, thereby establishing that dislocation lattice defects can be instrumental to experimentally probe pristine NH topology.
Submitted 12 July, 2022; v1 submitted 11 May, 2021;
originally announced May 2021.
-
Non-Gaussianity of Stochastic Gradient Noise
Authors:
Abhishek Panigrahi,
Raghav Somani,
Navin Goyal,
Praneeth Netrapalli
Abstract:
What enables Stochastic Gradient Descent (SGD) to achieve better generalization than Gradient Descent (GD) in Neural Network training? This question has attracted much attention. In this paper, we study the distribution of the Stochastic Gradient Noise (SGN) vectors during the training. We observe that for batch sizes 256 and above, the distribution is best described as Gaussian, at least in the early phases of training. This holds across datasets, architectures, and other choices.
Submitted 25 October, 2019; v1 submitted 21 October, 2019;
originally announced October 2019.
-
Effect of Activation Functions on the Training of Overparametrized Neural Nets
Authors:
Abhishek Panigrahi,
Abhishek Shetty,
Navin Goyal
Abstract:
It is well known that overparametrized neural networks trained using gradient-based methods quickly achieve small training error with appropriate hyperparameter settings. Recent papers have proved this statement theoretically for highly overparametrized networks under reasonable assumptions. These results either assume that the activation function is ReLU or they crucially depend on the minimum eigenvalue of a certain Gram matrix depending on the data, random initialization and the activation function. In the latter case, existing works only prove that this minimum eigenvalue is non-zero and do not provide quantitative bounds. On the empirical side, a contemporary line of investigations has proposed a number of alternative activation functions which tend to perform better than ReLU at least in some settings, but no clear understanding has emerged. This state of affairs underscores the importance of theoretically understanding the impact of activation functions on training. In the present paper, we provide theoretical results about the effect of the activation function on the training of highly overparametrized 2-layer neural networks. A crucial property that governs the performance of an activation is whether or not it is smooth. For non-smooth activations such as ReLU, SELU and ELU, all eigenvalues of the associated Gram matrix are large under minimal assumptions on the data. For smooth activations such as tanh, swish and polynomials, the situation is more complex. If the subspace spanned by the data has small dimension, then the minimum eigenvalue of the Gram matrix can be small, leading to slow training. But if the dimension is large and the data satisfies another mild condition, then the eigenvalues are large. If we allow deep networks, then the small data dimension is not a limitation provided that the depth is sufficient. We discuss a number of extensions and applications of these results.
Submitted 10 April, 2020; v1 submitted 16 August, 2019;
originally announced August 2019.
-
DeepTagRec: A Content-cum-User based Tag Recommendation Framework for Stack Overflow
Authors:
Suman Kalyan Maity,
Abhishek Panigrahi,
Sayan Ghosh,
Arundhati Banerjee,
Pawan Goyal,
Animesh Mukherjee
Abstract:
In this paper, we develop a content-cum-user based deep learning framework, DeepTagRec, to recommend appropriate question tags on Stack Overflow. The proposed system learns the content representation from the question title and body. Subsequently, the representation learnt from the heterogeneous relationship between users and tags is fused with the content representation for the final tag prediction. On a very large-scale dataset comprising half a million question posts, DeepTagRec beats all the baselines; in particular, it significantly outperforms the best performing baseline, TagCombine, achieving an overall gain of 60.8% and 36.8% in precision@3 and recall@10, respectively. DeepTagRec also achieves 63% and 33.14% maximum improvement in exact-k accuracy and top-k accuracy, respectively, over TagCombine.
Submitted 10 March, 2019;
originally announced March 2019.
-
Analysis on Gradient Propagation in Batch Normalized Residual Networks
Authors:
Abhishek Panigrahi,
Yueru Chen,
C. -C. Jay Kuo
Abstract:
In this work, we conduct a mathematical analysis of the effect of batch normalization (BN) on gradient backpropagation in residual network training, which is believed to play a critical role in addressing the gradient vanishing/explosion problem. By analyzing the mean and variance behavior of the input and the gradient in the forward and backward passes through the BN and residual branches, respectively, we show that they work together to confine the gradient variance to a certain range across residual blocks in backpropagation. As a result, the gradient vanishing/explosion problem is avoided. We also show the relative importance of batch normalization w.r.t. the residual branches in residual networks.
Submitted 2 December, 2018;
originally announced December 2018.
-
PennyLane: Automatic differentiation of hybrid quantum-classical computations
Authors:
Ville Bergholm,
Josh Izaac,
Maria Schuld,
Christian Gogolin,
Shahnawaz Ahmed,
Vishnu Ajith,
M. Sohaib Alam,
Guillermo Alonso-Linaje,
B. AkashNarayanan,
Ali Asadi,
Juan Miguel Arrazola,
Utkarsh Azad,
Sam Banning,
Carsten Blank,
Thomas R Bromley,
Benjamin A. Cordier,
Jack Ceroni,
Alain Delgado,
Olivia Di Matteo,
Amintor Dusko,
Tanya Garg,
Diego Guala,
Anthony Hayes,
Ryan Hill,
Aroosa Ijaz
, et al. (43 additional authors not shown)
Abstract:
PennyLane is a Python 3 software framework for differentiable programming of quantum computers. The library provides a unified architecture for near-term quantum computing devices, supporting both qubit and continuous-variable paradigms. PennyLane's core feature is the ability to compute gradients of variational quantum circuits in a way that is compatible with classical techniques such as backpropagation. PennyLane thus extends the automatic differentiation algorithms common in optimization and machine learning to include quantum and hybrid computations. A plugin system makes the framework compatible with any gate-based quantum simulator or hardware. We provide plugins for hardware providers including the Xanadu Cloud, Amazon Braket, and IBM Quantum, allowing PennyLane optimizations to be run on publicly accessible quantum devices. On the classical front, PennyLane interfaces with accelerated machine learning libraries such as TensorFlow, PyTorch, JAX, and Autograd. PennyLane can be used for the optimization of variational quantum eigensolvers, quantum approximate optimization, quantum machine learning models, and many other applications.
Submitted 29 July, 2022; v1 submitted 12 November, 2018;
originally announced November 2018.
-
Analyzing Social Book Reading Behavior on Goodreads and how it predicts Amazon Best Sellers
Authors:
Suman Kalyan Maity,
Abhishek Panigrahi,
Animesh Mukherjee
Abstract:
A book's success/popularity depends on various parameters, both extrinsic and intrinsic. In this paper, we study how book reading characteristics might influence the popularity of a book. Towards this objective, we perform a cross-platform study of Goodreads entities and attempt to establish the connection between various Goodreads entities and popular books ("Amazon best sellers"). We analyze the collective reading behavior on the Goodreads platform and quantify various characteristic features of the Goodreads entities to identify differences between these Amazon best sellers (ABS) and other non-best-selling books. We then develop a prediction model using the characteristic features to predict whether a book will become a best seller one month (15 days) after its publication. On a balanced set, we are able to achieve a very high average accuracy of 88.72% (85.66%) for the prediction, where the other competitive class contains books which are randomly selected from the Goodreads dataset. Our method, primarily based on features derived from user posts and genre-related characteristic properties, achieves an improvement of 16.4% over baseline methods based on the traditional popularity factors (ratings, reviews). We also evaluate our model with two more competitive sets of books: a) books that are both highly rated and have received a large number of reviews but are not best sellers (HRHR), and b) Goodreads Choice Awards Nominated books which are non-best sellers (GCAN). We are able to achieve quite good results, with a very high average accuracy of 87.1% as well as a high ROC for ABS vs GCAN. For ABS vs HRHR, our model yields a high average accuracy of 86.22%.
Submitted 19 September, 2018;
originally announced September 2018.
-
HIV, Cardiovascular Diseases, and Chronic Arsenic Exposure co-exist in a Positive Synergy
Authors:
Arghya Panigrahi,
Amit K Chattopadhyay,
Goutam Paul,
Soumya Panigrahi
Abstract:
Recent epidemiological evidence indicates that arsenic exposure increases the risk of atherosclerosis, cardiovascular diseases and microangiopathies, in addition to the serious global health concern related to its carcinogenic effects. In experiments on animals, acute and chronic exposure to arsenic directly correlates with cardiac tachyarrhythmia and atherogenesis in a concentration- and duration-dependent manner. Moreover, the other effects of long-term arsenic exposure include induction of non-insulin dependent diabetes by mechanisms yet to be understood. On the other hand, there are controversial issues, gaps in knowledge, and future research priorities concerning the accelerated incidence of CVD and mortality in patients with HIV who are under long-term anti-retroviral therapy (ART). Although both HIV infection itself and various components of ART initiate significant pathological alterations in the myocardium and the vasculature, simultaneous environmental exposure to arsenic, which is increasingly being recognized as a facilitator of HIV viral cycling in the infected immune cells, may contribute an additional layer of adversity in these patients. In this mini-review, which has been fortified with our own preliminary data, we will discuss some of the key current understanding of chronic arsenic exposure and its possible impact on accelerated HIV/ART-induced CVD. The review will conclude with notes on recent developments in mathematical modeling in this field that probabilistically forecast incidence prevalence as functions of aging and lifestyle parameters, most of which vary with time themselves; this interdisciplinary approach provides a complementary kernel to conventional biology.
Submitted 9 February, 2016;
originally announced February 2016.
-
Determining the network throughput and flow rate using GSR And AAL2R
Authors:
Adyasha Behera,
Amrutanshu Panigrahi
Abstract:
In multi-radio wireless mesh networks (WMNs), one node can transmit packets over multiple channels to different destination nodes simultaneously. This feature gives the network high throughput and increases the opportunity for multipath routing, because the availability of multiple channels for transmission decreases the probability of the interference problem, which is of either the interflow or the intraflow type. To avoid interference and to maintain or increase network performance, the WMN needs to consider packet aggregation and packet forwarding. Packet aggregation is the process of collecting several packets that are ready for transmission and sending them to the intended recipient through the channel, while packet forwarding relies on hop-by-hop routing. In both cases, choosing the correct path among the multiple available paths is the most important factor for a routing algorithm. Hence the most challenging task is to determine a forwarding strategy that provides each node with a schedule for transmission within the channel. In this research work, we have tried to implement two forwarding strategies for the multipath multi-radio WMN as approximate solutions to the above problem. We have implemented Global State Routing (GSR), which considers packet forwarding, and Aggregation Aware Layer 2 Routing (AAL2R), which considers both packet forwarding and packet aggregation. After the successful implementation, the network performance has been measured by means of a simulation study.
Submitted 7 August, 2015;
originally announced August 2015.
-
2-Variable Boolean Operation -- its use in Pattern Formation
Authors:
Sudhakar Sahoo,
Ipsita Mohanty,
Garisha Chowdhary,
Arpit Panigrahi
Abstract:
In this paper, the theory of 2-Variable Boolean Operation (2-VBO) on a pair of n-bit strings is discussed. 2-VBO serves to bring out the relation between numbers which, when plotted on a 2-D surface, form interesting patterns; these patterns may be fixed, periodic, chaotic or complex. Some of these patterns represent natural fractals. This paper also provides a mathematical analysis of each of the obtained patterns, which would aid in understanding their formation. 2-VBO is an attempt towards the production and classification of patterns which represent various mathematical models and naturally occurring phenomena.
Submitted 15 August, 2010;
originally announced August 2010.
-
On the structure of $p$-zero-sum free sequences and its application to a variant of Erdos--Ginzburg--Ziv theorem
Authors:
W D Gao,
A Panigrahi,
R Thangadurai
Abstract:
Let $p$ be any odd prime number. Let $k$ be any positive integer such that $2\leq k\leq [\frac{p+1}3]+1$. Let $S = (a_1,a_2,\ldots,a_{2p-k})$ be any sequence in ${\Bbb Z}_p$ such that there is no subsequence of length $p$ of $S$ whose sum is zero in ${\Bbb Z}_p$. Then we prove that we can arrange the sequence $S$ as follows: $ S = (\underbrace{a, a, \ldots, a}_{u {\rm\ times}}, \underbrace{b, b, \ldots, b}_{v {\rm\ times}}, a_1', a_2', \ldots, a_{2p-k-u-v}') $ where $u\geq v$, $u+v\geq 2p-2k+2$ and $a-b$ generates ${\Bbb Z}_p$. This extends a result in \cite{gao10} to all primes $p$ and $k$ satisfying $(p+1)/4+3\leq k\leq (p+1)/3+1$. Also, we prove that if $g$ denotes the number of distinct residue classes modulo $p$ appearing in the sequence $S$ in ${\Bbb Z}_p$ of length $2p-k$ $(2\leq k\leq [(p+1)/4]+1)$, and $g\geq 2\sqrt{2}\sqrt{k-2}$, then there exists a subsequence of $S$ of length $p$ whose sum is zero in ${\Bbb Z}_p$.
Submitted 5 March, 2005;
originally announced March 2005.