-
A ripple in time: a discontinuity in American history
Authors:
Alexander Kolpakov,
Igor Rivin
Abstract:
In this note we use the State of the Union Address (SOTU) dataset from Kaggle to make some surprising (and some not so surprising) observations pertaining to the general timeline of American history, and the character and nature of the addresses themselves. Our main approach is using vector embeddings, such as BERT (DistilBERT) and GPT-2.
While it is widely believed that BERT (and its variations…
▽ More
In this note we use the State of the Union Address (SOTU) dataset from Kaggle to make some surprising (and some not so surprising) observations pertaining to the general timeline of American history, and the character and nature of the addresses themselves. Our main approach is using vector embeddings, such as BERT (DistilBERT) and GPT-2.
While it is widely believed that BERT (and its variations) is most suitable for NLP classification tasks, we find out that GPT-2 in conjunction with nonlinear dimension reduction methods such as UMAP provide better separation and stronger clustering. This makes GPT-2 + UMAP an interesting alternative. In our case, no model fine-tuning is required, and the pre-trained out-of-the-box GPT-2 model is enough.
We also used a fine-tuned DistilBERT model for classification detecting which President delivered which address, with very good results (accuracy 93% - 95% depending on the run). An analogous task was performed to determine the year of writing, and we were able to pin it down to about 4 years (which is a single presidential term).
It is worth noting that SOTU addresses provide relatively small writing samples (with about 8'000 words on average, and varying widely from under 2'000 words to more than 20'000), and that the number of authors is relatively large (we used SOTU addresses of 42 US presidents). This shows that the techniques employed turn out to be rather efficient, while all the computations described in this note can be performed using a single GPU instance of Google Colab.
The accompanying code is available on GitHub.
△ Less
Submitted 4 May, 2024; v1 submitted 2 December, 2023;
originally announced December 2023.
-
Four Random Permutations Conjugated by an Adversary Generate $S_n$ with High Probability
Authors:
Robin Pemantle,
Yuval Peres,
Igor Rivin
Abstract:
We prove a conjecture dating back to a 1978 paper of D.R.\ Musser~\cite{musserirred}, namely that four random permutations in the symmetric group $\mathcal{S}_n$ generate a transitive subgroup with probability $p_n > ε$ for some $ε> 0$ independent of $n$, even when an adversary is allowed to conjugate each of the four by a possibly different element of $§_n$ (in other words, the cycle types alread…
▽ More
We prove a conjecture dating back to a 1978 paper of D.R.\ Musser~\cite{musserirred}, namely that four random permutations in the symmetric group $\mathcal{S}_n$ generate a transitive subgroup with probability $p_n > ε$ for some $ε> 0$ independent of $n$, even when an adversary is allowed to conjugate each of the four by a possibly different element of $§_n$ (in other words, the cycle types already guarantee generation of $\mathcal{S}_n$). This is closely related to the following random set model. A random set $M \subseteq \mathbb{Z}^+$ is generated by including each $n \geq 1$ independently with probability $1/n$. The sumset $\text{sumset}(M)$ is formed. Then at most four independent copies of $\text{sumset}(M)$ are needed before their mutual intersection is no longer infinite.
△ Less
Submitted 11 December, 2014;
originally announced December 2014.
-
Spectral Experiments+
Authors:
Igor Rivin
Abstract:
We describe extensive computational experiments on spectral properties of random objects - random cubic graphs, random planar triangulations, and Voronoi and Delaunay diagrams of random (uniformly distributed) point sets on the sphere). We look at bulk eigenvalue distribution, eigenvalue spacings, and locality properties of eigenvectors. We also look at the statistics of \emph{nodal domains} of ei…
▽ More
We describe extensive computational experiments on spectral properties of random objects - random cubic graphs, random planar triangulations, and Voronoi and Delaunay diagrams of random (uniformly distributed) point sets on the sphere). We look at bulk eigenvalue distribution, eigenvalue spacings, and locality properties of eigenvectors. We also look at the statistics of \emph{nodal domains} of eigenvectors on these graphs. In all cases we discover completely new (at least to this author) phenomena. The author has tried to refrain from making specific conjectures, inviting the reader, instead, to meditate on the data.
△ Less
Submitted 26 October, 2014; v1 submitted 12 October, 2014;
originally announced October 2014.
-
Large Galois groups with applications to Zariski density
Authors:
Igor Rivin
Abstract:
We introduce the first provably efficient algorithm to check if a finitely generated subgroup of an almost simple semi-simple group over the rationals is Zariski-dense. We reduce this question to one of computing Galois groups, and to this end we describe efficient algorithms to check if the Galois group of a polynomial $p$ with integer coefficients is "generic" (which, for arbitrary polynomials o…
▽ More
We introduce the first provably efficient algorithm to check if a finitely generated subgroup of an almost simple semi-simple group over the rationals is Zariski-dense. We reduce this question to one of computing Galois groups, and to this end we describe efficient algorithms to check if the Galois group of a polynomial $p$ with integer coefficients is "generic" (which, for arbitrary polynomials of degree $n$ means the full symmetric group $S_n,$ while for reciprocal polynomials of degree $2n$ it means the hyperoctahedral group $C_2 \wr S_n.$). We give efficient algorithms to verify that a polynomial has Galois group $S_n,$ and that a reciprocal polynomial has Galois group $C_2 \wr S_n.$ We show how these algorithms give efficient algorithms to check if a set of matrices $\mathcal{G}$ in $\mathop{SL}(n, \mathbb{Z})$ or $\mathop{Sp}(2n, \mathbb{Z})$ generate a \emph{Zariski dense} subgroup.
The complexity of doing this in$\mathop{SL}(n, \mathbb{Z})$ is of order $O(n^4 \log n \log \|\mathcal{G}\|)\log ε$ and in $\mathop{Sp}(2n, \mathbb{Z})$ the complexity is of order $O(n^8 \log n\log \|\mathcal{G}\|)\log ε$ In general semisimple groups we show that Zariski density can be confirmed or denied in time of order $O(n^14 \log \|\mathcal{G}\|\log ε),$ where $ε$ is the probability of a wrong "NO" answer, while $\|\mathcal{G}\|$ is the measure of complexity of the input (the maximum of the Frobenius norms of the generating matrices). The algorithms work essentially without change over algebraic number fields, and in other semi-simple groups. However, we restrict to the case of the special linear and symplectic groups and rational coefficients in the interest of clarity.
△ Less
Submitted 6 January, 2015; v1 submitted 10 December, 2013;
originally announced December 2013.
-
The performance of the batch learner algorithm
Authors:
Igor Rivin
Abstract:
We analyze completely the convergence speed of the \emph{batch learning algorithm}, and compare its speed to that of the memoryless learning algorithm and of learning with memory. We show that the batch learning algorithm is never worse than the memoryless learning algorithm (at least asymptotically). Its performance \emph{vis-a-vis} learning with full memory is less clearcut, and depends on cer…
▽ More
We analyze completely the convergence speed of the \emph{batch learning algorithm}, and compare its speed to that of the memoryless learning algorithm and of learning with memory. We show that the batch learning algorithm is never worse than the memoryless learning algorithm (at least asymptotically). Its performance \emph{vis-a-vis} learning with full memory is less clearcut, and depends on certain probabilistic assumptions.
△ Less
Submitted 14 January, 2002;
originally announced January 2002.
-
Yet another zeta function and learning
Authors:
Igor Rivin
Abstract:
We study the convergence speed of the batch learning algorithm, and compare its speed to that of the memoryless learning algorithm and of learning with memory (as analyzed in joint work with N. Komarova). We obtain precise results and show in particular that the batch learning algorithm is never worse than the memoryless learning algorithm (at least asymptotically). Its performance vis-a-vis lea…
▽ More
We study the convergence speed of the batch learning algorithm, and compare its speed to that of the memoryless learning algorithm and of learning with memory (as analyzed in joint work with N. Komarova). We obtain precise results and show in particular that the batch learning algorithm is never worse than the memoryless learning algorithm (at least asymptotically). Its performance vis-a-vis learning with full memory is less clearcut, and depends on certainprobabilistic assumptions. These results necessitate theintroduction of the moment zeta function of a probability distribution and the study of some of its properties.
△ Less
Submitted 25 July, 2001;
originally announced July 2001.
-
Harmonic mean, random polynomials and stochastic matrices
Authors:
Natalia Komarova,
Igor Rivin
Abstract:
Motivated by a problem in learning theory, we are led to study the dominant eigenvalue of a class of random matrices. This turns out to be related to the roots of the derivative of random polynomials (generated by picking their roots uniformly at random in the interval [0, 1], although our results extend to other distributions). This, in turn, requires the study of the statistical behavior of th…
▽ More
Motivated by a problem in learning theory, we are led to study the dominant eigenvalue of a class of random matrices. This turns out to be related to the roots of the derivative of random polynomials (generated by picking their roots uniformly at random in the interval [0, 1], although our results extend to other distributions). This, in turn, requires the study of the statistical behavior of the harmonic mean of random variables as above, and that, in turn, leads us to delicate question of the rate of convergence to stable laws and tail estimates for stable laws.
△ Less
Submitted 2 December, 2001; v1 submitted 28 May, 2001;
originally announced May 2001.
-
Mathematics of learning
Authors:
Natalia Komarova,
Igor Rivin
Abstract:
We study the convergence properties of a pair of learning algorithms (learning with and without memory). This leads us to study the dominant eigenvalue of a class of random matrices. This turns out to be related to the roots of the derivative of random polynomials (generated by picking their roots uniformly at random in the interval [0, 1], although our results extend to other distributions). Th…
▽ More
We study the convergence properties of a pair of learning algorithms (learning with and without memory). This leads us to study the dominant eigenvalue of a class of random matrices. This turns out to be related to the roots of the derivative of random polynomials (generated by picking their roots uniformly at random in the interval [0, 1], although our results extend to other distributions). This, in turn, requires the study of the statistical behavior of the harmonic mean of random variables as above, which leads us to delicate question of the rate of convergence to stable laws and tail estimates for stable laws. The reader can find the proofs of most of the results announced here in the paper entitled "Harmonic mean, random polynomials, and random matrices", by the same authors.
△ Less
Submitted 2 December, 2001; v1 submitted 28 May, 2001;
originally announced May 2001.