-
Universal Anomaly Detection at the LHC: Transforming Optimal Classifiers and the DDD Method
Authors:
Sascha Caron,
José Enrique García Navarro,
María Moreno Llácer,
Polina Moskvitina,
Mats Rovers,
Adrián Rubio Jímenez,
Roberto Ruiz de Austri,
Zhongyi Zhang
Abstract:
In this work, we present a novel approach to transform supervised classifiers into effective unsupervised anomaly detectors. The method we have developed, termed Discriminatory Detection of Distortions (DDD), enhances anomaly detection by training a discriminator model on both original and artificially modified datasets. We conducted a comprehensive evaluation of our models on the Dark Machines Anomaly Score Challenge channels and a search for 4-top quark events, demonstrating the effectiveness of our approach across various final states and beyond the Standard Model scenarios.
We compare the performance of the DDD method with the Deep Robust One-Class Classification method (DROCC), which incorporates signals in the training process, and the Deep Support Vector Data Description (DeepSVDD) method, a well established and well performing method for anomaly detection. Results show that the effectiveness of each model varies by signal and channel, with DDD proving to be a very effective anomaly detector. We recommend the combined use of DeepSVDD and DDD for purely unsupervised applications, with the addition of flow models for improved performance when resources allow.
Findings suggest that network architectures that excel in supervised contexts, such as the particle transformer with standard model interactions, also perform well as unsupervised anomaly detectors. We also show that with these methods, it is likely possible to recognize 4-top quark production as an anomaly without prior knowledge of the process. We argue that the Large Hadron Collider community can transform supervised classifiers into anomaly detectors to uncover potential new physical phenomena in each search.
Submitted 26 June, 2024;
originally announced June 2024.
-
The VISTA Variables in the Vía Láctea eXtended (VVVX) ESO public survey: Completion of the observations and legacy
Authors:
R. K. Saito,
M. Hempel,
J. Alonso-García,
P. W. Lucas,
D. Minniti,
S. Alonso,
L. Baravalle,
J. Borissova,
C. Caceres,
A. N. Chené,
N. J. G. Cross,
F. Duplancic,
E. R. Garro,
M. Gómez,
V. D. Ivanov,
R. Kurtev,
A. Luna,
D. Majaess,
M. G. Navarro,
J. B. Pullen,
M. Rejkuba,
J. L. Sanders,
L. C. Smith,
P. H. C. Albino,
M. V. Alonso
, et al. (121 additional authors not shown)
Abstract:
The ESO public survey VISTA Variables in the Vía Láctea (VVV) surveyed the inner Galactic bulge and the adjacent southern Galactic disk from $2009-2015$. Upon its conclusion, the complementary VVV eXtended (VVVX) survey has expanded both the temporal as well as spatial coverage of the original VVV area, widening it from $562$ to $1700$ sq. deg., as well as providing additional epochs in $JHK_{\rm s}$ filters from $2016-2023$. With the completion of VVVX observations during the first semester of 2023, we present here the observing strategy, a description of data quality and access, and the legacy of VVVX. VVVX took $\sim 2000$ hours, covering about 4% of the sky in the bulge and southern disk. VVVX covered most of the gaps left between the VVV and the VISTA Hemisphere Survey (VHS) areas and extended the VVV time baseline in the obscured regions affected by high extinction and hence hidden from optical observations. VVVX provides a deep $JHK_{\rm s}$ catalogue of $\gtrsim 1.5\times10^9$ point sources, as well as a $K_{\rm s}$ band catalogue of $\sim 10^7$ variable sources. Within the existing VVV area, we produced a $5D$ map of the surveyed region by combining positions, distances, and proper motions of well-understood distance indicators such as red clump stars, RR Lyrae, and Cepheid variables. In March 2023 we successfully finished the VVVX survey observations that started in 2016, an accomplishment for ESO Paranal Observatory upon 4200 hours of observations for VVV+VVVX. The VVV+VVVX catalogues complement those from the Gaia mission at low Galactic latitudes and provide spectroscopic targets for the forthcoming ESO high-multiplex spectrographs MOONS and 4MOST.
Submitted 24 June, 2024;
originally announced June 2024.
-
Characters, Hall subgroups, and Normal Complements
Authors:
Robert Guralnick,
Gabriel Navarro
Abstract:
We settle a question from 1962: if $H$ is a Hall subgroup of a finite group $G$, then all the irreducible complex characters of $H$ extend to $G$ if and only if $H$ has a normal complement.
Submitted 23 June, 2024;
originally announced June 2024.
-
Reinforcement-Learning based routing for packet-optical networks with hybrid telemetry
Authors:
A. L. García Navarro,
Nataliia Koneva,
Alfonso Sánchez-Macián,
José Alberto Hernández,
Óscar González de Dios,
J. M. Rivas-Moscoso
Abstract:
This article provides a methodology and open-source implementation of Reinforcement Learning algorithms for finding optimal routes in a packet-optical network scenario. The algorithm uses measurements provided by the physical layer (pre-FEC bit error rate and propagation delay) and the link layer (link load) to configure a set of latency-based rewards and penalties based on such measurements. Then, the algorithm executes Q-learning based on this set of rewards for finding the optimal routing strategies. It is further shown that the algorithm dynamically adapts to changing network conditions by re-calculating optimal policies upon either link load changes or link degradation as measured by pre-FEC BER.
Submitted 21 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
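A minimal sketch of the idea in this abstract, not the authors' open-source implementation: tabular Q-learning over links whose reward is a negative latency-based cost, plus a penalty when pre-FEC BER is too high. The cost formula `delay * (1 + load)`, the BER threshold, the penalty value, the toy topology, and the names `learn_q` and `best_route` are all assumptions made for illustration. Updates are applied in deterministic sweeps over every link instead of sampled episodes, purely to keep the sketch reproducible.

```python
def learn_q(links, dest, ber_threshold=1e-3, ber_penalty=100.0,
            alpha=0.5, gamma=1.0, sweeps=200):
    """Learn Q(u, v): the value of forwarding over link u -> v towards dest."""

    def reward(m):
        # Assumed cost model: propagation delay inflated by link load,
        # plus a fixed penalty if pre-FEC BER exceeds the threshold.
        cost = m["delay"] * (1.0 + m["load"])
        if m["ber"] > ber_threshold:
            cost += ber_penalty
        return -cost

    q = {link: 0.0 for link in links}
    for _ in range(sweeps):
        for (u, v), m in links.items():
            if v == dest:
                future = 0.0
            else:
                # Best value among the links leaving v (dead ends are very bad).
                future = max((q[(a, b)] for (a, b) in links if a == v),
                             default=-1e9)
            # Standard Q-learning update rule.
            q[(u, v)] += alpha * (reward(m) + gamma * future - q[(u, v)])
    return q


def best_route(q, src, dest):
    """Follow the greedy policy; assumes an acyclic topology as in the example."""
    route = [src]
    while route[-1] != dest:
        here = route[-1]
        options = [(value, v) for (u, v), value in q.items() if u == here]
        route.append(max(options)[1])
    return route


# Hypothetical topology: delay in ms, load in [0, 1], pre-FEC BER.
links = {
    ("A", "B"): {"delay": 1.0, "load": 0.0, "ber": 1e-5},
    ("B", "D"): {"delay": 1.0, "load": 0.0, "ber": 1e-5},
    ("A", "C"): {"delay": 1.0, "load": 0.5, "ber": 1e-5},
    ("C", "D"): {"delay": 5.0, "load": 0.0, "ber": 1e-5},
    ("A", "D"): {"delay": 5.0, "load": 0.0, "ber": 1e-5},
}
route_healthy = best_route(learn_q(links, "D"), "A", "D")

# Degrade link B -> D and re-learn, mimicking re-calculation upon degradation.
degraded = {k: dict(v) for k, v in links.items()}
degraded[("B", "D")]["ber"] = 1e-2
route_degraded = best_route(learn_q(degraded, "D"), "A", "D")
```

Re-running `learn_q` after a measurement changes stands in for the dynamic adaptation described in the abstract; a real deployment would presumably update the table incrementally.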
-
Counting on General Run-Length Grammars
Authors:
Gonzalo Navarro,
Alejandro Pacheco
Abstract:
We introduce a data structure for counting pattern occurrences in texts compressed with any run-length context-free grammar. Our structure uses space proportional to the grammar size and counts the occurrences of a pattern of length $m$ in a text of length $n$ in time $O(m\log^{2+ε} n)$, for any constant $ε > 0$. This closes an open problem posed by Christiansen et al. [ACM TALG 2020] and enhances our abilities for computation over compressed data; we give an example application.
Submitted 31 May, 2024;
originally announced June 2024.
-
Adaptive Dynamic Bitvectors
Authors:
Gonzalo Navarro
Abstract:
While operations \emph{rank} and \emph{select} on static bitvectors can be supported in constant time, lower bounds show that supporting updates raises the cost per operation to $Θ(\log n/ \log\log n)$. This is a shame in scenarios where updates are possible but uncommon. We develop a representation of bitvectors that, if there are $q = Ω(\log^2 n)$ queries per update, supports all the operations in $O(\log(n/q))$ amortized time. Our experimental results support the theoretical findings, displaying speedups of orders of magnitude compared to standard dynamic implementations.
Submitted 23 May, 2024;
originally announced May 2024.
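For readers unfamiliar with the two operations, the following sketch shows what rank and select compute on a static bitvector. It is a plain illustration with block-sampled prefix counts, not the adaptive representation of the paper, and it does not achieve the constant-time bounds mentioned in the abstract. The class name `StaticBitvector`, the 0-based/half-open indexing convention, and the block size are assumptions.

```python
import bisect


class StaticBitvector:
    def __init__(self, bits, block=4):
        self.bits = list(bits)
        self.block = block
        # cum[b] = number of 1s in the first b blocks.
        self.cum = [0]
        for start in range(0, len(self.bits), block):
            self.cum.append(self.cum[-1] + sum(self.bits[start:start + block]))

    def rank1(self, i):
        """Number of 1s in bits[0:i]."""
        b, off = divmod(i, self.block)
        start = b * self.block
        return self.cum[b] + sum(self.bits[start:start + off])

    def select1(self, j):
        """Position (0-based) of the j-th 1, with j counted from 1."""
        if j < 1 or j > self.cum[-1]:
            raise ValueError("j out of range")
        # First block boundary whose prefix count reaches j.
        b = bisect.bisect_left(self.cum, j) - 1
        remaining = j - self.cum[b]
        pos = b * self.block
        while True:
            remaining -= self.bits[pos]
            if remaining == 0:
                return pos
            pos += 1


bv = StaticBitvector([1, 0, 1, 1, 0, 0, 1, 0, 1])
```

With this convention, `bv.rank1(bv.select1(j) + 1) == j` holds for every valid `j`, which is the usual sanity check relating the two operations.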
-
Generalized Straight-Line Programs
Authors:
Gonzalo Navarro,
Francisco Olivares,
Cristian Urbina
Abstract:
It was recently proved that any Straight-Line Program (SLP) generating a given string can be transformed in linear time into an equivalent balanced SLP of the same asymptotic size. We generalize this proof to a general class of grammars we call Generalized SLPs (GSLPs), which allow rules of the form $A \rightarrow x$ where $x$ is any Turing-complete representation (of size $|x|$) of a sequence of symbols (potentially much longer than $|x|$). We then specialize GSLPs to so-called Iterated SLPs (ISLPs), which allow rules of the form $A \rightarrow Π_{i=k_1}^{k_2} B_1^{i^{c_1}}\cdots B_t^{i^{c_t}}$ of size $2t+2$. We prove that ISLPs break, for some text families, the measure $δ$ based on substring complexity, a lower bound for most measures and compressors exploiting repetitiveness. Further, ISLPs can extract any substring of length $λ$, from the represented text $T[1.. n]$, in time $O(λ+ \log^2 n\log\log n)$. This is the first compressed representation for repetitive texts breaking $δ$ while, at the same time, supporting direct access to arbitrary text symbols in polylogarithmic time. We also show how to compute some substring queries, like range minima and next/previous smaller value, in time $O(\log^2 n \log\log n)$. Finally, we further specialize the grammars to Run-Length SLPs (RLSLPs), which restrict the rules allowed by ISLPs to the form $A \rightarrow B^t$. Apart from inheriting all the previous results with the term $\log^2 n \log\log n$ reduced to the near-optimal $\log n$, we show that RLSLPs can exploit balance to efficiently compute a wide class of substring queries we call ``composable'' -- i.e., $f(X \cdot Y)$ can be obtained from $f(X)$ and $f(Y)$...
Submitted 10 April, 2024;
originally announced April 2024.
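As a small illustration of the run-length rules $A \rightarrow B^t$ mentioned in this abstract, the sketch below stores a toy RLSLP and accesses a single text symbol by descending from the start symbol using expansion lengths. It is a naive walk whose cost is proportional to the grammar height; it does not implement the balancing or the query machinery of the paper. The rule encoding (`"term"`, `"pair"`, `"run"`) and the name `make_accessor` are hypothetical.

```python
from functools import lru_cache


def make_accessor(rules):
    """rules maps a nonterminal to ("term", c), ("pair", B, C) or ("run", B, t)."""

    @lru_cache(maxsize=None)
    def length(sym):
        kind, *args = rules[sym]
        if kind == "term":
            return 1
        if kind == "pair":
            return length(args[0]) + length(args[1])
        return length(args[0]) * args[1]  # run rule A -> B^t

    def access(sym, i):
        """Return the i-th symbol (0-based) of the expansion of sym."""
        if not 0 <= i < length(sym):
            raise IndexError(i)
        while True:
            kind, *args = rules[sym]
            if kind == "term":
                return args[0]
            if kind == "pair":
                left, right = args
                if i < length(left):
                    sym = left
                else:
                    i -= length(left)
                    sym = right
            else:
                # All t copies are identical, so only the offset inside one matters.
                child, _ = args
                i %= length(child)
                sym = child

    return length, access


# Toy grammar: S -> Y A, Y -> X^3, X -> A B, A -> a, B -> b (expands to "abababa").
rules = {
    "A": ("term", "a"),
    "B": ("term", "b"),
    "X": ("pair", "A", "B"),
    "Y": ("run", "X", 3),
    "S": ("pair", "Y", "A"),
}
length, access = make_accessor(rules)
text = "".join(access("S", i) for i in range(length("S")))
```

The run rule is what keeps the grammar small here: `Y` represents six symbols with a single rule, and access through it needs only one modulo operation.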
-
PROJECT-J: JWST observations of HH46 IRS and its outflow. Overview and first results
Authors:
B. Nisini,
M. G. Navarro,
T. Giannini,
S. Antoniucci,
P. J. Kavanagh,
P. Hartigan,
F. Bacciotti,
A. Caratti o Garatti,
A. Noriega Crespo,
E. van Dishoek,
E. Whelan,
H. G. Arce,
S. Cabrit,
D. Coffey,
D. Fedele,
J. Eisloeffel,
M. E. Palumbo,
L. Podio,
T. P. Ray,
M. Schultze,
R. G. Urso,
J. M. Alcala',
M. A. Bautista,
C. Codella,
T. G. Greene
, et al. (1 additional authors not shown)
Abstract:
We present the first results of the JWST program PROJECT-J (PROtostellar JEts Cradle Tested with JWST ), designed to study the Class I source HH46 IRS and its outflow through NIRSpec and MIRI spectroscopy (1.66 to 28 micron). The data provide line-images (~ 6.6" in length with NIRSpec, and up to 20" with MIRI) revealing unprecedented details within the jet, the molecular outflow and the cavity. We detect, for the first time, the red-shifted jet within ~ 90 au from the source. Dozens of shock-excited forbidden lines are observed, including highly ionized species such as [Ne III] 15.5 micron, suggesting that the gas is excited by high velocity (> 80 km/s) shocks in a relatively high density medium. Images of H2 lines at different excitations outline a complex molecular flow, where a bright cavity, molecular shells, and a jet-driven bow-shock interact with and are shaped by the ambient conditions. Additional NIRCam 2 micron images resolve the HH46 IRS ~ 110 au binary system and suggest that the large asymmetries observed between the jet and the H2 wide angle emission could be due to two separate outflows being driven by the two sources. The spectra of the unresolved binary show deep ice bands and plenty of gaseous lines in absorption, likely originating in a cold envelope or disk. In conclusion, JWST has unraveled for the first time the origin of the HH46 IRS complex outflow demonstrating its capability to investigate embedded regions around young stars, which remain elusive even at near-IR wavelengths.
Submitted 10 April, 2024;
originally announced April 2024.
-
A Textbook Solution for Dynamic Strings
Authors:
Zsuzsanna Lipták,
Francesco Masillo,
Gonzalo Navarro
Abstract:
We consider the problem of maintaining a collection of strings while efficiently supporting splits and concatenations on them, as well as comparing two substrings, and computing the longest common prefix between two suffixes. This problem can be solved in optimal time $\mathcal{O}(\log N)$ whp for the updates and $\mathcal{O}(1)$ worst-case time for the queries, where $N$ is the total collection size [Gawrychowski et al., SODA 2018]. We present here a much simpler solution based on a forest of enhanced splay trees (FeST), where both the updates and the substring comparison take $\mathcal{O}(\log n)$ amortized time, $n$ being the lengths of the strings involved. The longest common prefix of length $\ell$ is computed in $\mathcal{O}(\log n + \log^2\ell)$ amortized time. Our query results are correct whp. Our simpler solution enables other more general updates in $\mathcal{O}(\log n)$ amortized time, such as reversing a substring and/or mapping its symbols. We can also regard substrings as circular or as their omega extension.
Submitted 6 July, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
BAT-LZ Out of Hell
Authors:
Zsuzsanna Lipták,
Francesco Masillo,
Gonzalo Navarro
Abstract:
Despite consistently yielding the best compression on repetitive text collections, the Lempel-Ziv parsing has resisted all attempts at offering relevant guarantees on the cost to access an arbitrary symbol. This makes it less attractive for use on compressed self-indexes and other compressed data structures. In this paper we introduce a variant we call BAT-LZ (for Bounded Access Time Lempel-Ziv) where the access cost is bounded by a parameter given at compression time. We design and implement a linear-space algorithm that, in time $O(n\log^3 n)$, obtains a BAT-LZ parse of a text of length $n$ by greedily maximizing each next phrase length. The algorithm builds on a new linear-space data structure that solves 5-sided orthogonal range queries in rank space, allowing updates to the coordinate where the one-sided queries are supported, in $O(\log^3 n)$ time for both queries and updates. This time can be reduced to $O(\log^2 n)$ if $O(n\log n)$ space is used.
We design a second algorithm that chooses the sources for the phrases in a clever way, using an enhanced suffix tree, albeit no longer guaranteeing longest possible phrases. This algorithm is much slower in theory, but in practice it is comparable to the greedy parser, while achieving significantly superior compression. We then combine the two algorithms, resulting in a parser that always chooses the longest possible phrases, and the best sources for those. Our experimentation shows that, on most repetitive texts, our algorithms reach an access cost close to $\log_2 n$ on texts of length $n$, while incurring almost no loss in the compression ratio when compared with classical LZ-compression. Several open challenges are discussed at the end of the paper.
Submitted 23 April, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Iterated Straight-Line Programs
Authors:
Gonzalo Navarro,
Cristian Urbina
Abstract:
We explore an extension to straight-line programs (SLPs) that outperforms, for some text families, the measure $δ$ based on substring complexity, a lower bound for most measures and compressors exploiting repetitiveness (which are crucial in areas like Bioinformatics). The extension, called iterated SLPs (ISLPs), allows rules of the form $A \rightarrow Π_{i=k_1}^{k_2} B_1^{i^{c_1}}\cdots B_t^{i^{c_t}}$, for which we show how to extract any substring of length $λ$, from the represented text $T[1.. n]$, in time $O(λ+ \log^2 n\log\log n)$. This is the first compressed representation for repetitive texts breaking $δ$ while, at the same time, supporting direct access to arbitrary text symbols in polylogarithmic time. As a byproduct, we extend Ganardi et al.'s technique to balance any SLP (so it has a derivation tree of logarithmic height) to a wide generalization of SLPs, including ISLPs.
Submitted 15 February, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Taxonomic classification with maximal exact matches in KATKA kernels and minimizer digests
Authors:
Dominika Draesslerová,
Omar Ahmed,
Travis Gagie,
Jan Holub,
Ben Langmead,
Giovanni Manzini,
Gonzalo Navarro
Abstract:
For taxonomic classification, we are asked to index the genomes in a phylogenetic tree such that later, given a DNA read, we can quickly choose a small subtree likely to contain the genome from which that read was drawn. Although popular classifiers such as Kraken use $k$-mers, recent research indicates that using maximal exact matches (MEMs) can lead to better classifications. For example, we can build an augmented FM-index over the genomes in the tree concatenated in left-to-right order; for each MEM in a read, find the interval in the suffix array containing the starting positions of that MEM's occurrences in those genomes; find the minimum and maximum values stored in that interval; take the lowest common ancestor (LCA) of the genomes containing the characters at those positions. This solution is practical, however, only when the total size of the genomes in the tree is fairly small. In this paper we consider applying the same solution to three lossily compressed representations of the genomes' concatenation: a KATKA kernel, which discards characters that are not in the first or last occurrence of any $k_{\max}$-tuple, for a parameter $k_{\max}$; a minimizer digest; a KATKA kernel of a minimizer digest. With a test dataset and these three representations of it, simulated reads and various parameter settings, we checked how many reads' longest MEMs occurred only in the sequences from which those reads were generated ("true positive" reads). For some parameter settings we achieved significant compression while only slightly decreasing the true-positive rate.
Submitted 4 April, 2024; v1 submitted 10 February, 2024;
originally announced February 2024.
-
The most variable VVV sources: eruptive protostars, dipping giants in the Nuclear Disc and others
Authors:
P. W. Lucas,
L. C. Smith,
Z. Guo,
C. Contreras Peña,
D. Minniti,
N. Miller,
J. Alonso-García,
M. Catelan,
J. Borissova,
R. K. Saito,
R. Kurtev,
M. G. Navarro,
C. Morris,
H. Muthu,
D. Froebrich,
V. D. Ivanov,
A. Bayo,
A. Caratti o Garatti,
J. L. Sanders
Abstract:
We have performed a comprehensive search of a VISTA Variables in the Via Lactea (VVV) database of 9.5 yr light curves for variable sources with $ΔK_s \ge 4$ mag, aiming to provide a large sample of high amplitude eruptive young stellar objects (YSOs) and detect unusual or new types of infrared variable source. We find 222 variable or transient sources in the Galactic bulge and disc, most of which are new discoveries. The sample mainly comprises novae, YSOs, microlensing events, Long Period Variable stars (LPVs) and a few rare or unclassified sources. Additionally, we report the discovery of a significant population of aperiodic late-type giant stars suffering deep extinction events, strongly clustered in the Nuclear Disc of the Milky Way. We suggest that these are metal-rich stars in which radiatively driven mass loss has been enhanced by super-solar metallicity. Among the YSOs, 32/40 appear to be undergoing episodic accretion. Long-lasting YSO eruptions have a typical rise time of $\sim$2 yr, somewhat slower than the 6-12 month timescale seen in the few historical events observed on the rise. The outburst durations are usually at least 5 yr, somewhat longer than many lower amplitude VVV events detected previously. The light curves are diverse in nature, suggesting that multiple types of disc instability may occur. Eight long-duration extinction events are seen wherein the YSO dims for a year or more, attributable to inner disc structure. One binary YSO in NGC 6530 displays periodic extinction events (P=59 days) similar to KH 15D.
Submitted 25 January, 2024;
originally announced January 2024.
-
The globular cluster VVV CL002 falling down to the hazardous Galactic centre
Authors:
D. Minniti,
N. Matsunaga,
J. G. Fernandez-Trincado,
S. Otsubo,
Y. Sarugaku,
T. Takeuchi,
H. Katoh,
S. Hamano,
Y. Ikeda,
H. Kawakita,
P. W. Lucas,
L. C. Smith,
I. Petralia,
E. R. Garro,
R. K. Saito,
J. Alonso-Garcia,
M. Gomez,
M. G. Navarro
Abstract:
Context. The Galactic centre is hazardous for stellar clusters because of the strong tidal force. Supposedly, many clusters were destroyed and contributed stars to the crowded stellar field of the bulge and the nuclear stellar cluster. However, it is hard to develop a realistic model to predict the long-term evolution of the complex inner Galaxy, and observing surviving clusters in the central region would provide crucial insights into destruction processes. Aims. Among hitherto-known Galactic globular clusters, VVV CL002 is the closest to the centre, 0.4 kpc, but has a very high transverse velocity, 400 km s$^{-1}$. The nature of this cluster and its impact on Galactic astronomy need to be addressed with spectroscopic follow-up. Methods. Here we report the first measurements of its radial velocity and chemical abundance based on near-infrared high-resolution spectroscopy. Results. We found that this cluster has a counterrotating orbit constrained within 1.0 kpc of the centre, as close as 0.2 kpc at the perigalacticon, confirming that the cluster is not a passerby from the halo but a genuine survivor enduring the harsh conditions of the Galactic mill's tidal forces. In addition, its metallicity and $α$ abundance ([$α$/Fe] $\simeq +0.4$ and [Fe/H]$=-0.54$) are similar to some globular clusters in the bulge. Recent studies suggest that such $α$-enhanced stars were more common at 3 - 6 kpc from the centre around 10 Gyrs ago. Conclusions. We infer that VVV CL002 was formed outside but is currently falling down to the centre, exhibiting a real-time event that must have occurred to many clusters a long time ago.
Submitted 26 December, 2023;
originally announced December 2023.
-
Faster Maximal Exact Matches with Lazy LCP Evaluation
Authors:
Adrián Goga,
Lore Depuydt,
Nathaniel K. Brown,
Jan Fostier,
Travis Gagie,
Gonzalo Navarro
Abstract:
MONI (Rossi et al., {\it JCB} 2022) is a BWT-based compressed index for computing the matching statistics and maximal exact matches (MEMs) of a pattern (usually a DNA read) with respect to a highly repetitive text (usually a database of genomes) using two operations: LF-steps and longest common extension (LCE) queries on a grammar-compressed representation of the text. In practice, most of the operations are constant-time LF-steps but most of the time is spent evaluating LCE queries. In this paper we show how (a variant of) the latter can be evaluated lazily, so as to bound the total time MONI needs to process the pattern in terms of the number of MEMs between the pattern and the text, while maintaining logarithmic latency.
Submitted 8 November, 2023;
originally announced November 2023.
-
Dynamic Compact Data Structure for Temporal Reachability with Unsorted Contact Insertions
Authors:
Luiz Fernando Afra Brito,
Marcelo Keese Albertini,
Bruno Augusto Nassif Travençolo,
Gonzalo Navarro
Abstract:
Temporal graphs represent interactions between entities over time. Deciding whether entities can reach each other through temporal paths is useful for various applications such as in communication networks and epidemiology. Previous works have studied the scenario in which addition of new interactions can happen at any point in time. A known strategy maintains, incrementally, a Timed Transitive Closure by using a dynamic data structure composed of $O(n^2)$ binary search trees containing non-nested time intervals. However, space usage for storing these trees grows rapidly as more interactions are inserted. In this paper, we present a compact data structure that represents each tree as two dynamic bit-vectors. In our experiments, we observed that our data structure improves space usage while having similar time performance for incremental updates when compared with the previous strategy in temporally dense temporal graphs.
Submitted 22 August, 2023;
originally announced August 2023.
-
Wheeler maps
Authors:
Andrej Baláz,
Travis Gagie,
Adrián Goga,
Simon Heumos,
Gonzalo Navarro,
Alessia Petescia,
Jouni Sirén
Abstract:
Motivated by challenges in pangenomic read alignment, we propose a generalization of Wheeler graphs that we call Wheeler maps. A Wheeler map stores a text $T[1..n]$ and an assignment of tags to the characters of $T$ such that we can preprocess a pattern $P[1..m]$ and then, given $i$ and $j$, quickly return all the distinct tags labeling the first characters of the occurrences of $P[i..j]$ in $T$. For the applications that most interest us, characters with long common contexts are likely to have the same tag, so we consider the number $t$ of runs in the list of tags sorted by their characters' positions in the Burrows-Wheeler Transform (BWT) of $T$. We show how, given a straight-line program with $g$ rules for $T$, we can build an $O(g + r + t)$-space Wheeler map, where $r$ is the number of runs in the BWT of $T$, with which we can preprocess a pattern $P[1..m]$ in $O(m \log n)$ time and then return the $k$ distinct tags for $P[i..j]$ in optimal $O(k)$ time for any given $i$ and $j$. We show various further results related to prioritizing the most frequent tags.
Submitted 18 August, 2023;
originally announced August 2023.
-
Evaluating Regular Path Queries on Compressed Adjacency Matrices
Authors:
Diego Arroyuelo,
Adrián Gómez-Brandón,
Gonzalo Navarro
Abstract:
Regular Path Queries (RPQs), which are essentially regular expressions to be matched against the labels of paths in labeled graphs, are at the core of graph database query languages like SPARQL. A way to solve RPQs is to translate them into a sequence of operations on the adjacency matrices of each label. We design and implement a Boolean algebra on sparse matrix representations and, as an application, use them to handle RPQs. Our baseline representation uses the same space as the previously most compact index for RPQs and outperforms it on the hardest types of queries -- those where both RPQ endpoints are unspecified. Our more succinct structure, based on $k^2$-trees, is 4 times smaller than any existing representation that handles RPQs, and still solves complex RPQs in a few seconds. Our new sparse-matrix-based representations dominate a good portion of the space/time tradeoff map, being outperformed only by representations that use much more space. They are also of independent interest beyond solving RPQs.
Submitted 23 April, 2024; v1 submitted 27 July, 2023;
originally announced July 2023.
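As a rough illustration of the matrix view described in the abstract above (not the authors' compressed representation), a path query can be evaluated with Boolean operations on per-label adjacency relations. This minimal sketch stores each Boolean matrix as a Python set of (row, column) pairs; the function names and the toy graph are assumptions made purely for illustration.

```python
def bool_product(a, b):
    """Boolean matrix product: (i, k) is set when some j has (i, j) in a and (j, k) in b."""
    by_row = {}
    for j, k in b:
        by_row.setdefault(j, set()).add(k)
    return {(i, k) for i, j in a for k in by_row.get(j, ())}


def closure(a, nodes):
    """Reflexive-transitive closure, i.e. the matrix for the Kleene star a*."""
    result = {(v, v) for v in nodes}
    frontier = set(result)
    while frontier:
        # Extend the newly found pairs by one more edge; keep only unseen pairs.
        frontier = bool_product(frontier, a) - result
        result |= frontier
    return result


# Toy labeled graph (hypothetical): edges grouped by label.
nodes = {0, 1, 2, 3}
adj = {
    "a": {(0, 1), (1, 2)},
    "b": {(2, 3)},
}

# RPQ a*/b: zero or more a-edges followed by one b-edge.
answer = bool_product(closure(adj["a"], nodes), adj["b"])
```

Here `answer` holds the (source, target) pairs connected by a path matching `a*/b`. A real index would replace the sets with a compact sparse-matrix structure.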
-
Maintaining the cycle structure of dynamic permutations
Authors:
Zsuzsanna Lipták,
Francesco Masillo,
Gonzalo Navarro
Abstract:
We present a new data structure for maintaining dynamic permutations, which we call a $\textit{forest of splay trees (FST)}$. The FST allows one to efficiently maintain the cycle structure of a permutation $π$ when the allowed updates are transpositions. The structure stores one conceptual splay tree for each cycle of $π$, using the position within the cycle as the key. Updating $π$ to $τ\cdotπ$, for a transposition $τ$, takes $\mathcal{O}(\log n)$ amortized time, where $n$ is the size of $π$. The FST computes any $π(i)$, $π^{-1}(i)$, $π^k(i)$ and $π^{-k}(i)$, in $\mathcal{O}(\log n)$ amortized time. Further, it supports cycle-specific queries such as determining whether two elements belong to the same cycle, flipping a segment of a cycle, and others, again within $\mathcal{O}(\log n)$ amortized time.
Submitted 7 June, 2023;
originally announced June 2023.
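The update considered in the abstract above, composing a permutation with a transposition, either merges two cycles or splits one. The following naive sketch only illustrates that effect; it recomputes the cycles in linear time and is not the forest-of-splay-trees structure. The list encoding and function names are assumptions made for illustration.

```python
def cycles(perm):
    """Return the cycles of a permutation given as a list with perm[i] = image of i."""
    seen, out = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = perm[i]
        out.append(cycle)
    return out


def apply_transposition(perm, a, b):
    """Return tau . perm for tau = (a b), i.e. the map i -> tau(perm[i])."""
    swap = {a: b, b: a}
    return [swap.get(image, image) for image in perm]


pi = [1, 2, 0, 4, 3]                    # cycles (0 1 2) and (3 4)
merged = apply_transposition(pi, 0, 3)  # 0 and 3 lie in different cycles
split = apply_transposition(pi, 0, 1)   # 0 and 1 lie in the same cycle
```

In this toy case `merged` consists of a single 5-cycle, while `split` has three cycles (a fixed point and two 2-cycles).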
-
Local systems and Suzuki groups
Authors:
L. Alpoge,
N. M. Katz,
G. Navarro,
E. A. O'Brien,
P. H. Tiep
Abstract:
We study geometric monodromy groups $G_{\geo,\sF_q}$ of the local systems $\sF_q$ on the affine line over $\F_2$ of rank $D=\sqrt{q}(q-1)$, $q=2^{2n+1}$, constructed in \cite{Ka-ERS}. The main result of the paper shows that $G_{\geo,\sF_q}$ is either the Suzuki simple group $\tw2 B_2(q)$, or the special linear group $\SL_D$. We also show that $\sF_8$ has geometric monodromy group $\tw2B_2(8)$, and arithmetic monodromy group $\Aut(\tw2 B_2(8))$ over $\F_2$, thus establishing \cite[Conjecture 2.2]{Ka-ERS} in full in the case $q=8$.
Submitted 11 May, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
The Field of Values of the Height Zero Characters
Authors:
Gabriel Navarro,
Lucas Ruhstorfer,
Pham Huu Tiep,
Carolina Vallejo
Abstract:
We determine what are the fields of values of the irreducible $p$-height zero characters of all finite groups for $p=2$; we conjecture what they should be for odd primes, and reduce this statement to a problem on blocks of quasi-simple groups.
Submitted 25 April, 2023;
originally announced April 2023.
-
Globular Clusters in the Galactic Center Region: expected behavior in the infalling and merger scenario
Authors:
Maria Gabriela Navarro,
Roberto Capuzzo-Dolcetta,
Manuel Arca-Sedda,
Dante Minniti
Abstract:
The infall and merger scenario of massive clusters in the Milky Way's potential well, as one of the Milky Way formation mechanisms, is reexamined to understand how the stars of the merging clusters are redistributed during and after the merger process using, for the first time, simulations with a high resolution concentrated in the 300 pc around the Galactic center. We adopted simulations developed in the framework of the "Modelling the Evolution of Galactic Nuclei" (MEGaN) project. We compared the evolution of representative clusters in the mass and concentration basis in the vicinity of a supermassive black hole. We used the spatial distribution, density profile, and the $50\%$ Lagrange radius (half mass radius) as indicators along the complete simulation to study the evolutionary shape in physical and velocity space and the final fate of these representative clusters. We detect that the least massive clusters are quickly (<10 Myr) destroyed. Instead, the most massive clusters have a long evolution, showing variations in the morphology, especially after each passage close to the supermassive black hole. The deformation of the clusters depends on the concentration, with general deformations for the least concentrated clusters and outer strains for the more concentrated ones. At the end of the simulation, a dense concentration of stars belonging to the clusters is formed. The particles that belong to the most massive and most concentrated clusters are concentrated in the innermost regions, meaning that the most massive and concentrated clusters contribute with a more significant fraction of particles to the final concentration, which suggests that the population of stars of the nuclear star cluster formed through this mechanism comes from massive clusters rather than low-mass globular clusters.
Submitted 21 April, 2023; v1 submitted 31 March, 2023;
originally announced March 2023.
-
$p$-groups and zeros of characters
Authors:
Alexander Moretó,
Gabriel Navarro
Abstract:
Fix a prime $p$ and an integer $n\geq 0$. Among the non-linear irreducible characters of the $p$-groups of order $p^n$, what is the minimum number of elements that take the value 0?
Submitted 31 July, 2023; v1 submitted 10 January, 2023;
originally announced January 2023.
-
Space-efficient conversions from SLPs
Authors:
Travis Gagie,
Adrián Goga,
Artur Jeż,
Gonzalo Navarro
Abstract:
We give algorithms that, given a straight-line program (SLP) with $g$ rules that generates (only) a text $T [1..n]$, build within $O(g)$ space the Lempel-Ziv (LZ) parse of $T$ (of $z$ phrases) in time $O(n\log^2 n)$ or in time $O(gz\log^2(n/z))$. We also show how to build a locally consistent grammar (LCG) of optimal size $g_{lc} = O(δ\log\frac{n}δ)$ from the SLP within $O(g+g_{lc})$ space and in $O(n\log g)$ time, where $δ$ is the substring complexity measure of $T$. Finally, we show how to build the LZ parse of $T$ from such an LCG within $O(g_{lc})$ space and in time $O(z\log^2 n \log^2(n/z))$. All our results hold with high probability.
Submitted 10 October, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
O-type Stars Stellar Parameter Estimation Using Recurrent Neural Networks
Authors:
Miguel Flores R.,
Luis J. Corral,
Celia R. Fierro-Santillán,
Silvana G. Navarro
Abstract:
In this paper, we present a deep learning system approach to estimating luminosity, effective temperature, and surface gravity of O-type stars using the optical region of the stellar spectra. In previous work, we compared a set of machine learning and deep learning algorithms in order to establish a reliable way to fit a stellar model using two methods: the classification of the stellar spectra models and the estimation of the physical parameters in a regression-type task. Here we present the process to estimate individual physical parameters from an artificial neural network perspective with the capacity to handle stellar spectra with a low signal-to-noise ratio (S/N), in the S/N $<$ 20 range. The development of three different recurrent neural network systems, the training process using stellar spectra models, the test over nine different observed stellar spectra, and the comparison with estimations in previous works are presented. Additionally, characterization methods for stellar spectra in order to reduce the dimensionality of the input data for the system and optimize the computational resources are discussed.
Submitted 27 October, 2022; v1 submitted 23 October, 2022;
originally announced October 2022.
-
Computing MEMs and Relatives on Repetitive Text Collections
Authors:
Gonzalo Navarro
Abstract:
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern $P[1 .. m]$ on a large repetitive text collection $T[1 .. n]$, which is represented as a (hopefully much smaller) run-length context-free grammar of size $g_{rl}$. We show that the problem can be solved in time $O(m^2 \log^εn)$, for any constant $ε> 0$, on a data structure of size $O(g_{rl})$. Further, on a locally consistent grammar of size $O(δ\log\frac{n}δ)$, the time decreases to $O(m\log m(\log m + \log^εn))$. The value $δ$ is a function of the substring complexity of $T$ and $Ω(δ\log\frac{n}δ)$ is a tight lower bound on the compressibility of repetitive texts $T$, so our structure has optimal size in terms of $n$ and $δ$. We extend our results to several related problems, such as finding $k$-MEMs, MUMs, rare MEMs, and applications.
Submitted 4 September, 2023; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Height Zero Conjecture with Galois Automorphisms
Authors:
Gunter Malle,
Gabriel Navarro
Abstract:
We prove a strengthening of Brauer's height zero conjecture for principal 2-blocks with Galois automorphisms. This requires a new extension of the Itô--Michler theorem for the prime~2, again with Galois automorphisms. We close, this time for odd primes $p$, with a new characterisation of $p$-closed groups via the decomposition numbers of certain characters.
Submitted 19 September, 2022;
originally announced September 2022.
-
Brauer's Height Zero Conjecture
Authors:
Gunter Malle,
Gabriel Navarro,
Amanda A. Schaeffer Fry,
Pham Huu Tiep
Abstract:
We complete the proof of Brauer's Height Zero Conjecture from 1955 by establishing the open implication for all odd primes.
Submitted 2 May, 2024; v1 submitted 10 September, 2022;
originally announced September 2022.
-
Skew differential Goppa codes and their application to McEliece cryptosystem
Authors:
José Gómez-Torrecillas,
F. J. Lobillo,
Gabriel Navarro
Abstract:
A class of linear codes that extends classic Goppa codes to a non-commutative context is defined. An efficient decoding algorithm, based on the solution of a non-commutative key equation, is designed. We show how the parameters of these codes, when the alphabet is a finite field, may be adjusted to propose a McEliece-type cryptosystem.
Submitted 28 July, 2022;
originally announced July 2022.
-
Balancing Run-Length Straight-Line Programs*
Authors:
Gonzalo Navarro,
Francisco Olivares,
Cristian Urbina
Abstract:
It was recently proved that any SLP generating a given string $w$ can be transformed in linear time into an equivalent balanced SLP of the same asymptotic size. We show that this result also holds for RLSLPs, which are SLPs extended with run-length rules of the form $A \rightarrow B^t$ for $t>2$, deriving $\texttt{exp}(A) = \texttt{exp}(B)^t$. An immediate consequence is the simplification of the algorithm for extracting substrings of an RLSLP-compressed string. We also show that several problems like answering RMQs and computing Karp-Rabin fingerprints on substrings can be solved in $\mathcal{O}(g_{rl})$ space and $\mathcal{O}(\log n)$ time, $g_{rl}$ being the size of the smallest RLSLP generating the string, of length $n$. We extend the result to solving more general operations on string ranges, in $\mathcal{O}(g_{rl})$ space and $\mathcal{O}(\log n)$ applications of the operation. In general, the smallest RLSLP can be asymptotically smaller than the smallest SLP by up to an $\mathcal{O}(\log n)$ factor, so our results can make a difference in terms of the space needed for computing these operations efficiently for some string families.
Submitted 26 June, 2022;
originally announced June 2022.
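To make the run-length rules $A \rightarrow B^t$ from the abstract above concrete, here is a minimal sketch of random access into the string generated by a small RLSLP without expanding it. The rule encoding, the toy grammar, and the function names are assumptions made for illustration; the descent takes time proportional to the grammar height, which is why a balanced grammar is desirable.

```python
from functools import lru_cache

# Hypothetical toy RLSLP: each rule is a terminal character,
# a pair (B, C) meaning A -> BC, or a triple ("run", B, t) meaning A -> B^t.
rules = {
    "X": "a",
    "Y": "b",
    "P": ("X", "Y"),        # exp(P) = "ab"
    "R": ("run", "P", 3),   # exp(R) = "ababab"
    "S": ("R", "X"),        # exp(S) = "abababa"
}


@lru_cache(maxsize=None)
def length(sym):
    """Length of exp(sym), computed once per nonterminal."""
    rule = rules[sym]
    if isinstance(rule, str):
        return 1
    if len(rule) == 3:
        _, body, t = rule
        return t * length(body)
    left, right = rule
    return length(left) + length(right)


def access(sym, i):
    """Return character i (0-based) of exp(sym) without expanding the string."""
    rule = rules[sym]
    while not isinstance(rule, str):
        if len(rule) == 3:
            _, body, _ = rule
            i %= length(body)   # every copy of exp(body) is identical
            sym = body
        else:
            left, right = rule
            if i < length(left):
                sym = left
            else:
                i -= length(left)
                sym = right
        rule = rules[sym]
    return rule


text = "".join(access("S", i) for i in range(length("S")))
```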
-
L-systems for Measuring Repetitiveness*
Authors:
Gonzalo Navarro,
Cristian Urbina
Abstract:
An L-system (for lossless compression) is a CPD0L-system extended with two parameters $d$ and $n$, which determines unambiguously a string $w = τ(\varphi^d(s))[1:n]$, where $\varphi$ is the morphism of the system, $s$ is its axiom, and $τ$ is its coding. The length of the shortest description of an L-system generating $w$ is known as $\ell$, and is arguably a relevant measure of repetitiveness that builds on the self-similarities that arise in the sequence.
In this paper we deepen the study of the measure $\ell$ and its relation with $δ$, a better established lower bound that builds on substring complexity. Our results show that $\ell$ and $δ$ are largely orthogonal, in the sense that one can be much larger than the other depending on the case. This suggests that both sources of repetitiveness are mostly unrelated. We also show that the recently introduced NU-systems, which combine the capabilities of L-systems with bidirectional macro-schemes, can be asymptotically strictly smaller than both mechanisms, which makes the size $ν$ of the smallest NU-system the unique smallest reachable repetitiveness measure to date.
Submitted 3 June, 2022;
originally announced June 2022.
-
Near-Optimal Search Time in $δ$-Optimal Space, and Vice Versa
Authors:
Tomasz Kociumaka,
Gonzalo Navarro,
Francisco Olivares
Abstract:
Two recent lower bounds on the compressibility of repetitive sequences, $δ\le γ$, have received much attention. It has been shown that a length-$n$ string $S$ over an alphabet of size $σ$ can be represented within the optimal $O(δ\log\tfrac{n\log σ}{δ\log n})$ space, and further, that within that space one can find all the $occ$ occurrences in $S$ of any length-$m$ pattern in time $O(m\log n + occ \log^εn)$ for any constant $ε>0$. Instead, the near-optimal search time $O(m+({occ+1})\log^εn)$ has been achieved only within $O(γ\log\frac{n}γ)$ space. Both results are based on considerably different locally consistent parsing techniques. The question of whether the better search time could be supported within the $δ$-optimal space remained open. In this paper, we prove that both techniques can indeed be combined to obtain the best of both worlds: $O(m+({occ+1})\log^εn)$ search time within $O(δ\log\tfrac{n\log σ}{δ\log n})$ space. Moreover, the number of occurrences can be computed in $O(m+\log^{2+ε}n)$ time within $O(δ\log\tfrac{n\log σ}{δ\log n})$ space. We also show that an extra sublogarithmic factor on top of this space enables optimal $O(m+occ)$ search time, whereas an extra logarithmic factor enables optimal $O(m)$ counting time.
Submitted 15 September, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Efficient Construction of the BWT for Repetitive Text Using String Compression
Authors:
Diego Díaz-Domínguez,
Gonzalo Navarro
Abstract:
We present a new semi-external algorithm that builds the Burrows--Wheeler transform variant of Bauer et al. (a.k.a., BCR BWT) in linear expected time. Our method uses compression techniques to reduce computational costs when the input is massive and repetitive. Concretely, we build on induced suffix sorting (ISS) and resort to run-length and grammar compression to maintain our intermediate results in compact form. Our compression format not only saves space but also speeds up the required computations. Our experiments show important space and computation time savings when the text is repetitive. In moderate-size collections of real human genome assemblies (14.2 GB - 75.05 GB), our memory peak is, on average, 1.7x smaller than the peak of the state-of-the-art BCR BWT construction algorithm (\texttt{ropebwt2}), while running 5x faster. Our current implementation was also able to compute the BCR BWT of 400 real human genome assemblies (1.2 TB) in 41.21 hours using 118.83 GB of working memory (around 10\% of the input size). Interestingly, the results we report in the 1.2 TB file are dominated by the difficulties of scanning huge files under memory constraints (specifically, I/O operations). This fact indicates we can perform much better with a more careful implementation of our method, thus scaling to even bigger sizes efficiently.
Submitted 14 August, 2023; v1 submitted 12 April, 2022;
originally announced April 2022.
-
Characters, Commutators and Centers of Sylow Subgroups
Authors:
Gabriel Navarro,
Benjamin Sambale
Abstract:
The character table of a finite group G determines whether |P:P'|=p^2 and whether |P:Z(P)|=p^2, where P is a Sylow p-subgroup of G. To prove the latter, we give a detailed classification of those groups in terms of the generalized Fitting subgroup.
Submitted 20 March, 2023; v1 submitted 9 April, 2022;
originally announced April 2022.
-
Principal blocks for different primes, II
Authors:
Gabriel Navarro,
Noelia Rizo,
A. A. Schaeffer Fry
Abstract:
If G is a finite group, we have proposed new conjectures on the interaction between different primes and their corresponding Brauer principal blocks. In this paper, we give strong support to the validity of these conjectures.
Submitted 6 April, 2022;
originally announced April 2022.
-
Principal blocks for different primes, I
Authors:
Gabriel Navarro,
Noelia Rizo,
A. A. Schaeffer Fry
Abstract:
We propose new conjectures about the relationship between the principal blocks of finite groups for different primes and establish evidence for these conjectures.
Submitted 6 April, 2022;
originally announced April 2022.
-
Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices
Authors:
Paolo Ferragina,
Travis Gagie,
Dominik Köppl,
Giovanni Manzini,
Gonzalo Navarro,
Manuel Striani,
Francesco Tosoni
Abstract:
As nowadays Machine Learning (ML) techniques are generating huge data collections, the problem of how to efficiently engineer their storage and operations is becoming of paramount importance. In this article we propose a new lossless compression scheme for real-valued matrices which achieves efficient performance in terms of compression ratio and time for linear-algebra operations. Experiments show that, as a compressor, our tool is clearly superior to gzip and it is usually within 20% of xz in terms of compression ratio. In addition, our compressed format supports matrix-vector multiplications in time and space proportional to the size of the compressed representation, unlike gzip and xz that require the full decompression of the compressed matrix. To our knowledge our lossless compressor is the first one achieving time and space complexities which match the theoretical limit expressed by the $k$-th order statistical entropy of the input.
To achieve further time/space reductions, we propose column-reordering algorithms hinging on a novel column-similarity score. Our experiments on various data sets of ML matrices show that, with a modest preprocessing time, our column reordering can yield a further reduction of up to 16% in the peak memory usage during matrix-vector multiplication.
Finally, we compare our proposal against the state-of-the-art Compressed Linear Algebra (CLA) approach, showing that ours always runs at least twice as fast (in a multi-thread setting) and achieves better compressed space occupancy for most of the tested data sets. This experimentally confirms the provably effective theoretical bounds we show for our compressed-matrix approach.
Submitted 30 March, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
On the commuting probability of p-elements in a finite group
Authors:
Timothy C. Burness,
Robert M. Guralnick,
Alexander Moretó,
Gabriel Navarro
Abstract:
Let $G$ be a finite group, let $p$ be a prime and let ${\rm Pr}_p(G)$ be the probability that two random $p$-elements of $G$ commute. In this paper we prove that ${\rm Pr}_p(G) > (p^2+p-1)/p^3$ if and only if $G$ has a normal and abelian Sylow $p$-subgroup, which generalizes previous results on the widely studied commuting probability of a finite group. This bound is best possible in the sense that for each prime $p$ there are groups with ${\rm Pr}_p(G) = (p^2+p-1)/p^3$ and we classify all such groups. Our proof is based on bounding the proportion of $p$-elements in $G$ that commute with a fixed $p$-element in $G \setminus \textbf{O}_p(G)$, which in turn relies on recent work of the first two authors on fixed point ratios for finite primitive permutation groups.
Submitted 6 July, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
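The quantity ${\rm Pr}_p(G)$ from the abstract above can be checked by brute force on a very small group. The sketch below does this for the symmetric group $S_3$, assuming that the two $p$-elements are drawn independently and uniformly (our reading of the definition) and that permutations are encoded as tuples; all function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def compose(p, q):
    """Composition 'p after q' of permutations stored as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))


def order(p):
    """Order of a permutation: smallest k >= 1 with p^k equal to the identity."""
    identity = tuple(range(len(p)))
    power, k = p, 1
    while power != identity:
        power, k = compose(power, p), k + 1
    return k


def is_power_of(n, p):
    """True when n is a power of p (including p^0 = 1)."""
    while n % p == 0:
        n //= p
    return n == 1


def pr_p(group, p):
    """Probability that two independent uniform p-elements of the group commute."""
    elems = [g for g in group if is_power_of(order(g), p)]
    commuting = sum(compose(x, y) == compose(y, x) for x in elems for y in elems)
    return Fraction(commuting, len(elems) ** 2)


def bound(p):
    """The threshold (p^2 + p - 1) / p^3 from the abstract."""
    return Fraction(p * p + p - 1, p ** 3)


s3 = list(permutations(range(3)))
pr2, pr3 = pr_p(s3, 2), pr_p(s3, 3)
```

For $S_3$ this gives ${\rm Pr}_2 = 5/8$, which equals the threshold for $p=2$ (the Sylow 2-subgroups of $S_3$ are not normal), and ${\rm Pr}_3 = 1$, which exceeds the threshold $11/27$ (the Sylow 3-subgroup is normal and abelian).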
-
Time- and Space-Efficient Regular Path Queries on Graphs
Authors:
Diego Arroyuelo,
Aidan Hogan,
Gonzalo Navarro,
Javiel Rojas-Ledesma
Abstract:
We introduce a time- and space-efficient technique to solve regular path queries over labeled graphs. We combine a bit-parallel simulation of the Glushkov automaton of the regular expression with the ring index introduced by Arroyuelo et al., exploiting its wavelet tree representation of the triples in order to efficiently reach the states of the product graph that are relevant for the query. Our query algorithm is able to simultaneously process several automaton states, as well as several graph nodes/labels. Our experimental results show that our representation uses 3-5 times less space than the alternatives in the literature, while generally outperforming them in query times (1.67 times faster than the next best).
Submitted 8 November, 2021;
originally announced November 2021.
-
HOLZ: High-Order Entropy Encoding of Lempel-Ziv Factor Distances
Authors:
Dominik Köppl,
Gonzalo Navarro,
Nicola Prezza
Abstract:
We propose a new representation of the offsets of the Lempel-Ziv (LZ) factorization based on the co-lexicographic order of the processed prefixes. The selected offsets tend to approach the k-th order empirical entropy. Our evaluations show that this choice of offsets is superior to the rightmost LZ parsing and the bit-optimal LZ parsing on datasets with small high-order entropy.
Submitted 3 November, 2021;
originally announced November 2021.
-
MillenniumDB: A Persistent, Open-Source, Graph Database
Authors:
Domagoj Vrgoc,
Carlos Rojas,
Renzo Angles,
Marcelo Arenas,
Diego Arroyuelo,
Carlos Buil Aranda,
Aidan Hogan,
Gonzalo Navarro,
Cristian Riveros,
Juan Romero
Abstract:
In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported. The engine itself is founded on a combination of tried and tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as graph-specific algorithms for evaluating path queries. In this paper, we present the main design principles underlying MillenniumDB, describing the abstract graph model and query semantics supported, the concrete data model and query syntax implemented, as well as the storage, indexing, query planning and query evaluation techniques used. We evaluate MillenniumDB over real-world data and queries from the Wikidata knowledge graph, where we find that it outperforms other popular persistent graph database engines (including both enterprise and open source alternatives) that support similar query features.
Submitted 2 November, 2021;
originally announced November 2021.
-
Electronic and Thermal Properties of $\text{GeTe/Sb}_{2}\text{Te}_{3}$ Superlattices by ab-initio Approach: Impact of Van der Waals Gaps on Vertical Lattice Thermal Conductivity
Authors:
Benoît Sklénard,
François Triozon,
Chiara Sabbione,
Lavinia Nistor,
Michel Frei,
Gabriele Navarro,
Jing Li
Abstract:
In the last decade, several works have focused on exploring the material and electrical properties of $\text{GeTe/Sb}_{2}\text{Te}_{3}$ superlattices (SLs), in particular because some first device implementations demonstrated interesting performance such as fast switching speed, low energy consumption, and non-volatility. However, the switching mechanism in such SL-based devices remains under debate. In this work, we investigate the prototype $\text{GeTe/Sb}_{2}\text{Te}_{3}$ SLs to analyze their fundamental electronic and thermal properties by ab initio methods. We find that the resistive contrast is small among the different phases of $\text{GeTe/Sb}_{2}\text{Te}_{3}$ because of a small electronic gap (about 0.1 eV) and a consequent semi-metallic-like behavior. At the same time, the out-of-plane lattice thermal conductivity is rather small, while varying by up to a factor of four among the different phases, from 0.11 to 0.45 W m$^{-1}$ K$^{-1}$, intimately related to the number of Van der Waals (VdW) gaps in a unit block. Such findings confirm the importance of the thermal improvement achievable in $\text{GeTe/Sb}_{2}\text{Te}_{3}$ superlattice devices, highlighting the impact of the material stacking and the role of VdW gaps on the thermal engineering of the Phase-Change Memory cell.
Submitted 2 November, 2021; v1 submitted 30 September, 2021;
originally announced September 2021.
-
Alternating sums over pi-subgroups
Authors:
Gabriel Navarro,
Benjamin Sambale
Abstract:
Dade's conjecture predicts that if $p$ is a prime, then the number of irreducible characters of a finite group of a given $p$-defect is determined by local subgroups. In this paper we replace $p$ by a set of primes $\pi$ and prove a $\pi$-version of Dade's conjecture for $\pi$-separable groups. This extends the (known) $p$-solvable case of the original conjecture and relates to a $\pi$-version of Alperin's weight conjecture previously established by the authors.
Submitted 23 September, 2021;
originally announced September 2021.
-
On Stricter Reachable Repetitiveness Measures*
Authors:
Gonzalo Navarro,
Cristian Urbina
Abstract:
The size $b$ of the smallest bidirectional macro scheme, which is arguably the most general copy-paste scheme to generate a given sequence, is considered to be the strictest reachable measure of repetitiveness. It is strictly lower-bounded by measures like $γ$ and $δ$, which are known or believed to be unreachable and to capture the entropy of repetitiveness. In this paper we study another sequence generation mechanism, namely compositions of a morphism. We show that these form another plausible mechanism to characterize repetitive sequences and define NU-systems, which combine such a mechanism with macro schemes. We show that the size $ν\leq b$ of the smallest NU-system is reachable and can be $o(δ)$ for some string families, thereby implying that the limit of compressibility of repetitive sequences can be even smaller than previously thought. We also derive several other results characterizing $ν$.
Submitted 28 May, 2021;
originally announced May 2021.
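
The sequence generation mechanism named in the abstract above, repeated composition of a morphism, can be illustrated with a minimal sketch. The Fibonacci morphism (a → ab, b → a) and the helper names `apply_morphism` and `iterate` are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not implement NU-systems themselves.

```python
# Illustrative sketch only: iterating a morphism to generate a highly
# repetitive string. The Fibonacci morphism is an assumed example.

def apply_morphism(rules: dict[str, str], word: str) -> str:
    """Replace every symbol of `word` by its image under the morphism."""
    return "".join(rules[symbol] for symbol in word)


def iterate(rules: dict[str, str], axiom: str, k: int) -> str:
    """Apply the morphism k times, starting from `axiom`."""
    word = axiom
    for _ in range(k):
        word = apply_morphism(rules, word)
    return word


FIBONACCI = {"a": "ab", "b": "a"}

if __name__ == "__main__":
    for k in range(6):
        print(k, iterate(FIBONACCI, "a", k))
```

The description (two rules, an axiom, and an iteration count) stays tiny while the generated string grows exponentially with the iteration count, which is the intuition behind treating morphism compositions as a repetitiveness mechanism.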
-
Unveiling short period binaries in the inner VVV bulge
Authors:
E. Botan,
R. K. Saito,
D. Minniti,
A. Kanaan,
R. Contreras Ramos,
T. S. Ferreira,
L. V. Gramajo,
M. G. Navarro
Abstract:
Most of our knowledge about the structure of the Milky Way has come from the study of variable stars. Among the variables, mimicking the periodic variation of pulsating stars, are the eclipsing binaries. These stars are important in astrophysics because they allow us to directly measure radii and masses of the components, as well as the distance to the system, thus being useful in studies of Galactic structure alongside pulsating RR Lyrae and Cepheids. Using the distinguishing features of their light curves, one can identify them using a semi-automated process. In this work, we present a strategy to search for eclipsing variables in the inner VVV bulge across an area of 13.4 sq. deg. within $1.68^{\circ}<l<7.53^{\circ}$ and $-3.73^{\circ}<b<-1.44^{\circ}$, corresponding to the VVV tiles b293 to b296 and b307 to b310. We accurately classify 212 previously unknown eclipsing binaries, including six very reddened sources. The preliminary analysis suggests these eclipsing binaries are located in the most obscured regions of the foreground disk and bulge of the Galaxy. This search is therefore complementary to other variable-star searches carried out at optical wavelengths.
Submitted 29 March, 2021;
originally announced March 2021.
-
A Fast and Small Subsampled R-index
Authors:
Dustin Cobas,
Travis Gagie,
Gonzalo Navarro
Abstract:
The $r$-index (Gagie et al., JACM 2020) represented a breakthrough in compressed indexing of repetitive text collections, outperforming its alternatives by orders of magnitude. Its space usage, $\mathcal{O}(r)$ where $r$ is the number of runs in the Burrows-Wheeler Transform of the text, is however larger than Lempel-Ziv and grammar-based indexes, and makes it uninteresting in various real-life scenarios of milder repetitiveness. In this paper we introduce the $sr$-index, a variant that limits the space to $\mathcal{O}(\min(r,n/s))$ for a text of length $n$ and a given parameter $s$, at the expense of multiplying by $s$ the time per occurrence reported. The $sr$-index is obtained by carefully subsampling the text positions indexed by the $r$-index, in a way that we prove is still able to support pattern matching with guaranteed performance. Our experiments demonstrate that the $sr$-index sharply outperforms virtually every other compressed index on repetitive texts, both in time and space, even matching the performance of the $r$-index while using 1.5--3.0 times less space. Only some Lempel-Ziv-based indexes achieve better compression than the $sr$-index, using about half the space, but they are an order of magnitude slower.
Submitted 29 March, 2021;
originally announced March 2021.
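
The quantity $r$ in the abstract above, the number of runs in the Burrows-Wheeler Transform, can be made concrete with a small sketch. It builds the BWT naively by sorting rotations, which is only practical for toy inputs, and assumes the text does not contain the sentinel `$`. The function names, including `space_bound`, are hypothetical, and nothing here reflects how the $r$-index or $sr$-index is actually built.

```python
# Naive illustration of BWT runs; quadratic-space construction, toy inputs only.

def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations of text + sentinel.

    Assumes `sentinel` does not occur in `text` and sorts before every
    character of `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal consecutive characters."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def space_bound(r: int, n: int, s: int) -> int:
    """The min(r, n/s) term from the abstract, with integer division."""
    return min(r, n // s)


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_runs(transformed))  # annb$aa 5
    repetitive = bwt("ab" * 50)
    print(count_runs(repetitive), len(repetitive))  # 3 101
```

On the repetitive input the run count stays at 3 regardless of how many times `ab` is repeated, while a text with milder repetitiveness would push $r$ up toward $n$; that is the regime where capping the space at $n/s$ matters.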
-
Brauer's Height Zero Conjecture for Principal Blocks
Authors:
Gunter Malle,
Gabriel Navarro
Abstract:
We prove \emph{the other half} of Brauer's Height Zero Conjecture in the case of principal blocks.
Submitted 16 February, 2021;
originally announced February 2021.
-
PCM-trace: Scalable Synaptic Eligibility Traces with Resistivity Drift of Phase-Change Materials
Authors:
Yigit Demirag,
Filippo Moro,
Thomas Dalgaty,
Gabriele Navarro,
Charlotte Frenkel,
Giacomo Indiveri,
Elisa Vianello,
Melika Payvand
Abstract:
Dedicated hardware implementations of spiking neural networks that combine the advantages of mixed-signal neuromorphic circuits with those of emerging memory technologies have the potential of enabling ultra-low power pervasive sensory processing. To endow these systems with additional flexibility and the ability to learn to solve specific tasks, it is important to develop appropriate on-chip learning mechanisms. Recently, a new class of three-factor spike-based learning rules has been proposed that can solve the temporal credit assignment problem and approximate the error back-propagation algorithm on complex tasks. However, the efficient implementation of these rules on hybrid CMOS/memristive architectures is still an open challenge. Here we present a new neuromorphic building block, called PCM-trace, which exploits the drift behavior of phase-change materials to implement long-lasting eligibility traces, a critical ingredient of three-factor learning rules. We demonstrate how the proposed approach improves the area efficiency by >10X compared to existing solutions and demonstrate a technologically plausible learning algorithm supported by experimental data from device measurements.
Submitted 16 February, 2021; v1 submitted 14 February, 2021;
originally announced February 2021.
-
Efficient construction of the extended BWT from grammar-compressed DNA sequencing reads
Authors:
Diego Diaz-Dominguez,
Gonzalo Navarro
Abstract:
We present an algorithm for building the extended BWT (eBWT) of a string collection from its grammar-compressed representation. Our technique exploits the string repetitions captured by the grammar to boost the computation of the eBWT. Thus, the more repetitive the collection is, the fewer resources we use per input symbol. We rely on a new grammar recently proposed at DCC'21 whose nonterminals serve as building blocks for inducing the eBWT. A relevant application for this idea is the construction of self-indexes for analyzing sequencing reads -- massive and repetitive string collections of raw genomic data. Self-indexes have become increasingly popular in Bioinformatics as they can encode more information in less space. Our efficient eBWT construction opens the door to performing accurate bioinformatic analyses on more massive sequence datasets, which are not tractable with current eBWT construction techniques.
Submitted 7 February, 2021;
originally announced February 2021.
-
Grammar Compression By Induced Suffix Sorting
Authors:
Daniel S. N. Nunes,
Felipe A. Louza,
Simon Gog,
Mauricio Ayala-Rincón,
Gonzalo Navarro
Abstract:
A grammar compression algorithm, called GCIS, is introduced in this work. GCIS is based on the induced suffix sorting algorithm SAIS, presented by Nong et al. in 2009. The proposed solution builds on the factorization performed by SAIS during suffix sorting. A context-free grammar is used to replace factors by non-terminals. The algorithm is then recursively applied on the shorter sequence of non-terminals. The resulting grammar is encoded by exploiting some redundancies, such as common prefixes between the right-hand sides of rules, sorted according to SAIS. GCIS stands out for the low space and time it requires for compression while obtaining competitive compression ratios. Our experiments on regular and repetitive, moderate and very large texts show that GCIS is a very convenient choice compared to well-known compressors such as Gzip, 7-Zip, and RePair, the gold standard in grammar compression. In exchange, GCIS is slow at decompressing. Yet, grammar compressors are more convenient than Lempel-Ziv compressors in that one can access text substrings directly in compressed form, without ever decompressing the text. We demonstrate that GCIS is an excellent candidate for this scenario because it is competitive with its RePair-based alternatives. We also show how the relation between GCIS and SAIS makes it a good intermediate structure to build the suffix array and the LCP array during decompression of the text.
Submitted 25 November, 2020;
originally announced November 2020.
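
The factorization step mentioned in the GCIS abstract above can be sketched in simplified form. The sketch classifies suffixes as S-type or L-type as in SAIS, cuts the text at LMS (leftmost S-type) positions, and replaces each distinct factor by an integer non-terminal. It is a rough illustration under stated assumptions, not the authors' algorithm: the input is assumed to end with a unique smallest sentinel `$`, non-terminals are numbered by first appearance rather than by induced sorting, and the recursion and rule encoding are omitted. All function names are hypothetical.

```python
# Simplified, assumed sketch of one SAIS-style factorization round.

def suffix_types(s: str) -> list[str]:
    """Label each position 'S' or 'L'; the last position (sentinel) is 'S'."""
    types = ["S"] * len(s)
    for i in range(len(s) - 2, -1, -1):
        if s[i] > s[i + 1] or (s[i] == s[i + 1] and types[i + 1] == "L"):
            types[i] = "L"
    return types


def lms_positions(types: list[str]) -> list[int]:
    """Positions that are S-type and directly preceded by an L-type position."""
    return [i for i in range(1, len(types))
            if types[i] == "S" and types[i - 1] == "L"]


def factorize(s: str) -> list[str]:
    """Cut the text at its LMS positions."""
    bounds = [0] + lms_positions(suffix_types(s)) + [len(s)]
    return [s[a:b] for a, b in zip(bounds, bounds[1:])]


def one_round(s: str) -> tuple[list[int], dict[str, int]]:
    """Replace each factor by a non-terminal id (first-appearance order)."""
    rules: dict[str, int] = {}
    reduced = [rules.setdefault(factor, len(rules)) for factor in factorize(s)]
    return reduced, rules


if __name__ == "__main__":
    print(factorize("banana$"))      # ['b', 'an', 'ana', '$']
    print(one_round("abcabcabc$"))   # ([0, 0, 0, 1], {'abc': 0, '$': 1})
```

On the repetitive input the three identical factors collapse to one rule, so the reduced sequence is shorter than the text and uses fewer distinct symbols; applying the same step again to the reduced sequence is the recursive idea the abstract describes.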