-
Untangling the Unrestricted Web: Automatic Identification of Multilingual Registers
Authors:
Erik Henriksson,
Amanda Myntti,
Anni Eskelinen,
Selcen Erten-Johansson,
Saara Hellström,
Veronika Laippala
Abstract:
This article explores deep learning models for the automatic identification of registers - text varieties such as news reports and discussion forums - in web-based datasets across 16 languages. Web register (or genre) identification would provide a robust solution for understanding the content of web-scale datasets, which have become crucial in computational linguistics. Despite recent advances, the potential of register classifiers on the noisy web remains largely unexplored, particularly in multilingual settings and when targeting the entire unrestricted web. We experiment with a range of deep learning models using the new Multilingual CORE corpora, which include 16 languages annotated using a detailed, hierarchical taxonomy of 25 registers designed to cover the entire unrestricted web. Our models achieve state-of-the-art results, showing that a detailed taxonomy in a hierarchical multi-label setting can yield competitive classification performance. However, all models hit a glass ceiling at approximately 80% F1 score, which we attribute to the non-discrete nature of web registers and the inherent uncertainty in labeling some documents. By pruning ambiguous examples, we improve model performance to over 90% F1. Finally, multilingual models outperform monolingual ones, particularly benefiting languages with fewer training examples and smaller registers. Although a zero-shot setting decreases performance by an average of 7%, these drops are not linked to specific registers or languages. Instead, registers show surprising similarity across languages.
Submitted 28 June, 2024;
originally announced June 2024.
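As a rough illustration of how an F1 score can be computed in a multi-label setting like the one described in the abstract above, the following sketch micro-averages over all (document, label) decisions. The abstract does not say which averaging the authors use, so micro-averaging is an assumption here, and the register names and the helper `micro_f1` are hypothetical.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions.

    Each element of `gold` and `pred` is the set of labels for one document.
    Micro-averaging is an assumption; the abstract does not specify it.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical register labels, for illustration only.
gold_labels = [{"news"}, {"forum"}, {"news", "opinion"}]
pred_labels = [{"news"}, {"news"}, {"news"}]
score = micro_f1(gold_labels, pred_labels)
```

A document carrying several registers at once, like the third one here, counts once per label, which is why ambiguous documents can pull the score down even when the main label is right.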
-
FatPaths: Routing in Supercomputers and Data Centers when Shortest Paths Fall Short
Authors:
Maciej Besta,
Marcel Schneider,
Karolina Cynk,
Marek Konieczny,
Erik Henriksson,
Salvatore Di Girolamo,
Ankit Singla,
Torsten Hoefler
Abstract:
We introduce FatPaths: a simple, generic, and robust routing architecture that enables state-of-the-art low-diameter topologies such as Slim Fly to achieve unprecedented performance. FatPaths targets Ethernet stacks in both HPC supercomputers and cloud data centers and clusters. FatPaths exposes and exploits the rich ("fat") diversity of both minimal and non-minimal paths for high-performance multi-pathing. Moreover, FatPaths uses a redesigned "purified" transport layer that removes virtually all TCP performance issues (e.g., slow start), and incorporates flowlet switching, a technique used to prevent packet reordering in TCP networks, to enable very simple and effective load balancing. Our design enables recent low-diameter topologies to outperform powerful Clos designs, achieving 15% higher net throughput at 2x lower latency for comparable cost. FatPaths will significantly accelerate Ethernet clusters that form more than 50% of the Top500 list, and it may become a standard routing scheme for modern topologies.
Submitted 11 November, 2020; v1 submitted 26 June, 2019;
originally announced June 2019.
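The flowlet switching mentioned in the abstract above can be sketched generically: packets of one flow are grouped into bursts ("flowlets") separated by idle gaps, and only a new flowlet may be moved to a different path, which keeps packets within a burst in order. This is a minimal illustration of the general technique, not the FatPaths implementation; the function names, the gap threshold, and the round-robin path choice are all assumptions.

```python
from typing import List, Optional


def assign_flowlets(arrival_times: List[float], gap: float) -> List[int]:
    """Return a flowlet index for each packet of a single flow.

    A new flowlet starts whenever the idle time since the previous packet
    exceeds `gap` (a hypothetical threshold, in the same unit as the times).
    """
    flowlet_ids: List[int] = []
    current = 0
    previous: Optional[float] = None
    for t in arrival_times:
        if previous is not None and t - previous > gap:
            current += 1  # idle gap is long enough: open a new flowlet
        flowlet_ids.append(current)
        previous = t
    return flowlet_ids


def pick_path(flowlet_id: int, num_paths: int) -> int:
    """Illustrative round-robin choice of path per flowlet (an assumption)."""
    return flowlet_id % num_paths


# Five packets; the long pause between t=2.0 and t=10.0 starts a second flowlet.
ids = assign_flowlets([0.0, 1.0, 2.0, 10.0, 11.0], gap=5.0)
paths = [pick_path(i, num_paths=3) for i in ids]
```

Because a path change only happens after a gap, packets sent on the old path have time to drain before later packets take the new one.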
-
Multiple Loop Self-Triggered Model Predictive Control for Network Scheduling and Control
Authors:
Erik Henriksson,
Daniel E. Quevedo,
Edwin G. W. Peters,
Henrik Sandberg,
Karl Henrik Johansson
Abstract:
We present an algorithm for controlling and scheduling multiple linear time-invariant processes on a shared bandwidth-limited communication network using adaptive sampling intervals. The controller is centralized and computes at every sampling instant not only the new control command for a process, but also decides the time interval to wait until taking the next sample. The approach relies on model predictive control ideas, where the cost function penalizes the state and control effort as well as the time interval until the next sample is taken. The latter is introduced in order to generate an adaptive sampling scheme for the overall system such that the sampling time increases as the norm of the system state goes to zero. The paper presents a method for synthesizing such a predictive controller and gives explicit sufficient conditions for when it is stabilizing. Further explicit conditions are given which guarantee conflict-free transmissions on the network. It is shown that the optimization problem may be solved off-line and that the controller can be implemented as a lookup table of state feedback gains. Simulation studies that compare the proposed algorithm to periodic sampling illustrate potential performance gains.
Submitted 10 February, 2015;
originally announced February 2015.
-
Pion Decay Widths of D mesons
Authors:
K. O. E. Henriksson,
T. A. Lahde,
C. J. Nyfalt,
D. O. Riska
Abstract:
The pionic decay rates of the excited $L=0,1$ $D$ mesons are calculated with a Hamiltonian model within the framework of the covariant Blankenbecler-Sugar equation. The interaction between the light quark and charm antiquark is described by a linear scalar confining and a screened one-gluon exchange interaction. The decay widths of the $D^*$ mesons obtain a contribution from the exchange current that is associated with the linear scalar confining interaction. If this contribution is taken into account along with the single quark approximation, the calculated decay rates of the charged $D^*$ mesons are readily below the current empirical upper limits if the axial coupling constant of the light constituent quarks is taken to be $g_A^q = 0.87$, but reach the empirical upper limits if $g_A^q = 1$. With the conventional values for $g_A^q$, the calculated widths of the $D_1$ and $D_2^*$ mesons fall somewhat below the experimental lower limits, leaving room for other decay modes as well, such as $\pi\pi$ decay. The unrealistically large contribution from the axial charge operator to the calculated pion decay width of the $D_1$ meson is suppressed by taking into account the exchange charge effects that are associated with the scalar linear confining and vector one-gluon exchange interactions. The predicted values for the pionic widths of the hitherto undiscovered $L=1$ $D_1^*$ and $D_0^*$ mesons are found to be smaller than previous estimates.
Submitted 9 November, 2000; v1 submitted 8 September, 2000;
originally announced September 2000.