-
The Knesset Corpus: An Annotated Corpus of Hebrew Parliamentary Proceedings
Authors:
Gili Goldin,
Nick Howell,
Noam Ordan,
Ella Rabinovich,
Shuly Wintner
Abstract:
We present the Knesset Corpus, a corpus of Hebrew parliamentary proceedings containing over 30 million sentences (over 384 million tokens) from all the (plenary and committee) protocols held in the Israeli parliament between 1998 and 2022. Sentences are annotated with morpho-syntactic information and are associated with detailed meta-information reflecting demographic and political properties of t…
▽ More
We present the Knesset Corpus, a corpus of Hebrew parliamentary proceedings containing over 30 million sentences (over 384 million tokens) from all the (plenary and committee) protocols held in the Israeli parliament between 1998 and 2022. Sentences are annotated with morpho-syntactic information and are associated with detailed meta-information reflecting demographic and political properties of the speakers, based on a large database of parliament members and factions that we compiled. We discuss the structure and composition of the corpus and the various processing steps we applied to it. To demonstrate the utility of this novel dataset we present two use cases. We show that the corpus can be used to examine historical developments in the style of political discussions by showing a reduction in lexical richness in the proceedings over time. We also investigate some differences between the styles of men and women speakers. These use cases exemplify the potential of the corpus to shed light on important trends in the Israeli society, supporting research in linguistics, political science, communication, law, etc.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
PlayFutures: Imagining Civic Futures with AI and Puppets
Authors:
Supratim Pait,
Sumita Sharma,
Ashley Frith,
Michael Nitsche,
Noura Howell
Abstract:
Children are the builders of the future and crucial to how the technologies around us develop. They are not voters but are participants in how the public spaces in a city are used. Through a workshop designed around kids of age 9-12, we investigate if novel technologies like artificial intelligence can be integrated in existing ways of play and performance to 1) re-imagine the future of civic spac…
▽ More
Children are the builders of the future and crucial to how the technologies around us develop. They are not voters but are participants in how the public spaces in a city are used. Through a workshop designed around kids of age 9-12, we investigate if novel technologies like artificial intelligence can be integrated in existing ways of play and performance to 1) re-imagine the future of civic spaces, 2) reflect on these novel technologies in the process and 3) build ways of civic engagement through play. We do this using a blend AI image generation and Puppet making to ultimately build future scenarios, perform debate and discussion around the futures and reflect on AI, its role and potential in their process. We present our findings of how AI helped envision these futures, aid performances, and report some initial reflections from children about the technology.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
OmniLingo: Listening- and speaking-based language learning
Authors:
Francis M. Tyers,
Nicholas Howell
Abstract:
In this demo paper we present OmniLingo, an architecture for distributing data for listening- and speaking-based language learning applications and a demonstration client built using the architecture. The architecture is based on the Interplanetary Filesystem (IPFS) and puts at the forefront user sovereignty over data.
In this demo paper we present OmniLingo, an architecture for distributing data for listening- and speaking-based language learning applications and a demonstration client built using the architecture. The architecture is based on the Interplanetary Filesystem (IPFS) and puts at the forefront user sovereignty over data.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
Coronal Heating as Determined by the Solar Flare Frequency Distribution Obtained by Aggregating Case Studies
Authors:
James Paul Mason,
Alexandra Werth,
Colin G. West,
Allison A. Youngblood,
Donald L. Woodraska,
Courtney Peck,
Kevin Lacjak,
Florian G. Frick,
Moutamen Gabir,
Reema A. Alsinan,
Thomas Jacobsen,
Mohammad Alrubaie,
Kayla M. Chizmar,
Benjamin P. Lau,
Lizbeth Montoya Dominguez,
David Price,
Dylan R. Butler,
Connor J. Biron,
Nikita Feoktistov,
Kai Dewey,
N. E. Loomis,
Michal Bodzianowski,
Connor Kuybus,
Henry Dietrick,
Aubrey M. Wolfe
, et al. (977 additional authors not shown)
Abstract:
Flare frequency distributions represent a key approach to addressing one of the largest problems in solar and stellar physics: determining the mechanism that counter-intuitively heats coronae to temperatures that are orders of magnitude hotter than the corresponding photospheres. It is widely accepted that the magnetic field is responsible for the heating, but there are two competing mechanisms th…
▽ More
Flare frequency distributions represent a key approach to addressing one of the largest problems in solar and stellar physics: determining the mechanism that counter-intuitively heats coronae to temperatures that are orders of magnitude hotter than the corresponding photospheres. It is widely accepted that the magnetic field is responsible for the heating, but there are two competing mechanisms that could explain it: nanoflares or Alfvén waves. To date, neither can be directly observed. Nanoflares are, by definition, extremely small, but their aggregate energy release could represent a substantial heating mechanism, presuming they are sufficiently abundant. One way to test this presumption is via the flare frequency distribution, which describes how often flares of various energies occur. If the slope of the power law fitting the flare frequency distribution is above a critical threshold, $α=2$ as established in prior literature, then there should be a sufficient abundance of nanoflares to explain coronal heating. We performed $>$600 case studies of solar flares, made possible by an unprecedented number of data analysts via three semesters of an undergraduate physics laboratory course. This allowed us to include two crucial, but nontrivial, analysis methods: pre-flare baseline subtraction and computation of the flare energy, which requires determining flare start and stop times. We aggregated the results of these analyses into a statistical study to determine that $α= 1.63 \pm 0.03$. This is below the critical threshold, suggesting that Alfvén waves are an important driver of coronal heating.
△ Less
Submitted 9 May, 2023;
originally announced May 2023.
-
A Second Wave of UD Hebrew Treebanking and Cross-Domain Parsing
Authors:
Amir Zeldes,
Nick Howell,
Noam Ordan,
Yifat Ben Moshe
Abstract:
Foundational Hebrew NLP tasks such as segmentation, tagging and parsing, have relied to date on various versions of the Hebrew Treebank (HTB, Sima'an et al. 2001). However, the data in HTB, a single-source newswire corpus, is now over 30 years old, and does not cover many aspects of contemporary Hebrew on the web. This paper presents a new, freely available UD treebank of Hebrew stratified from a…
▽ More
Foundational Hebrew NLP tasks such as segmentation, tagging and parsing, have relied to date on various versions of the Hebrew Treebank (HTB, Sima'an et al. 2001). However, the data in HTB, a single-source newswire corpus, is now over 30 years old, and does not cover many aspects of contemporary Hebrew on the web. This paper presents a new, freely available UD treebank of Hebrew stratified from a range of topics selected from Hebrew Wikipedia. In addition to introducing the corpus and evaluating the quality of its annotations, we deploy automatic validation tools based on grew (Guillaume, 2021), and conduct the first cross domain parsing experiments in Hebrew. We obtain new state-of-the-art (SOTA) results on UD NLP tasks, using a combination of the latest language modelling and some incremental improvements to existing transformer based approaches. We also release a new version of the UD HTB matching annotation scheme updates from our new corpus.
△ Less
Submitted 18 October, 2022; v1 submitted 14 October, 2022;
originally announced October 2022.