FeelsGoodMan: Inferring Semantics of Twitch Neologisms
Authors:
Pavel Dolin,
Luc d'Hauthuille,
Andrea Vattani
Abstract:
Twitch chats pose a unique problem in natural language understanding due to a large presence of neologisms, specifically emotes. There are a total of 8.06 million emotes, over 400k of which were used in the week studied. There is virtually no information on the meaning or sentiment of emotes, and with a constant influx of new emotes and drift in their frequencies, it becomes impossible to maintain an updated manually labeled dataset. Our paper makes a twofold contribution. First, we establish a new baseline for sentiment analysis on Twitch data, outperforming the previous supervised benchmark by 7.9 percentage points. Second, we introduce a simple but powerful unsupervised framework based on word embeddings and k-NN to enrich existing models with out-of-vocabulary knowledge. This framework allows us to auto-generate a pseudo-dictionary of emotes, and we show that we can nearly match the supervised benchmark above even when injecting such emote knowledge into sentiment classifiers trained on extraneous datasets such as movie reviews or Twitter.
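The embedding-plus-k-NN idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 2-D vectors, the lexicon scores, and the choice of k = 3 with cosine similarity and simple averaging are all assumptions made for illustration. It assigns each unlabeled emote the mean sentiment of its nearest labeled words in embedding space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def infer_sentiment(token, embeddings, lexicon, k=3):
    """Average the lexicon scores of the k labeled words nearest to `token`."""
    target = embeddings[token]
    scored = [
        (cosine(target, embeddings[word]), label)
        for word, label in lexicon.items()
        if word in embeddings and word != token
    ]
    nearest = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    if not nearest:
        return 0.0
    return sum(label for _, label in nearest) / len(nearest)


# Toy 2-D vectors, invented for illustration only (hypothetical data).
embeddings = {
    "great": [0.9, 0.1],
    "love": [0.8, 0.2],
    "nice": [0.85, 0.3],
    "awful": [-0.9, 0.1],
    "hate": [-0.8, 0.2],
    "bad": [-0.85, 0.3],
    "FeelsGoodMan": [0.7, 0.25],
    "FeelsBadMan": [-0.7, 0.25],
}

# Hypothetical sentiment lexicon covering ordinary words only.
lexicon = {
    "great": 1.0, "love": 1.0, "nice": 1.0,
    "awful": -1.0, "hate": -1.0, "bad": -1.0,
}

# Auto-generated pseudo-dictionary for the out-of-vocabulary emotes.
pseudo_dictionary = {
    emote: infer_sentiment(emote, embeddings, lexicon)
    for emote in ("FeelsGoodMan", "FeelsBadMan")
}
print(pseudo_dictionary)
```

In practice the vectors would come from embeddings trained on chat logs, and the resulting scores could be merged into the vocabulary of an existing sentiment classifier; how the paper performs that injection is not specified in the abstract.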
Submitted 17 November, 2021; v1 submitted 18 August, 2021;
originally announced August 2021.
Updated Results of a Solid-State Sensor Irradiation Study for ILC Extreme Forward Calorimetry
Authors:
Paul Anderson,
Wyatt Crockett,
Luc D'Hauthuille,
Vitaliy Fadeyev,
Caleb Fink,
Cesar Gonzalez-Renteria,
Benjamin Gruey,
Jane Gunnell,
Forest Martinez-McKinney,
Greg Rischbieter,
Kyle Rocha,
Bruce A. Schumm,
Edwin Spencer,
Max Wilder
Abstract:
Detectors proposed for the International Linear Collider (ILC) incorporate a tungsten sampling calorimeter ('BeamCal') intended to reconstruct showers of electrons, positrons and photons that emerge from the interaction point of the collider with angles between 5 and 50 milliradians. For the innermost radius of this calorimeter, radiation doses at shower max are expected to reach 100 Mrad per year, primarily due to minimum-ionizing electrons and positrons that arise in the induced electromagnetic showers of e$^+$e$^-$ 'beamstrahlung' pairs produced in the ILC beam-beam interaction. However, radiation damage to calorimeter sensors may be dominated by hadrons induced by nuclear interactions of shower photons, which are much more likely to contribute to the non-ionizing energy loss that has been observed to damage sensors exposed to hadronic radiation. We report here on prior highlights and recent results of SLAC Experiment T-506, for which several different types of semiconductor sensors were exposed to doses of radiation induced by showering electrons of energy 3.5-13.3 GeV. By embedding the sensor under irradiation within a tungsten radiator, the exposure incorporated hadronic species that would potentially contribute to the degradation of a sensor mounted in a precision sampling calorimeter. Depending on sensor technology, significant post-irradiation charge collection was observed for doses of several hundred Mrad.
Submitted 15 March, 2017;
originally announced March 2017.