-
Controlled Text Generation with Hidden Representation Transformations
Authors:
Vaibhav Kumar,
Hana Koorehdavoudi,
Masud Moshtaghi,
Amita Misra,
Ankit Chadha,
Emilio Ferrara
Abstract:
We propose CHRT (Control Hidden Representation Transformation), a controlled language generation framework that steers large language models to generate text pertaining to certain attributes (such as toxicity). CHRT gains attribute control by modifying the hidden representation of the base model through learned transformations. We employ a contrastive-learning framework to learn these transformations, which can be combined to gain multi-attribute control. The effectiveness of CHRT is experimentally shown by comparing it with seven baselines over three attributes. CHRT outperforms all the baselines in the tasks of detoxification, positive sentiment steering, and text simplification while minimizing the loss in linguistic qualities. Further, our approach has the lowest inference latency, only 0.01 seconds more than the base model, making it the most suitable for high-performance production environments. We open-source our code and release two novel datasets to further propel controlled language generation research.
Submitted 31 May, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Identifying Informational Sources in News Articles
Authors:
Alexander Spangher,
Nanyun Peng,
Jonathan May,
Emilio Ferrara
Abstract:
News articles are driven by the informational sources journalists use in reporting. Modeling when, how and why sources get used together in stories can help us better understand the information we consume and even help journalists with the task of producing it. In this work, we take steps toward this goal by constructing the largest and widest-ranging annotated dataset, to date, of informational sources used in news writing. We show that our dataset can be used to train high-performing models for information detection and source attribution. We further introduce a novel task, source prediction, to study the compositionality of sources in news articles. We show good performance on this task, which we argue is an important proof for narrative science exploring the internal structure of news articles and aiding in planning-based language generation, and an important step towards a source-recommendation system to aid journalists.
Submitted 24 May, 2023;
originally announced May 2023.
-
Fairness And Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, And Mitigation Strategies
Authors:
Emilio Ferrara
Abstract:
The significant advancements in applying Artificial Intelligence (AI) to healthcare decision-making, medical diagnosis, and other domains have simultaneously raised concerns about the fairness and bias of AI systems. This is particularly critical in areas like healthcare, employment, criminal justice, credit scoring, and increasingly, in generative AI models (GenAI) that produce synthetic media. Such systems can lead to unfair outcomes and perpetuate existing inequalities, including generative biases that affect the representation of individuals in synthetic data. This survey paper offers a succinct, comprehensive overview of fairness and bias in AI, addressing their sources, impacts, and mitigation strategies. We review sources of bias, such as data, algorithm, and human decision biases, highlighting the emergent issue of generative AI bias where models may reproduce and amplify societal stereotypes. We assess the societal impact of biased AI systems, focusing on the perpetuation of inequalities and the reinforcement of harmful stereotypes, especially as generative AI becomes more prevalent in creating content that influences public perception. We explore various proposed mitigation strategies, discussing the ethical considerations of their implementation and emphasizing the need for interdisciplinary collaboration to ensure effectiveness. Through a systematic literature review spanning multiple academic disciplines, we present definitions of AI bias and its different types, including a detailed look at generative AI bias. We discuss the negative impacts of AI bias on individuals and society and provide an overview of current approaches to mitigate AI bias, including data pre-processing, model selection, and post-processing. We emphasize the unique challenges presented by generative AI models and the importance of strategies specifically tailored to address these.
Submitted 7 December, 2023; v1 submitted 15 April, 2023;
originally announced April 2023.
-
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models
Authors:
Emilio Ferrara
Abstract:
As the capabilities of generative language models continue to advance, the implications of biases ingrained within these models have garnered increasing attention from researchers, practitioners, and the broader public. This article investigates the challenges and risks associated with biases in large-scale language models like ChatGPT. We discuss the origins of biases, stemming from, among others, the nature of training data, model specifications, algorithmic constraints, product design, and policy decisions. We explore the ethical concerns arising from the unintended consequences of biased model outputs. We further analyze the potential opportunities to mitigate biases, the inevitability of some biases, and the implications of deploying these models in various applications, such as virtual assistants, content generation, and chatbots. Finally, we review the current approaches to identify, quantify, and mitigate biases in language models, emphasizing the need for a multi-disciplinary, collaborative effort to develop more equitable, transparent, and responsible AI systems. This article aims to stimulate a thoughtful dialogue within the artificial intelligence community, encouraging researchers and developers to reflect on the role of biases in generative language models and the ongoing pursuit of ethical AI.
Submitted 13 November, 2023; v1 submitted 7 April, 2023;
originally announced April 2023.
-
Leveraging Social Interactions to Detect Misinformation on Social Media
Authors:
Tommaso Fornaciari,
Luca Luceri,
Emilio Ferrara,
Dirk Hovy
Abstract:
Detecting misinformation threads is crucial to guarantee a healthy environment on social media. We address the problem using a data set created during the COVID-19 pandemic. It contains cascades of tweets discussing information weakly labeled as reliable or unreliable, based on a previous evaluation of the information source. Models identifying unreliable threads usually rely on textual features. But reliability is not just a matter of what is said, but also of by whom and to whom. We therefore additionally leverage network information. Following the homophily principle, we hypothesize that users who interact are generally interested in similar topics and spread similar kinds of news, which in turn are generally either reliable or not. We test several methods to learn representations of the social interactions within the cascades, combining them with deep neural language models in a Multi-Input (MI) framework. Keeping track of the sequence of the interactions over time, we improve over previous state-of-the-art models.
Submitted 6 April, 2023;
originally announced April 2023.
-
Unveiling the Dynamics of Censorship, COVID-19 Regulations, and Protest: An Empirical Study of Chinese Subreddit r/china_irl
Authors:
Siyi Zhou,
Luca Luceri,
Emilio Ferrara
Abstract:
The COVID-19 pandemic has intensified numerous social issues that warrant academic investigation. Although information dissemination has been extensively studied, the silenced voices and censored content also merit attention due to their role in mobilizing social movements. In this paper, we provide empirical evidence to explore the relationships among COVID-19 regulations, censorship, and protest through a series of social incidents that occurred in China during 2022. We analyze the similarities and differences between censored articles and discussions on r/china_irl, the most popular Chinese-speaking subreddit, and scrutinize the temporal dynamics of government censorship activities and their impact on user engagement within the subreddit. Furthermore, we examine users' linguistic patterns under the influence of a censorship-driven environment. Our findings reveal patterns in topic recurrence, the complex interplay between censorship activities, user subscription, and collective commenting behavior, as well as potential linguistic adaptation strategies to circumvent censorship. These insights hold significant implications for researchers interested in understanding the survival mechanisms of marginalized groups within censored information ecosystems.
Submitted 5 April, 2023;
originally announced April 2023.
-
The Interconnected Nature of Online Harm and Moderation: Investigating the Cross-Platform Spread of Harmful Content between YouTube and Twitter
Authors:
Valerio La Gatta,
Luca Luceri,
Francesco Fabbri,
Emilio Ferrara
Abstract:
The proliferation of harmful content shared online poses a threat to online information integrity and the integrity of discussion across platforms. Despite various moderation interventions adopted by social media platforms, researchers and policymakers are calling for holistic solutions. This study explores how a target platform could leverage content that has been deemed harmful on a source platform by investigating the behavior and characteristics of Twitter users responsible for sharing moderated YouTube videos. Using a large-scale dataset of 600M tweets related to the 2020 U.S. election, we find that moderated YouTube videos are extensively shared on Twitter and that users who share these videos also endorse extreme and conspiratorial ideologies. A fraction of these users are eventually suspended by Twitter, but they do not appear to be involved in state-backed information operations. The findings of this study highlight the complex and interconnected nature of harmful cross-platform information diffusion, raising the need for cross-platform moderation strategies.
Submitted 6 April, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
Fermi-GBM Discovery of GRB 221009A: An Extraordinarily Bright GRB from Onset to Afterglow
Authors:
S. Lesage,
P. Veres,
M. S. Briggs,
A. Goldstein,
D. Kocevski,
E. Burns,
C. A. Wilson-Hodge,
P. N. Bhat,
D. Huppenkothen,
C. L. Fryer,
R. Hamburg,
J. Racusin,
E. Bissaldi,
W. H. Cleveland,
S. Dalessi,
C. Fletcher,
M. M. Giles,
B. A. Hristov,
C. M. Hui,
B. Mailyan,
C. Malacaria,
S. Poolakkil,
O. J. Roberts,
A. von Kienlin,
J. Wood
, et al. (115 additional authors not shown)
Abstract:
We report the discovery of GRB 221009A, the highest-flux gamma-ray burst ever observed by the Fermi Gamma-ray Burst Monitor (GBM). This GRB has continuous prompt emission lasting more than 600 seconds, which smoothly transitions to afterglow visible in the GBM energy range (8 keV to 40 MeV), and total energetics higher than any other burst in the GBM sample. By using a variety of new and existing analysis techniques, we probe the spectral and temporal evolution of GRB 221009A. We find no emission prior to the GBM trigger time (t0; 2022 October 9 at 13:16:59.99 UTC), indicating that this is the time of prompt emission onset. The triggering pulse exhibits distinct spectral and temporal properties suggestive of the thermal, photospheric emission of shock breakout, with significant emission up to $\sim$15 MeV. We characterize the onset of external shock at t0+600 s and find evidence of a plateau region in the early-afterglow phase which transitions to a slope consistent with Swift-XRT afterglow measurements. We place the total energetics of GRB 221009A in context with the rest of the GBM sample and find that this GRB has the highest total isotropic-equivalent energy ($E_{\gamma,\mathrm{iso}}=1.0\times10^{55}$ erg) and second-highest isotropic-equivalent luminosity ($L_{\gamma,\mathrm{iso}}=9.9\times10^{53}$ erg/s), based on a redshift of z = 0.151. These extreme energetics are what allowed GBM to observe the continuously emitting central engine from the beginning of the prompt emission phase through the onset of early afterglow.
Submitted 12 July, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Searching for continuous Gravitational Waves in the second data release of the International Pulsar Timing Array
Authors:
M. Falxa,
S. Babak,
P. T. Baker,
B. Bécsy,
A. Chalumeau,
S. Chen,
Z. Chen,
N. J. Cornish,
L. Guillemot,
J. S. Hazboun,
C. M. F. Mingarelli,
A. Parthasarathy,
A. Petiteau,
N. S. Pol,
A. Sesana,
S. B. Spolaor,
S. R. Taylor,
G. Theureau,
M. Vallisneri,
S. J. Vigeland,
C. A. Witt,
X. Zhu,
J. Antoniadis,
Z. Arzoumanian,
M. Bailes
, et al. (102 additional authors not shown)
Abstract:
The International Pulsar Timing Array (IPTA) second data release combines datasets from collaborations worldwide. In this study, we search for continuous waves: gravitational wave signals produced by individual supermassive black hole binaries in the local universe. We consider binaries on circular orbits and neglect the evolution of orbital frequency over the observational span. We find no evidence for such signals and set sky-averaged 95% upper limits on their amplitude, $h_{95}$. The most sensitive frequency is 10 nHz, with $h_{95} = 9.1 \times 10^{-15}$. We achieve the best upper limits to date at the low and high frequencies of the PTA band thanks to the improved effective cadence of observations. In our analysis, we take into account the recently discovered common red noise process, which has an impact at low frequencies. We also find that the peculiar noise features present in some pulsars' data must be taken into account to reduce false alarms. We show that using custom noise models is essential in searching for continuous gravitational wave signals and setting upper limits.
Submitted 19 March, 2023;
originally announced March 2023.
-
Retrieving false claims on Twitter during the Russia-Ukraine conflict
Authors:
Valerio La Gatta,
Chiyu Wei,
Luca Luceri,
Francesco Pierri,
Emilio Ferrara
Abstract:
Nowadays, false and unverified information on social media sways individuals' perceptions during major geopolitical events and threatens the quality of the whole digital information ecosystem. Since the Russian invasion of Ukraine, several fact-checking organizations have been actively involved in verifying stories related to the conflict that circulated online. In this paper, we leverage a public repository of fact-checked claims to build a methodological framework for automatically identifying false and unsubstantiated claims spreading on Twitter in February 2022. Our framework consists of two sequential models: first, the claim detection model identifies whether tweets incorporate a (false) claim among those considered in our collection. Then, the claim retrieval model matches the tweets with fact-checked information by ranking verified claims according to their relevance to the input tweet. Both models are based on pre-trained language models and fine-tuned to perform a text classification task and an information retrieval task, respectively. In particular, to validate the effectiveness of our methodology, we consider 83 verified false claims that spread on Twitter during the first week of the invasion, and manually annotate 5,872 tweets according to the claim(s) they report. Our experiments show that our proposed methodology outperforms standard baselines for both claim detection and claim retrieval. Overall, our results highlight how social media providers could effectively leverage semi-automated approaches to identify, track, and eventually moderate false information that spreads on their platforms.
Submitted 17 March, 2023;
originally announced March 2023.
-
Unusual Hard X-ray Flares Caught in NICER Monitoring of the Binary Supermassive Black Hole Candidate AT2019cuk/Tick Tock/SDSS J1430+2303
Authors:
Megan Masterson,
Erin Kara,
Dheeraj R. Pasham,
Daniel J. D'Orazio,
Dominic J. Walton,
Andrew C. Fabian,
Matteo Lucchini,
Ronald A. Remillard,
Zaven Arzoumanian,
Otabek Burkhonov,
Hyeonho Choi,
Shuhrat A. Ehgamberdiev,
Elizabeth C. Ferrara,
Muryel Guolo,
Myungshin Im,
Yonggi Kim,
Davron Mirzaqulov,
Gregory S. H. Paek,
Hyun-il Sung,
Joh-Na Yoon
Abstract:
The nuclear transient AT2019cuk/Tick Tock/SDSS J1430+2303 has been suggested to harbor a supermassive black hole (SMBH) binary near coalescence. We report results from high-cadence NICER X-ray monitoring with multiple visits per day from January to August 2022, as well as continued optical monitoring during the same time period. We find no evidence of periodic or quasi-periodic modulation in the X-ray, UV, or optical bands; however, we do observe exotic hard X-ray variability that is unusual for a typical AGN. The most striking feature of the NICER light curve is repetitive hard (2-4 keV) X-ray flares that result in distinctly harder X-ray spectra compared to the non-flaring data. In its non-flaring state, AT2019cuk looks like a relatively standard AGN, but it presents the first case of day-long, hard X-ray flares in a changing-look AGN. We consider a few different models for the driving mechanism of these hard X-ray flares, including: (1) corona/jet variability driven by increased magnetic activity, (2) variable obscuration, and (3) self-lensing from the potential secondary SMBH. We prefer the variable corona model, as the obscuration model requires rather contrived timescales and the self-lensing model is difficult to reconcile with a lack of clear periodicity in the flares. These findings illustrate how important high-cadence X-ray monitoring is to our understanding of the rapid variability of the X-ray corona and necessitate further high-cadence, multi-wavelength monitoring of changing-look AGN like AT2019cuk to probe the corona-jet connection.
Submitted 24 February, 2023;
originally announced February 2023.
-
GRB 221009A: Discovery of an Exceptionally Rare Nearby and Energetic Gamma-Ray Burst
Authors:
Maia A. Williams,
Jamie A. Kennea,
S. Dichiara,
Kohei Kobayashi,
Wataru B. Iwakiri,
Andrew P. Beardmore,
P. A. Evans,
Sebastian Heinz,
Amy Lien,
S. R. Oates,
Hitoshi Negoro,
S. Bradley Cenko,
Douglas J. K. Buisson,
Dieter H. Hartmann,
Gaurava K. Jaisawal,
N. P. M. Kuin,
Stephen Lesage,
Kim L. Page,
Tyler Parsotan,
Dheeraj R. Pasham,
B. Sbarufatti,
Michael H. Siegel,
Satoshi Sugita,
George Younes,
Elena Ambrosi
, et al. (31 additional authors not shown)
Abstract:
We report the discovery of the unusually bright long-duration gamma-ray burst (GRB), GRB 221009A, as observed by the Neil Gehrels Swift Observatory (Swift), Monitor of All-sky X-ray Image (MAXI), and Neutron Star Interior Composition Explorer Mission (NICER). This energetic GRB was located relatively nearby (z = 0.151), allowing for sustained observations of the afterglow. The large X-ray luminosity and low Galactic latitude (b = 4.3 degrees) make GRB 221009A a powerful probe of dust in the Milky Way. Using echo tomography, we map the line-of-sight dust distribution and find evidence for significant column densities at large distances ($\gtrsim 10$ kpc). We present analysis of the light curves and spectra at X-ray and UV/optical wavelengths, and find that the X-ray afterglow of GRB 221009A is more than an order of magnitude brighter at T0 + 4.5 ks than any previous GRB observed by Swift. In its rest frame, GRB 221009A is at the high end of the afterglow luminosity distribution, but not uniquely so. In a simulation of randomly generated bursts, only 1 in $10^4$ long GRBs was as energetic as GRB 221009A; such a large $E_{\gamma,\mathrm{iso}}$ implies a narrow jet structure, but the afterglow light curve is inconsistent with simple top-hat jet models. Using the sample of Swift GRBs with redshifts, we estimate that GRBs as energetic and nearby as GRB 221009A occur at a rate of $\lesssim 1$ per 1000 yr, making this a truly remarkable opportunity unlikely to be repeated in our lifetime.
Submitted 7 February, 2023;
originally announced February 2023.
-
The NANOGrav 12.5-year Data Set: Bayesian Limits on Gravitational Waves from Individual Supermassive Black Hole Binaries
Authors:
Zaven Arzoumanian,
Paul T. Baker,
Laura Blecha,
Harsha Blumer,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Bence Bécsy,
J. Andrew Casey-Clyde,
Maria Charisi,
Shami Chatterjee,
Siyuan Chen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Megan E. DeCesar,
Paul B. Demorest,
Timothy Dolch,
Brendan Drachler,
Justin A. Ellis,
E. C. Ferrara,
William Fiore,
Emmanuel Fonseca,
Gabriel E. Freedman
, et al. (53 additional authors not shown)
Abstract:
Pulsar timing array collaborations, such as the North American Nanohertz Observatory for Gravitational Waves (NANOGrav), are seeking to detect nanohertz gravitational waves emitted by supermassive black hole binaries formed in the aftermath of galaxy mergers. We have searched for continuous waves from individual circular supermassive black hole binaries using NANOGrav's recent 12.5-year data set. We created new methods to accurately model the uncertainties on pulsar distances in our analysis, and we implemented new techniques to account for a common red noise process in pulsar timing array data sets while searching for deterministic gravitational wave signals, including continuous waves. As we found no evidence for continuous waves in our data, we placed 95% upper limits on the strain amplitude of continuous waves emitted by these sources. At our most sensitive frequency of 7.65 nanohertz, we placed a sky-averaged limit of $h_0 < (6.82 \pm 0.35) \times 10^{-15}$, and $h_0 < (2.66 \pm 0.15) \times 10^{-15}$ in our most sensitive sky location. Finally, we placed a multi-messenger limit of $\mathcal{M} < (1.41 \pm 0.02) \times 10^9 M_\odot$ on the chirp mass of the supermassive black hole binary candidate 3C 66B.
Submitted 6 June, 2023; v1 submitted 9 January, 2023;
originally announced January 2023.
-
The Fermi-LAT Light Curve Repository
Authors:
S. Abdollahi,
M. Ajello,
L. Baldini,
J. Ballet,
D. Bastieri,
J. Becerra Gonzalez,
R. Bellazzini,
A. Berretta,
E. Bissaldi,
R. Bonino,
A. Brill,
P. Bruel,
E. Burns,
S. Buson,
A. Cameron,
R. Caputo,
P. A. Caraveo,
N. Cibrario,
S. Ciprini,
P. Cristarella Orestano,
M. Crnogorcevic,
S. Cutini,
F. D'Ammando,
S. De Gaetano,
S. W. Digel
, et al. (88 additional authors not shown)
Abstract:
The Fermi Large Area Telescope (LAT) light curve repository (LCR) is a publicly available, continually updated library of gamma-ray light curves of variable Fermi-LAT sources generated over multiple timescales. The Fermi-LAT LCR aims to provide publication-quality light curves binned on timescales of 3 days, 7 days, and 30 days for 1525 sources deemed variable in the source catalog of the first 10 years of Fermi-LAT observations. The repository consists of light curves generated through full likelihood analyses that model the sources and the surrounding region, providing fluxes and photon indices for each time bin. The LCR is intended as a resource for the time-domain and multi-messenger communities by allowing users to quickly search LAT data to identify correlated variability and flaring emission episodes from gamma-ray sources. We describe the sample selection and analysis employed by the LCR and provide an overview of the associated data access portal.
Submitted 14 February, 2023; v1 submitted 4 January, 2023;
originally announced January 2023.
-
Propaganda and Misinformation on Facebook and Twitter during the Russian Invasion of Ukraine
Authors:
Francesco Pierri,
Luca Luceri,
Nikhil **dal,
Emilio Ferrara
Abstract:
Online social media are often a unique source of information, and having access to reliable and unbiased content is crucial, especially during crises and contentious events. We study the spread of propaganda and misinformation that circulated on Facebook and Twitter during the first few months of the Russia-Ukraine conflict. By leveraging two large datasets of millions of social media posts, we estimate the prevalence of Russian propaganda and low-credibility content on the two platforms, describing temporal patterns and highlighting the disproportionate role played by superspreaders in amplifying unreliable content. We infer the political leaning of Facebook pages and Twitter users sharing propaganda and misinformation, and observe that they tend to be more right-leaning than average. By estimating the amount of content moderated by the two platforms, we show that only about 8-15% of the posts and tweets sharing links to Russian propaganda or untrustworthy sources were removed. Overall, our findings show that Facebook and Twitter are still vulnerable to abuse, especially during crises: we highlight the need to urgently address this issue to preserve the integrity of online conversations.
Submitted 20 February, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
-
The Birth of a Relativistic Jet Following the Disruption of a Star by a Cosmological Black Hole
Authors:
Dheeraj R. Pasham,
Matteo Lucchini,
Tanmoy Laskar,
Benjamin P. Gompertz,
Shubham Srivastav,
Matt Nicholl,
Stephen J. Smartt,
James C. A. Miller-Jones,
Kate D. Alexander,
Rob Fender,
Graham P. Smith,
Michael D. Fulton,
Gulab Dewangan,
Keith Gendreau,
Eric R. Coughlin,
Lauren Rhodes,
Assaf Horesh,
Sjoert van Velzen,
Itai Sfaradi,
Muryel Guolo,
N. Castro Segura,
Aysha Aamer,
Joseph P. Anderson,
Iair Arcavi,
Sean J. Brennan
, et al. (41 additional authors not shown)
Abstract:
A black hole can launch a powerful relativistic jet after it tidally disrupts a star. If this jet fortuitously aligns with our line of sight, the overall brightness is Doppler boosted by several orders of magnitude. Consequently, such on-axis relativistic tidal disruption events (TDEs) have the potential to unveil cosmological (redshift $z>1$) quiescent black holes and are ideal test beds to understand the radiative mechanisms operating in super-Eddington jets. Here, we present multi-wavelength (X-ray, UV, optical, and radio) observations of the optically discovered transient \target at $z=1.193$. Its unusual X-ray properties, including a peak observed luminosity of $\gtrsim 10^{48}$ erg s$^{-1}$, systematic variability on timescales as short as 1000 seconds, and an overall duration of more than 30 days in the rest frame, are traits associated with relativistic TDEs. The X-ray to radio spectral energy distributions spanning 5-50 days after discovery can be explained as synchrotron emission from a relativistic jet (radio), synchrotron self-Compton (X-rays), and thermal emission similar to that seen in low-redshift TDEs (UV/optical). Our modeling implies a beamed, highly relativistic jet akin to blazars but requires extreme matter-domination, i.e., a high ratio of electron-to-magnetic-field energy densities in the jet, and challenges our theoretical understanding of jets.
Submitted 29 November, 2022;
originally announced November 2022.
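The Doppler-boosting and rest-frame arguments in the abstract above can be made concrete with standard special-relativistic and cosmological relations. The sketch below assumes a hypothetical bulk Lorentz factor Gamma = 10 (not a value from the paper) and uses the textbook Doppler factor delta = 1 / (Gamma * (1 - beta * cos(theta))), together with the rule of thumb that the apparent bolometric luminosity of a discrete emitting blob scales as delta^4; the function names are illustrative.

```python
import math


def doppler_factor(gamma: float, theta_rad: float) -> float:
    """Relativistic Doppler factor delta = 1 / (Gamma * (1 - beta * cos(theta)))."""
    beta = math.sqrt(1.0 - 1.0 / gamma**2)  # jet speed in units of c
    return 1.0 / (gamma * (1.0 - beta * math.cos(theta_rad)))


def luminosity_boost(delta: float, exponent: float = 4.0) -> float:
    """Apparent/intrinsic bolometric luminosity for a discrete blob (~ delta**4)."""
    return delta**exponent


def observed_duration_days(rest_frame_days: float, z: float) -> float:
    """Cosmological time dilation: observed interval = (1 + z) * rest-frame interval."""
    return (1.0 + z) * rest_frame_days


if __name__ == "__main__":
    gamma = 10.0  # hypothetical bulk Lorentz factor, for illustration only
    for theta_deg in (0.0, 5.0, 30.0):
        d = doppler_factor(gamma, math.radians(theta_deg))
        print(f"theta={theta_deg:4.1f} deg  delta={d:6.2f}  boost~{luminosity_boost(d):.3g}")
    # 30 rest-frame days at z = 1.193 (both from the abstract), seen by the observer
    print(f"observed duration: {observed_duration_days(30.0, 1.193):.1f} days")
```

On axis, delta is close to 2 * Gamma (about 20 here), giving a boost of order 10^5, while at 30 degrees the same jet is de-boosted (delta < 1). The 30 rest-frame days correspond to roughly 66 days in the observer frame at z = 1.193.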
-
From Fake News to #FakeNews: Mining Direct and Indirect Relationships among Hashtags for Fake News Detection
Authors:
Xinyi Zhou,
Reza Zafarani,
Emilio Ferrara
Abstract:
The COVID-19 pandemic has gained worldwide attention and allowed fake news, such as "COVID-19 is the flu," to spread quickly and widely on social media. Combating this coronavirus infodemic demands effective methods to detect fake news. To this end, we propose a method to infer news credibility from hashtags involved in news dissemination on social media, motivated by the tight connection between hashtags and news credibility observed in our empirical analyses. We first introduce a new graph that captures all (direct and indirect) relationships among hashtags. Then, we develop a language-independent semi-supervised algorithm that predicts fake news based on this graph. This study is the first to investigate indirect relationships among hashtags; the proposed approach can be extended to any homogeneous graph to capture comprehensive relationships among nodes. Language independence opens the proposed method to multilingual fake news detection. Experiments conducted on two real-world datasets demonstrate the effectiveness of our approach in identifying fake news, especially at an early stage of propagation.
Submitted 20 November, 2022;
originally announced November 2022.
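The hashtag-graph idea in the abstract above can be sketched as follows, under assumptions that go beyond the abstract: direct relationships are taken to be co-occurrence counts within a post, indirect relationships are approximated by weighted two-hop paths, and prediction uses plain label propagation from a few hashtags of known credibility. The weighting parameter `lam`, the seed scores, and all function names are hypothetical; the authors' graph construction and algorithm may differ. The resulting hashtag scores could in turn be aggregated per news item.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Weights = Dict[Tuple[str, str], float]


def direct_edges(posts: Iterable[List[str]]) -> Weights:
    """Direct relationship (assumed): co-occurrence count of two hashtags in a post."""
    w: Weights = defaultdict(float)
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            w[(a, b)] += 1.0
            w[(b, a)] += 1.0
    return dict(w)


def with_indirect(w: Weights, nodes: List[str], lam: float = 0.5) -> Weights:
    """Hypothetical indirect edges: add lam * (sum of weights over two-hop paths i-k-j)."""
    combined = dict(w)
    for i in nodes:
        for j in nodes:
            if i == j:
                continue
            two_hop = sum(w.get((i, k), 0.0) * w.get((k, j), 0.0)
                          for k in nodes if k not in (i, j))
            if two_hop:
                combined[(i, j)] = combined.get((i, j), 0.0) + lam * two_hop
    return combined


def propagate(w: Weights, nodes: List[str], seeds: Dict[str, float],
              iters: int = 50) -> Dict[str, float]:
    """Label propagation: seeds stay fixed (1 = fake, 0 = credible); every other
    node takes the weighted mean of its neighbors' current scores."""
    score = {n: seeds.get(n, 0.5) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            total = sum(w.get((n, m), 0.0) for m in nodes)
            if n in seeds or total == 0.0:
                new[n] = score[n]
            else:
                new[n] = sum(w.get((n, m), 0.0) * score[m] for m in nodes) / total
        score = new
    return score


if __name__ == "__main__":
    posts = [["#a", "#b"], ["#b", "#c"], ["#d", "#e"]]
    nodes = sorted({t for p in posts for t in p})
    w = with_indirect(direct_edges(posts), nodes)
    scores = propagate(w, nodes, seeds={"#a": 1.0, "#e": 0.0})
    print({k: round(v, 3) for k, v in scores.items()})
```

Here "#c" never co-occurs with the fake-labeled seed "#a", yet it receives a high score through the indirect edge via "#b", which is the kind of signal that direct co-occurrence alone would miss.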
-
Twitter Spam and False Accounts Prevalence, Detection and Characterization: A Survey
Authors:
Emilio Ferrara
Abstract:
The issue of quantifying and characterizing various forms of social media manipulation and abuse has been at the forefront of the computational social science research community for over a decade. In this paper, I provide a (non-comprehensive) survey of research efforts aimed at estimating the prevalence of spam and false accounts on Twitter, as well as characterizing their use, activity, and behavior. I propose a taxonomy of spam and false accounts, enumerating known techniques used to create and detect them. Then, I summarize studies estimating the prevalence of spam and false accounts on Twitter. Finally, I report on research that illustrates how spam and false accounts are used for scams and frauds, stock market manipulation, political disinformation and deception, conspiracy amplification, coordinated influence, public health misinformation campaigns, radical propaganda and recruitment, and more. I conclude with a set of recommendations aimed at charting the path forward to combat these problems.
Submitted 7 February, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
An unusual pulse shape change event in PSR J1713+0747 observed with the Green Bank Telescope and CHIME
Authors:
Ross J. Jennings,
James M. Cordes,
Shami Chatterjee,
Maura A. McLaughlin,
Paul B. Demorest,
Zaven Arzoumanian,
Paul T. Baker,
Harsha Blumer,
Paul R. Brook,
Tyler Cohen,
Fronefield Crawford,
H. Thankful Cromartie,
Megan E. DeCesar,
Timothy Dolch,
Elizabeth C. Ferrara,
Emmanuel Fonseca,
Deborah C. Good,
Jeffrey S. Hazboun,
Megan L. Jones,
David L. Kaplan,
Michael T. Lam,
T. Joseph W. Lazio,
Duncan R. Lorimer,
Jing Luo,
Ryan S. Lynch
, et al. (19 additional authors not shown)
Abstract:
The millisecond pulsar J1713+0747 underwent a sudden and significant pulse shape change between April 16 and 17, 2021 (MJDs 59320 and 59321). Subsequently, the pulse shape gradually recovered over the course of several months. We report the results of continued multi-frequency radio observations of the pulsar made using the Canadian Hydrogen Intensity Mapping Experiment (CHIME) and the 100-meter Green Bank Telescope (GBT) over a three-year period encompassing the shape change event, between February 2020 and February 2023. As of February 2023, the pulse shape had returned to a state similar to that seen before the event, but with measurable changes remaining. The amplitude of the shape change and the accompanying time-of-arrival (TOA) residuals display a strong non-monotonic dependence on radio frequency, demonstrating that the event is neither a glitch (the effects of which should be independent of radio frequency, $ν$) nor a change in dispersion measure (DM) alone (which would produce a delay proportional to $ν^{-2}$). However, it does bear some resemblance to the two previous "chromatic timing events" observed in J1713+0747 (Demorest et al. 2013; Lam et al. 2016), as well as to a similar event observed in PSR J1643-1224 in 2015 (Shannon et al. 2016).
Submitted 31 January, 2024; v1 submitted 21 October, 2022;
originally announced October 2022.
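The frequency argument in the abstract above rests on two standard scalings: a glitch shifts arrival times by the same amount at every frequency, while a dispersion-measure change delays them by K * dDM / nu^2, with K approximately 4.148808e3 s MHz^2 pc^-1 cm^3. The sketch below evaluates both at representative CHIME (400-800 MHz) and GBT frequencies; the DM change and offset values are hypothetical, not measurements of J1713+0747, and the function names are illustrative. It also checks that any sum a + b * nu^-2 is monotonic in nu (its derivative, -2b * nu^-3, never changes sign), which is why a non-monotonic frequency dependence cannot be produced by these terms.

```python
from typing import Sequence

# Dispersion constant: delay [s] = K_DM * DM [pc cm^-3] / freq[MHz]^2
K_DM_S_MHZ2 = 4.148808e3


def dm_delay_us(delta_dm: float, freq_mhz: float) -> float:
    """Extra arrival delay (microseconds) caused by a change delta_dm in DM."""
    return K_DM_S_MHZ2 * delta_dm / freq_mhz**2 * 1e6


def glitch_plus_dm_us(offset_us: float, delta_dm: float, freq_mhz: float) -> float:
    """Frequency-independent (glitch-like) offset plus a nu^-2 DM term."""
    return offset_us + dm_delay_us(delta_dm, freq_mhz)


def is_monotonic(values: Sequence[float]) -> bool:
    """True if the sequence never changes direction."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return all(s >= 0 for s in steps) or all(s <= 0 for s in steps)


if __name__ == "__main__":
    freqs_mhz = [400.0, 600.0, 800.0, 1400.0, 2000.0]
    delta_dm = 1e-3  # hypothetical DM change, pc cm^-3
    for f in freqs_mhz:
        print(f"{f:6.0f} MHz: DM delay = {dm_delay_us(delta_dm, f):7.3f} us")
    # Any a + b * nu^-2 stays monotonic across frequency.
    model = [glitch_plus_dm_us(5.0, -2e-3, f) for f in freqs_mhz]
    print("monotonic:", is_monotonic(model))
```

Halving the frequency quadruples the DM delay, so even a small DM change dominates at CHIME frequencies relative to the GBT bands.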
-
Exposing Influence Campaigns in the Age of LLMs: A Behavioral-Based AI Approach to Detecting State-Sponsored Trolls
Authors:
Fatima Ezzeddine,
Luca Luceri,
Omran Ayoub,
Ihab Sbeity,
Gianluca Nogara,
Emilio Ferrara,
Silvia Giordano
Abstract:
The detection of state-sponsored trolls operating in influence campaigns on social media is a critical and unsolved challenge for the research community, with significant implications beyond the online realm. To address this challenge, we propose a new AI-based solution that identifies troll accounts solely through behavioral cues associated with their sequences of sharing activity, encompassing both their actions and the feedback they receive from others. Our approach does not incorporate any shared textual content and consists of two steps: First, we leverage an LSTM-based classifier to determine whether account sequences belong to a state-sponsored troll or an organic, legitimate user. Second, we employ the classified sequences to calculate a metric named the "Troll Score", quantifying the degree to which an account exhibits troll-like behavior. To assess the effectiveness of our method, we examine its performance in the context of the 2016 Russian interference campaign during the U.S. Presidential election. Our experiments yield compelling results, demonstrating that our approach can identify account sequences with an AUC close to 99% and accurately differentiate between Russian trolls and organic users with an AUC of 91%. Notably, our behavioral-based approach holds a significant advantage in the ever-evolving landscape, where textual and linguistic properties can be easily mimicked by Large Language Models (LLMs): In contrast to existing language-based techniques, it relies on more challenging-to-replicate behavioral cues, ensuring greater resilience in identifying influence campaigns, especially given the potential increase in the usage of LLMs for generating inauthentic content. Finally, we assess the generalizability of our solution to various entities driving different information operations and find promising results that will guide future research.
Submitted 11 October, 2023; v1 submitted 17 October, 2022;
originally announced October 2022.
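The abstract above does not spell out how the Troll Score is computed from the classified sequences. One simple reading, assumed here purely for illustration, is the fraction of an account's sequences that the sequence-level classifier flags as troll-like. The threshold, the account-level cutoff, and the function names below are hypothetical and are not the authors' definition.

```python
from typing import Sequence


def troll_score(sequence_probs: Sequence[float], threshold: float = 0.5) -> float:
    """Hypothetical Troll Score: share of an account's activity sequences that a
    sequence classifier (e.g., an LSTM) labels as troll-like."""
    if not sequence_probs:
        raise ValueError("account has no classified sequences")
    flagged = sum(1 for p in sequence_probs if p >= threshold)
    return flagged / len(sequence_probs)


def label_account(score: float, cutoff: float = 0.5) -> str:
    """Hypothetical account-level decision rule applied on top of the score."""
    return "troll" if score >= cutoff else "organic"


if __name__ == "__main__":
    # Per-sequence troll probabilities, as a trained classifier might output them.
    accounts = {
        "acct_a": [0.92, 0.81, 0.67, 0.30],
        "acct_b": [0.12, 0.40, 0.05],
    }
    for name, probs in accounts.items():
        s = troll_score(probs)
        print(name, round(s, 2), label_account(s))
```

Scoring at the account level rather than trusting any single sequence makes the decision less sensitive to one misclassified window of activity; a graded score also lets analysts rank accounts instead of only splitting them.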
-
The Fourth Catalog of Active Galactic Nuclei Detected by the Fermi Large Area Telescope -- Data Release 3
Authors:
The Fermi-LAT collaboration,
Marco Ajello,
Luca Baldini,
Jean Ballet,
Denis Bastieri,
Josefa Becerra Gonzalez,
Ronaldo Bellazzini,
Alessandra Berretta,
Elisabetta Bissaldi,
Raffaella Bonino,
Ari Brill,
Philippe Bruel,
Sara Buson,
Regina Caputo,
Patrizia Caraveo,
Teddy Cheung,
Graziano Chiaro,
Nicolo Cibrario,
Stefano Ciprini,
Milena Crnogorcevic,
Sara Cutini,
Filippo D'Ammando,
Salvatore De Gaetano,
Niccolo Di Lalla
, et al. (79 additional authors not shown)
Abstract:
An incremental version of the fourth catalog of active galactic nuclei (AGNs) detected by the Fermi Large Area Telescope is presented. This version (4LAC-DR3) derives from the third data release of the 4FGL catalog based on 12 years of E>50 MeV gamma-ray data, where the spectral parameters, spectral energy distributions (SEDs), yearly light curves, and associations have been updated for all sources. The newly reported AGNs include 587 blazar candidates and four radio galaxies. We describe the properties of the new sample and outline changes affecting the previously published one. We also introduce two new parameters in this release, namely the peak energy of the SED high-energy component and the corresponding flux. These parameters allow an assessment of the Compton dominance, the ratio of the inverse-Compton to the synchrotron peak luminosities, without relying on X-ray data.
Submitted 6 October, 2022; v1 submitted 24 September, 2022;
originally announced September 2022.
-
Identifying and Characterizing Behavioral Classes of Radicalization within the QAnon Conspiracy on Twitter
Authors:
Emily L. Wang,
Luca Luceri,
Francesco Pierri,
Emilio Ferrara
Abstract:
Social media provide a fertile ground where conspiracy theories and radical ideas can flourish, reach broad audiences, and sometimes lead to hate or violence beyond the online world itself. QAnon represents a notable example of a political conspiracy that started out on social media but turned mainstream, in part due to public endorsement by influential political figures. Nowadays, QAnon conspiracies often appear in the news, are part of political rhetoric, and are espoused by significant swaths of people in the United States. It is therefore crucial to understand how such a conspiracy took root online, and what led so many social media users to adopt its ideas. In this work, we propose a framework that exploits both social interaction and content signals to uncover evidence of user radicalization or support for QAnon. Leveraging a large dataset of 240M tweets collected in the run-up to the 2020 US Presidential election, we define and validate a multivariate metric of radicalization. We use it to separate users into distinct, naturally emerging classes of behaviors associated with radicalization processes, from self-declared QAnon supporters to hyper-active conspiracy promoters. We also analyze the impact of Twitter's moderation policies on the interactions among different classes: we discover aspects of moderation that succeed, yielding a substantial reduction in the endorsement received by hyper-active QAnon accounts. But we also uncover where moderation fails, showing how QAnon content amplifiers are not deterred or affected by Twitter intervention. Our findings refine our understanding of online radicalization processes, reveal effective and ineffective aspects of moderation, and call for further investigation of the role social media play in the spread of conspiracies.
Submitted 6 April, 2023; v1 submitted 19 September, 2022;
originally announced September 2022.
-
How does Twitter account moderation work? Dynamics of account creation and suspension on Twitter during major geopolitical events
Authors:
Francesco Pierri,
Luca Luceri,
Emily Chen,
Emilio Ferrara
Abstract:
Social media moderation policies are often at the center of public debate, and their implementation and enactment are sometimes surrounded by a veil of mystery. Unsurprisingly, due to limited platform transparency and data access, relatively little research has been devoted to characterizing moderation dynamics, especially in the context of controversial events and the platform activity associated with them. Here, we study the dynamics of account creation and suspension on Twitter during two global political events: Russia's invasion of Ukraine and the 2022 French Presidential election. Leveraging a large-scale dataset of 270M tweets shared by 16M users in multiple languages over several months, we identify peaks of suspicious account creation and suspension, and we characterize behaviours that more frequently lead to account suspension. We show how large numbers of accounts get suspended within days of their creation. Suspended accounts tend to interact mostly with legitimate users rather than with other suspicious accounts, often making unwarranted and excessive use of reply and mention features and predominantly sharing spam and harmful content. While we are only able to speculate about the specific causes leading to a given account suspension, our findings shed light on patterns of platform abuse and subsequent moderation during major events.
Submitted 7 October, 2023; v1 submitted 15 September, 2022;
originally announced September 2022.
-
The discovery of the 528.6 Hz accreting millisecond X-ray pulsar MAXI J1816-195
Authors:
Peter Bult,
Diego Altamirano,
Zaven Arzoumanian,
Deepto Chakrabarty,
Jérôme Chenevez,
Elizabeth C. Ferrara,
Keith C. Gendreau,
Sebastien Guillot,
Tolga Güver,
Wataru Iwakiri,
Gaurava K. Jaisawal,
Giulio C. Mancuso,
Christian Malacaria,
Mason Ng,
Andrea Sanna,
Tod E. Strohmayer,
Zorawar Wadiasingh,
Michael T. Wolff
Abstract:
We present the discovery of 528.6 Hz pulsations in the new X-ray transient MAXI J1816-195. Using NICER, we observed the first recorded transient outburst from the neutron star low-mass X-ray binary MAXI J1816-195 over a period of 28 days. From a timing analysis of the 528.6 Hz pulsations, we find that the binary system is well described as a circular orbit with an orbital period of 4.8 hours and a projected semi-major axis of 0.26 light-seconds for the pulsar, which constrains the mass of the donor star to $0.10-0.55 M_\odot$. Additionally, we observed 15 thermonuclear X-ray bursts showing a gradual evolution in morphology over time, and a recurrence time as short as 1.4 hours. We did not detect evidence for photospheric radius expansion, placing an upper limit on the source distance of 8.6 kpc.
Submitted 9 August, 2022;
originally announced August 2022.
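The donor-mass constraint in the abstract above follows from the standard pulsar mass function, f = 4 pi^2 (a_x sin i)^3 / (G P^2) = (M_c sin i)^3 / (M_NS + M_c)^2. The sketch below plugs in the abstract's 0.26 light-second projected semi-major axis and 4.8-hour orbital period. The 1.4 solar-mass neutron star and the edge-on inclination (i = 90 degrees) used to get a minimum donor mass are assumptions, not values taken from the paper, and the function names are illustrative.

```python
import math

G = 6.67430e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8     # speed of light, m/s
M_SUN = 1.98847e30   # solar mass, kg


def mass_function(a_x_sini_lt_s: float, p_orb_hours: float) -> float:
    """Pulsar mass function f = 4 pi^2 (a_x sin i)^3 / (G P^2), in solar masses."""
    a = a_x_sini_lt_s * C        # projected semi-major axis, m
    p = p_orb_hours * 3600.0     # orbital period, s
    return 4.0 * math.pi**2 * a**3 / (G * p**2) / M_SUN


def min_donor_mass(f_msun: float, m_ns_msun: float = 1.4) -> float:
    """Smallest donor mass (i = 90 deg) solving m_c^3 / (m_ns + m_c)^2 = f.

    The left-hand side increases monotonically with m_c, so bisection converges.
    """
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid**3 / (m_ns_msun + mid) ** 2 < f_msun:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    f = mass_function(0.26, 4.8)  # values quoted in the abstract
    print(f"mass function: {f:.2e} Msun")
    print(f"minimum donor mass (1.4 Msun neutron star): {min_donor_mass(f):.3f} Msun")
```

Under these assumptions the minimum donor mass comes out at roughly 0.1 solar masses, in line with the lower end of the quoted 0.10-0.55 range; the upper bound depends on further assumptions (for example about inclination or the donor's structure) that the abstract does not state.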
-
Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment
Authors:
Yilei Zeng,
Jiali Duan,
Yang Li,
Emilio Ferrara,
Lerrel Pinto,
C. -C. Jay Kuo,
Stefanos Nikolaidis
Abstract:
Human-centered AI considers human experiences with AI performance. While abundant research has been helping AI achieve superhuman performance through either fully automatic or weakly supervised learning, fewer efforts have explored how AI can be tailored to humans' preferred skill level given fine-grained input. In this work, we guide curriculum reinforcement learning towards a preferred performance level that is neither too hard nor too easy by learning from the human decision process. To achieve this, we developed a portable, interactive platform that enables the user to interact with agents online by manipulating the task difficulty, observing performance, and providing curriculum feedback. Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications that require millions of samples without a server. The results demonstrate the effectiveness of an interactive curriculum for reinforcement learning with a human in the loop, showing that reinforcement learning performance can successfully adjust in sync with the human-desired difficulty level. We believe this research will open new doors for achieving flow and personalized adaptive difficulty.
Submitted 4 August, 2022;
originally announced August 2022.
-
GCN-WP -- Semi-Supervised Graph Convolutional Networks for Win Prediction in Esports
Authors:
Alexander J. Bisberg,
Emilio Ferrara
Abstract:
Win prediction is crucial to understanding skill modeling, teamwork, and matchmaking in esports. In this paper, we propose GCN-WP, a semi-supervised win prediction model for esports based on graph convolutional networks. This model learns the structure of an esports league over the course of a season (1 year) and makes predictions on another similar league. It integrates over 30 features about the match and players and employs graph convolution to classify games based on their neighborhood. Our model achieves state-of-the-art prediction accuracy when compared to machine learning or skill rating models for League of Legends (LoL). The framework is generalizable, so it can easily be extended to other multiplayer online games.
Submitted 26 July, 2022;
originally announced July 2022.
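Graph convolution, as used in the abstract above, classifies each node from its neighborhood by averaging neighbor features (with degree normalization) and applying a learned linear map. The sketch below implements the widely used propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), in plain Python. It is not claimed to be GCN-WP's exact architecture, and the toy graph (games linked when they share players), features, and weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """D^-1/2 (A + I) D^-1/2: add self-loops, then normalize symmetrically by degree."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, x) for x in row] for row in out]


if __name__ == "__main__":
    # Hypothetical graph of 3 games; an edge links games that share players.
    adj = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 2 toy features per game
    w = [[0.5, -1.0], [0.5, 1.0]]             # weights (learned in practice)
    for row in gcn_layer(adj, h, w):
        print([round(x, 3) for x in row])
```

Stacking two such layers lets each game's prediction draw on games up to two hops away; in the semi-supervised setting, the loss is computed only on labeled nodes while unlabeled ones still contribute features through the graph.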
-
What are Your Pronouns? Examining Gender Pronoun Usage on Twitter
Authors:
Julie Jiang,
Emily Chen,
Luca Luceri,
Goran Murić,
Francesco Pierri,
Ho-Chun Herbert Chang,
Emilio Ferrara
Abstract:
Stating your gender pronouns, along with your name, is becoming the new norm of self-introductions at school, at the workplace, and online. The increasing prevalence and awareness of nonconforming gender identities put discussions of developing gender-inclusive language at the forefront. This work presents the first empirical research on gender pronoun usage on large-scale social media. Leveraging a Twitter dataset of over 2 billion tweets collected continuously over two years, we find that the public declaration of gender pronouns is on the rise, with most people declaring she-series pronouns, followed by he-series pronouns, and a smaller but considerable number declaring non-binary pronouns. From analyzing Twitter posts and sharing activities, we can discern users who use gender pronouns from those who do not and also distinguish users of various gender identities. We further illustrate the relationship between explicit forms of social network exposure to gender pronouns and users' eventual gender pronoun adoption. This work carries crucial implications for gender-identity studies and initiates new research directions in gender-related fairness and inclusion, as well as support against online harassment and discrimination on social media.
Submitted 27 October, 2023; v1 submitted 22 July, 2022;
originally announced July 2022.
-
Geolocated Social Media Posts are Happier: Understanding the Characteristics of Check-in Posts on Twitter
Authors:
Julie Jiang,
Jesse Thomason,
Francesco Barbieri,
Emilio Ferrara
Abstract:
The increasing prevalence of location-sharing features on social media has enabled researchers to ground computational social science research using geolocated data, affording opportunities to study human mobility, the impact of real-world events, and more. This paper analyzes what crucially separates posts with geotags from those without. We find that users who share location are not representative of the social media user population at large, jeopardizing the generalizability of research that uses only geolocated data. We consider three aspects: affect (sentiment and emotions), content (textual and non-textual), and audience engagement. By comparing a dataset of 1.3 million geotagged tweets with a random dataset of the same size, we show that geotagged posts on Twitter exhibit significantly more positivity, are often about joyous and special events such as weddings or graduations, convey more collectivism rather than individualism, and contain more features such as hashtags or objects in images, but at the same time generate substantially less engagement. These findings suggest that there are significant differences in the messages conveyed in geotagged posts. Our research carries important implications for future research utilizing geolocated social media data.
Submitted 13 February, 2023; v1 submitted 22 July, 2022;
originally announced July 2022.
-
The Gift That Keeps on Giving: Generosity is Contagious in Multiplayer Online Games
Authors:
Alexander J. Bisberg,
Julie Jiang,
Yilei Zeng,
Emily Chen,
Emilio Ferrara
Abstract:
Understanding social interactions and generous behaviors has long been of considerable interest in the social science community. While the contagion of generosity is documented in the real world, less is known about such phenomena in virtual worlds and whether they have an actionable impact on user behavior and retention. In this work, we analyze social dynamics in the virtual world of the popular massively multiplayer online role-playing game (MMORPG) Sky: Children of Light. We develop a framework to reveal the patterns of generosity in such social environments and provide empirical evidence of social contagion and contagious generosity. Players become more engaged in the game after playing with others, especially with friends. We also find that players who experience generosity first-hand, or even observe other players conduct generous acts, become more generous themselves in the future. Additionally, we show that both receiving and observing generosity lead to higher future engagement in the game. Since Sky resembles the real world in terms of social play, the implications of our findings also go beyond this virtual world.
Submitted 12 October, 2022; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Retweet-BERT: Political Leaning Detection Using Language Features and Information Diffusion on Social Networks
Authors:
Julie Jiang,
Xiang Ren,
Emilio Ferrara
Abstract:
Estimating the political leanings of social media users is a challenging and ever more pressing problem given the increase in social media consumption. We introduce Retweet-BERT, a simple and scalable model to estimate the political leanings of Twitter users. Retweet-BERT leverages the retweet network structure and the language used in users' profile descriptions. Our assumptions stem from patterns of network and linguistic homophily among people who share similar ideologies. Retweet-BERT demonstrates competitive performance against other state-of-the-art baselines, achieving 96%-97% macro-F1 on two recent Twitter datasets (a COVID-19 dataset and a 2020 United States presidential elections dataset). We also manually validate the performance of Retweet-BERT on users not in the training data. Finally, in a case study of COVID-19, we illustrate the presence of political echo chambers on Twitter and show that they exist primarily among right-leaning users. Our code is open-sourced and our data is publicly available.
Submitted 6 April, 2023; v1 submitted 17 July, 2022;
originally announced July 2022.
-
Word Embedding for Social Sciences: An Interdisciplinary Survey
Authors:
Akira Matsui,
Emilio Ferrara
Abstract:
To extract essential information from complex data, computer scientists have been developing machine learning models that learn low-dimensional representations. Not only computer scientists but also social scientists have benefited from these advances in machine learning research and used them to advance their own work, because human behavior and social phenomena are embedded in complex data. However, this emerging trend is not well documented because different social science fields rarely cover each other's work, resulting in fragmented knowledge in the literature. To document this emerging trend, we survey recent studies that apply word embedding techniques to human behavior mining. We built a taxonomy to illustrate the methods and procedures used in the surveyed papers, aiding social science researchers in contextualizing their research within the literature on word embedding applications. This survey also reports a simple experiment showing that common similarity measures used in the literature can yield different results even when they return consistent results at an aggregate level.
Submitted 15 June, 2024; v1 submitted 7 July, 2022;
originally announced July 2022.
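The warning in the abstract above about similarity measures can be illustrated with a toy example (this is not the survey's experiment): cosine similarity ignores vector length while Euclidean distance does not, so the two can disagree about which neighbor of a word is closest. The 2-dimensional vectors below are hypothetical and do not come from any trained embedding model.

```python
import math
from typing import Dict, List, Sequence

Vec = Sequence[float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity: depends only on the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def euclidean(u: Vec, v: Vec) -> float:
    """Euclidean distance: depends on both direction and length."""
    return math.dist(u, v)


def unit(u: Vec) -> List[float]:
    """Scale a vector to length 1."""
    n = math.hypot(*u)
    return [x / n for x in u]


def nearest(query: Vec, cands: Dict[str, Vec], metric: str) -> str:
    """Return the closest candidate under the chosen measure."""
    if metric == "cosine":
        return max(cands, key=lambda k: cosine(query, cands[k]))
    return min(cands, key=lambda k: euclidean(query, cands[k]))


if __name__ == "__main__":
    query = [1.0, 0.0]
    cands = {"a": [10.0, 1.0],  # same direction as the query, but far away
             "b": [0.5, 0.5]}   # close in space, different direction
    print("cosine picks:", nearest(query, cands, "cosine"))        # a
    print("euclidean picks:", nearest(query, cands, "euclidean"))  # b
    # After L2 normalization the two measures rank neighbors identically,
    # because |u - v|^2 = 2 - 2 cos(u, v) for unit vectors.
    unit_cands = {k: unit(v) for k, v in cands.items()}
    print("euclidean on unit vectors picks:", nearest(unit(query), unit_cands, "euclidean"))
```

Normalizing vectors before comparing them removes this particular source of disagreement, which is one reason to report which measure (and which normalization) an analysis uses.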
-
Extracting Fast and Slow: User-Action Embedding with Inter-temporal Information
Authors:
Akira Matsui,
Emilio Ferrara
Abstract:
With recent technological developments, data on detailed human temporal behaviors have become available. Many methods have been proposed to mine such dynamic behavior data, revealing valuable insights for research and business. However, most methods analyze only sequences of actions and do not study inter-temporal information, such as the time intervals between actions, in a holistic manner. While actions and action time intervals are interdependent, integrating them is challenging because they are of different natures: time and action. To overcome this challenge, we propose a unified method that analyzes user actions together with inter-temporal information (time intervals). We simultaneously embed the user's action sequence and its time intervals to obtain a low-dimensional representation of each action along with inter-temporal information. Using three real-world datasets, we demonstrate that the proposed method enables us to characterize user actions in terms of temporal context. This paper shows that explicit modeling of action sequences and inter-temporal user behavior information enables successful, interpretable analysis.
Submitted 19 June, 2022;
originally announced June 2022.
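The abstract embeds actions and their time intervals jointly but does not spell out the representation. One simple way to put both into a single sequence before a skip-gram style embedding is to interleave action tokens with binned-interval tokens. This is a sketch under that assumption only, not the authors' method; the log-scale binning, the `dt_` token prefix, and the helper names are hypothetical:

```python
import math

def interval_bin(seconds):
    """Hypothetical log-scale bucket: floor(log10(seconds + 1))."""
    return int(math.log10(seconds + 1))

def to_tokens(actions, timestamps):
    """Interleave actions with interval tokens; timestamps must be sorted (seconds)."""
    tokens = [actions[0]]
    for prev_t, t, action in zip(timestamps, timestamps[1:], actions[1:]):
        tokens.append(f"dt_{interval_bin(t - prev_t)}")
        tokens.append(action)
    return tokens

def skipgram_pairs(tokens, window=1):
    """(center, context) pairs that a skip-gram embedding model would train on."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = to_tokens(["login", "view", "buy"], [0, 5, 3605])
pairs = skipgram_pairs(tokens)
print(tokens)  # ['login', 'dt_0', 'view', 'dt_3', 'buy']
```

With this layout, a short pause and an hour-long gap produce different context tokens, so actions that tend to follow long breaks end up with different embeddings from those that follow quick successions.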
-
Individual and Collective Performance Deteriorate in a New Team: A Case Study of CS:GO Tournaments
Authors:
Weiwei Zhang,
Goran Muric,
Emilio Ferrara
Abstract:
How does team formation relate to team performance in professional video game playing? This study examines one aspect of group dynamics, team switching, and aims to answer how changing teams affects individual and collective performance in eSports tournaments. We test the hypothesis that switching teams can be detrimental to individual and team performance in both the short and the long run. We collected data from professional tournaments of the popular first-person shooter game \textit{Counter-Strike: Global Offensive} (CS:GO) and performed two natural experiments. We found that a player's performance was inversely correlated with the number of teams the player had joined. After a player switched to a new team, both individual and collective performance dropped initially and then slowly recovered. The findings of this study can provide insights into group dynamics in eSports team play and, more generally, emphasize the importance of team cohesion in facilitating collaboration, coordination, and knowledge sharing in teamwork.
Submitted 19 May, 2022;
originally announced May 2022.
-
A Gamma-ray Pulsar Timing Array Constrains the Nanohertz Gravitational Wave Background
Authors:
M. Ajello,
W. B. Atwood,
L. Baldini,
J. Ballet,
G. Barbiellini,
D. Bastieri,
R. Bellazzini,
A. Berretta,
B. Bhattacharyya,
E. Bissaldi,
R. D. Blandford,
E. Bloom,
R. Bonino,
P. Bruel,
R. Buehler,
E. Burns,
S. Buson,
R. A. Cameron,
P. A. Caraveo,
E. Cavazzuti,
N. Cibrario,
S. Ciprini,
C. J. Clark,
I. Cognard,
J. Coronado-Blázquez
, et al. (107 additional authors not shown)
Abstract:
After large galaxies merge, their central supermassive black holes are expected to form binary systems whose orbital motion generates a gravitational wave background (GWB) at nanohertz frequencies. Searches for this background utilize pulsar timing arrays, which perform long-term monitoring of millisecond pulsars (MSPs) at radio wavelengths. We use 12.5 years of Fermi Large Area Telescope data to form a gamma-ray pulsar timing array. Results from 35 bright gamma-ray pulsars place a 95\% credible limit on the GWB characteristic strain of $1.0\times10^{-14}$ at 1 yr$^{-1}$, which scales as the observing time span $t_{\mathrm{obs}}^{-13/6}$. This direct measurement provides an independent probe of the GWB while offering a check on radio noise models.
Submitted 11 April, 2022;
originally announced April 2022.
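The abstract states that the strain limit scales as $t_{\mathrm{obs}}^{-13/6}$. A small worked projection from the quoted 12.5-year limit follows. It assumes that this idealized power law holds unchanged, ignoring new pulsars, changes in noise, or detector effects:

```python
REFERENCE_LIMIT = 1.0e-14   # 95% credible limit on characteristic strain at 1/yr
REFERENCE_SPAN_YR = 12.5    # Fermi-LAT data span quoted in the abstract
SCALING_EXPONENT = -13 / 6  # limit proportional to t_obs^(-13/6)

def projected_limit(span_yr):
    """Scale the strain limit to a different observing span (idealized power law)."""
    return REFERENCE_LIMIT * (span_yr / REFERENCE_SPAN_YR) ** SCALING_EXPONENT

for span in (12.5, 15.0, 25.0):
    print(f"{span:5.1f} yr -> {projected_limit(span):.2e}")
```

Doubling the span to 25 years lowers the limit by a factor of $2^{13/6} \approx 4.5$, to about $2.2\times10^{-15}$, under this assumption.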
-
Zero-shot meta-learning for small-scale data from human subjects
Authors:
Julie Jiang,
Kristina Lerman,
Emilio Ferrara
Abstract:
While developments in machine learning have led to impressive performance gains on big data, many human-subjects datasets are, in practice, small and sparsely labeled. Existing methods applied to such data often do not generalize easily to out-of-sample subjects. Yet models must make predictions on test data that may be drawn from a different distribution, a problem known as \textit{zero-shot learning}. To address this challenge, we develop an end-to-end framework using a meta-learning approach, which enables the model to rapidly adapt to a new prediction task with limited training data for out-of-sample test data. We use three real-world small-scale human-subjects datasets (two randomized controlled studies and one observational study), for which we predict treatment outcomes for held-out treatment groups. Our model learns the latent treatment effects of each intervention and, by design, can naturally handle multi-task predictions. We show that our model performs best holistically for each held-out group, especially when the test group is distinctly different from the training group. Our model has implications for improving the generalization of small human-subjects studies to the wider population.
Submitted 1 April, 2023; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Tweets in Time of Conflict: A Public Dataset Tracking the Twitter Discourse on the War Between Ukraine and Russia
Authors:
Emily Chen,
Emilio Ferrara
Abstract:
On February 24, 2022, Russia invaded Ukraine. In the days that followed, reports from laypeople and news anchors alike kept flooding in about a conflict quickly escalating into war. Russia faced immediate backlash and condemnation from the world at large. While the war continues to contribute to an ongoing humanitarian and refugee crisis in Ukraine, a second battlefield has emerged in the online space, both in the use of social media to garner support for both sides of the conflict and in the context of information warfare. In this paper, we present a collection of over 63 million tweets, from February 22, 2022 through March 8, 2022, that we are publishing for the wider research community to use. This dataset can be found at https://github.com/echen102/ukraine-russia and will be maintained and regularly updated as the war continues to unfold. Our preliminary analysis already shows evidence of public engagement with Russian state-sponsored media and other domains that are known to push unreliable information; the former saw a spike in activity on the day of the Russian invasion. Our hope is that this public dataset can help the research community further understand the ever-evolving role that social media plays in information dissemination, influence campaigns, grassroots mobilization, and much more during a time of conflict.
Submitted 10 April, 2023; v1 submitted 14 March, 2022;
originally announced March 2022.
-
Construction of Large-Scale Misinformation Labeled Datasets from Social Media Discourse using Label Refinement
Authors:
Karishma Sharma,
Emilio Ferrara,
Yan Liu
Abstract:
Malicious accounts spreading misinformation have led to widespread false and misleading narratives in recent times, especially during the COVID-19 pandemic, and social media platforms struggle to remove such content rapidly. This is because adapting to new domains requires human-intensive fact-checking that is slow and difficult to scale. To address this challenge, we propose to leverage news-source credibility labels as weak labels for social media posts and propose model-guided refinement of labels to construct large-scale, diverse misinformation labeled datasets in new domains. The weak labels can be inaccurate at the article or social media post level, where the stance of the user does not align with the news source or article credibility. We propose a framework that uses a detection model, self-trained on the initial weak labels, with uncertainty sampling based on the entropy of the model's predictions to identify potentially inaccurate labels and correct them using self-supervision or relabeling. The framework will incorporate the social context of each post, in terms of the community of its associated user, to surface inaccurate labels and build a large-scale dataset with minimal human effort. To provide labeled datasets that distinguish misleading narratives, where information might be missing significant context or have inaccurate ancillary details, the proposed framework will use the few labeled samples as class prototypes to separate high-confidence samples into false, unproven, mixture, mostly false, mostly true, true, and debunk information. The approach is demonstrated by constructing a large-scale misinformation dataset on COVID-19 vaccines.
Submitted 24 February, 2022;
originally announced February 2022.
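The framework flags potentially inaccurate weak labels through uncertainty sampling based on prediction entropy. A minimal sketch of only that ranking step follows. It assumes a binary detector whose outputs are given as probability lists. The post IDs, probabilities, and helper names are hypothetical, and self-training, community context, and relabeling are not shown:

```python
import math

def prediction_entropy(probs):
    """Shannon entropy (nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def flag_uncertain(predictions, k):
    """Return the IDs of the k posts whose predictions have the highest entropy."""
    ranked = sorted(predictions,
                    key=lambda pid: prediction_entropy(predictions[pid]),
                    reverse=True)
    return ranked[:k]

# Hypothetical model outputs: [P(misinformation), P(reliable)] per post.
predictions = {
    "post1": [0.98, 0.02],
    "post2": [0.55, 0.45],
    "post3": [0.10, 0.90],
    "post4": [0.50, 0.50],
}
to_review = flag_uncertain(predictions, k=2)
print(to_review)  # ['post4', 'post2']
```

Posts near a 50/50 split have entropy close to $\ln 2$ and are surfaced first for correction, while confident predictions are left to self-training.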
-
Discovery, Timing, and Multiwavelength Observations of the Black Widow Millisecond Pulsar PSR J1555-2908
Authors:
Paul S. Ray,
Lars Nieder,
Colin J. Clark,
Scott M. Ransom,
H. Thankful Cromartie,
Dale A. Frail,
Kunal P. Mooley,
Huib Intema,
Preshanth Jagannathan,
Paul Demorest,
Kevin Stovall,
Jules P. Halpern,
Julia Deneva,
Sebastien Guillot,
Matthew Kerr,
Samuel J. Swihart,
Philippe Bruel,
Ben W. Stappers,
Andrew Lyne,
Mitch Mickaliger,
Fernando Camilo,
Elizabeth C. Ferrara,
Michael T. Wolff,
P. F. Michelson
Abstract:
We report the discovery of PSR J1555-2908, a 1.79 ms radio and gamma-ray pulsar in a 5.6 hr binary system with a minimum companion mass of 0.052 $M_\odot$. This fast and energetic ($\dot E = 3 \times 10^{35}$ erg/s) millisecond pulsar was first detected as a gamma-ray point source in Fermi LAT sky survey observations. Guided by a steep spectrum radio point source in the Fermi error region, we performed a search at 820 MHz with the Green Bank Telescope that first discovered the pulsations. The initial radio pulse timing observations provided enough information to seed a search for gamma-ray pulsations in the LAT data, from which we derive a timing solution valid for the full Fermi mission. In addition to the radio and gamma-ray pulsation discovery and timing, we searched for X-ray pulsations using NICER but no significant pulsations were detected. We also obtained time-series r-band photometry that indicates strong heating of the companion star by the pulsar wind. Material blown off the heated companion eclipses the 820 MHz radio pulse during inferior conjunction of the companion for ~10% of the orbit, which is twice the angle subtended by its Roche lobe in an edge-on system.
Submitted 9 February, 2022;
originally announced February 2022.
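The abstract links the roughly 10% eclipse to twice the angle subtended by the companion's Roche lobe in an edge-on system. A worked check follows, using the Eggleton (1983) Roche-lobe approximation. It assumes a canonical 1.4 $M_\odot$ pulsar (not given in the abstract) and the quoted minimum companion mass, which applies to an edge-on orbit:

```python
import math

def eggleton_roche_fraction(q):
    """Eggleton (1983): Roche-lobe radius / orbital separation for q = M2 / M1."""
    q13 = q ** (1 / 3)
    q23 = q13 ** 2
    return 0.49 * q23 / (0.6 * q23 + math.log(1 + q13))

PULSAR_MASS = 1.4        # assumed canonical neutron-star mass (M_sun)
COMPANION_MASS = 0.052   # minimum companion mass from the abstract (M_sun)

r_over_a = eggleton_roche_fraction(COMPANION_MASS / PULSAR_MASS)
# Full angle the Roche lobe subtends as seen from the pulsar (edge-on orbit).
subtended = 2 * math.asin(r_over_a)
fraction_of_orbit = subtended / (2 * math.pi)
eclipse_fraction = 2 * fraction_of_orbit
print(f"Roche lobe spans {fraction_of_orbit:.1%} of the orbit; twice that is {eclipse_fraction:.1%}")
```

Under these assumptions, the Roche lobe spans about 4.9% of the orbital phase, so twice that is about 9.8%, consistent with the observed ~10% eclipse.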
-
Incremental Fermi Large Area Telescope Fourth Source Catalog
Authors:
Fermi-LAT collaboration,
Soheila Abdollahi,
Fabio Acero,
Luca Baldini,
Jean Ballet,
Denis Bastieri,
Ronaldo Bellazzini,
Bijan Berenji,
Alessandra Berretta,
Elisabetta Bissaldi,
Roger D. Blandford,
Elliott Bloom,
Raffaella Bonino,
Ari Brill,
Richard J. Britto,
Philippe Bruel,
Toby H. Burnett,
Sara Buson,
Rob A. Cameron,
Regina Caputo,
Patrizia A. Caraveo,
Daniel Castro,
Sylvain Chaty,
Teddy C. Cheung
, et al. (116 additional authors not shown)
Abstract:
We present an incremental version (4FGL-DR3, for Data Release 3) of the fourth Fermi-LAT catalog of gamma-ray sources. Based on the first twelve years of science data in the energy range from 50 MeV to 1 TeV, it contains 6658 sources. The analysis improves on that used for the 4FGL catalog over eight years of data: more sources are fit with curved spectra, we introduce a more robust spectral parameterization for pulsars, and we extend the spectral points to 1 TeV. The spectral parameters, spectral energy distributions, and associations are updated for all sources. Light curves are rebuilt for all sources with 1 yr intervals (not 2 month intervals). Among the 5064 original 4FGL sources, 16 were deleted, 112 are formally below the detection threshold over 12 yr (but are kept in the list), while 74 are newly associated, 10 have an improved association, and seven associations were withdrawn. Pulsars are split explicitly between young and millisecond pulsars. Pulsars and binaries newly detected in LAT sources, as well as more than 100 newly classified blazars, are reported. We add three extended sources and 1607 new point sources, mostly just above the detection threshold, among which eight are considered identified, and 699 have a plausible counterpart at other wavelengths. We discuss degree-scale residuals to the global sky model and clusters of soft unassociated point sources close to the Galactic plane, which are possibly related to limitations of the interstellar emission model and missing extended sources.
Submitted 10 May, 2022; v1 submitted 26 January, 2022;
originally announced January 2022.
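The source counts in the 4FGL-DR3 abstract can be reconciled with simple bookkeeping. This sketch assumes that deletions and additions are the only changes to the list. The 112 sources that fall below threshold are kept, so they are not subtracted:

```python
# Source counts quoted in the 4FGL-DR3 abstract.
original_4fgl = 5064
deleted = 16
new_point_sources = 1607
new_extended_sources = 3

dr3_total = original_4fgl - deleted + new_point_sources + new_extended_sources
print(dr3_total)  # 6658
```

The result matches the 6658 sources stated for the catalog.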
-
The International Pulsar Timing Array second data release: Search for an isotropic Gravitational Wave Background
Authors:
J. Antoniadis,
Z. Arzoumanian,
S. Babak,
M. Bailes,
A. -S. Bak Nielsen,
P. T. Baker,
C. G. Bassa,
B. Becsy,
A. Berthereau,
M. Bonetti,
A. Brazier,
P. R. Brook,
M. Burgay,
S. Burke-Spolaor,
R. N. Caballero,
J. A. Casey-Clyde,
A. Chalumeau,
D. J. Champion,
M. Charisi,
S. Chatterjee,
S. Chen,
I. Cognard,
J. M. Cordes,
N. J. Cornish,
F. Crawford
, et al. (101 additional authors not shown)
Abstract:
We searched for an isotropic stochastic gravitational wave background in the second data release of the International Pulsar Timing Array, a global collaboration synthesizing decadal-length pulsar-timing campaigns in North America, Europe, and Australia. In our reference search for a power law strain spectrum of the form $h_c = A(f/1\,\mathrm{yr}^{-1})^α$, we found strong evidence for a spectrally-similar low-frequency stochastic process of amplitude $A = 3.8^{+6.3}_{-2.5}\times10^{-15}$ and spectral index $α= -0.5 \pm 0.5$, where the uncertainties represent 95\% credible regions, using information from the auto- and cross-correlation terms between the pulsars in the array. For a spectral index of $α= -2/3$, as expected from a population of inspiralling supermassive black hole binaries, the recovered amplitude is $A = 2.8^{+1.2}_{-0.8}\times10^{-15}$. Nonetheless, no significant evidence of the Hellings-Downs correlations that would indicate a gravitational-wave origin was found. We also analyzed the constituent data from the individual pulsar timing arrays in a consistent way, and clearly demonstrate that the combined international data set is more sensitive. Furthermore, we demonstrate that this combined data set produces comparable constraints to recent single-array data sets which have more data than the constituent parts of the combination. Future international data releases will deliver increased sensitivity to gravitational wave radiation, and significantly increase the detection probability.
Submitted 11 January, 2022;
originally announced January 2022.
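To make the power-law form $h_c = A(f/1\,\mathrm{yr}^{-1})^α$ concrete, the sketch below evaluates it at a lower frequency using the fixed-index amplitude quoted above. It also converts $α$ to the timing-residual spectral index through the standard relation $γ = 3 - 2α$, which is not stated in the abstract itself. The choice of $f = 1/(10\,\mathrm{yr})$ is illustrative:

```python
def characteristic_strain(f_per_yr, amplitude, alpha):
    """h_c(f) = A (f / 1 yr^-1)^alpha, with f given in units of 1/yr."""
    return amplitude * f_per_yr ** alpha

def timing_residual_index(alpha):
    """Spectral index of the timing-residual power spectrum: gamma = 3 - 2*alpha."""
    return 3 - 2 * alpha

ALPHA_SMBHB = -2 / 3   # inspiralling supermassive black hole binaries
A_FIXED = 2.8e-15      # median amplitude quoted for alpha = -2/3

h_10yr = characteristic_strain(0.1, A_FIXED, ALPHA_SMBHB)  # f = 1/(10 yr)
print(f"h_c at 1/(10 yr): {h_10yr:.2e}")
print(f"gamma for alpha = -2/3: {timing_residual_index(ALPHA_SMBHB):.3f}")
```

Because $α < 0$, the strain rises toward lower frequencies: at $1/(10\,\mathrm{yr})$ it is larger by $10^{2/3} \approx 4.6$. The index $α = -2/3$ corresponds to $γ = 13/3$.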
-
4FGL J1120.0-2204: A Unique Gamma-ray Bright Neutron Star Binary with an Extremely Low Mass Proto-White Dwarf
Authors:
Samuel J. Swihart,
Jay Strader,
Elias Aydi,
Laura Chomiuk,
Kristen C. Dage,
Adam Kawash,
Kirill V. Sokolovsky,
Elizabeth C. Ferrara
Abstract:
We have discovered a new X-ray emitting compact binary that is the likely counterpart to the unassociated Fermi-LAT GeV $γ$-ray source 4FGL J1120.0-2204, the second brightest Fermi source that still remains formally unidentified. Using optical spectroscopy with the SOAR telescope, we have identified a warm ($T_{\textrm{eff}}\sim8500$ K) companion in a 15.1-hr orbit around an unseen primary, which is likely a yet-undiscovered millisecond pulsar. A precise Gaia parallax shows the binary is nearby, at a distance of only $\sim 820$ pc. Unlike the typical "spider" or white dwarf secondaries in short-period millisecond pulsar binaries, our observations suggest the $\sim 0.17\,M_{\odot}$ companion is in an intermediate stage, contracting on the way to becoming an extremely low-mass helium white dwarf (a "pre-ELM" white dwarf). Although the companion is apparently unique among confirmed or candidate millisecond pulsar binaries, we use binary evolution models to show that in $\sim 2$ Gyr, the properties of the binary will match those of several millisecond pulsar-white dwarf binaries with very short ($< 1$ d) orbital periods. This makes 4FGL J1120.0-2204 the first system discovered in the penultimate phase of the millisecond pulsar recycling process.
Submitted 10 January, 2022;
originally announced January 2022.
-
Botometer 101: Social bot practicum for computational social scientists
Authors:
Kai-Cheng Yang,
Emilio Ferrara,
Filippo Menczer
Abstract:
Social bots have become an important component of online social media. Deceptive bots, in particular, can manipulate online discussions of important issues ranging from elections to public health, threatening the constructive exchange of information. Their ubiquity makes them an interesting research subject and requires researchers to handle them properly when conducting studies using social media data. It is therefore important for researchers to have access to bot detection tools that are reliable and easy to use. This paper provides an introductory tutorial on Botometer, a public tool for bot detection on Twitter, for readers who are new to this topic and may not be familiar with programming and machine learning. We explain how Botometer works and the different ways users can access it, and we present a case study as a demonstration. Readers can use the case study code as a template for their own research. We also discuss recommended practices for using Botometer.
Submitted 21 August, 2022; v1 submitted 5 January, 2022;
originally announced January 2022.
-
Bayesian Solar Wind Modeling with Pulsar Timing Arrays
Authors:
Jeffrey S. Hazboun,
Joseph Simon,
Dustin R. Madison,
Zaven Arzoumanian,
Kathryn Crowter,
Megan E. DeCesar,
Paul B. Demorest,
Timothy Dolch,
Justin A. Ellis,
Robert D. Ferdman,
Elizabeth C. Ferrara,
Emmanuel Fonseca,
Peter A. Gentile,
Glenn Jones,
Megan L. Jones,
Michael T. Lam,
Lina Levin,
Duncan R. Lorimer,
Ryan S. Lynch,
Maura A. McLaughlin,
Cherry Ng,
David J. Nice,
Timothy T. Pennucci,
Scott M. Ransom,
Paul S. Ray
, et al. (5 additional authors not shown)
Abstract:
Using Bayesian analyses, we study the solar electron density with the NANOGrav 11-year pulsar timing array (PTA) dataset. Our model of the solar wind is incorporated into a global fit starting from pulse times of arrival. We introduce new tools developed for this global fit, including analytic expressions for solar electron column densities and open-source models for the solar wind that port into existing PTA software. We perform an ab initio recovery of various solar wind model parameters. We then demonstrate the richness of information about the solar electron density, $n_E$, that can be gleaned from PTA data, including higher-order corrections to the simple $1/r^2$ model associated with a free-streaming wind (which are informative probes of coronal acceleration physics), quarterly binned measurements of $n_E$, and a continuous time-varying model for $n_E$ spanning approximately one solar cycle period. Finally, we discuss the importance of our model for chromatic noise mitigation in gravitational-wave analyses of pulsar timing data and the potential of developing synergies between sophisticated PTA solar electron density models and those developed by the solar physics community.
Submitted 17 November, 2021;
originally announced November 2021.
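The abstract mentions analytic expressions for solar electron column densities and the simple $1/r^2$ free-streaming model. For that baseline model, integrating $n_E(r) = n_\oplus (1\,\mathrm{AU}/r)^2$ along the line of sight from an observer at 1 AU gives $\mathrm{DM} = n_\oplus\,\mathrm{AU}\,(\pi - θ)/\sin θ$, where $θ$ is the Sun-Earth-pulsar elongation. The sketch below implements only this textbook case. It assumes a circular Earth orbit and a hypothetical $n_\oplus$, and it omits the paper's higher-order and time-varying terms:

```python
import math

AU_IN_PC = 4.8481368e-6  # 1 astronomical unit expressed in parsecs

def solar_wind_dm(n_earth, elongation_rad):
    """Dispersion measure (pc cm^-3) through a spherically symmetric 1/r^2 wind.

    n_earth: electron density at 1 AU (cm^-3).
    elongation_rad: Sun-Earth-pulsar angle in radians, 0 < theta < pi.
    """
    return n_earth * AU_IN_PC * (math.pi - elongation_rad) / math.sin(elongation_rad)

N_EARTH = 5.0  # hypothetical density at 1 AU (cm^-3), for illustration only
for deg in (90, 30, 10):
    dm = solar_wind_dm(N_EARTH, math.radians(deg))
    print(f"elongation {deg:>2} deg: DM_sw = {dm:.2e} pc cm^-3")
```

The contribution grows roughly tenfold between 90 and 10 degrees of elongation, which is why lines of sight passing near the Sun dominate the solar-wind signal in timing data.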
-
Heterogeneous Effects of Software Patches in a Multiplayer Online Battle Arena Game
Authors:
Yuzi He,
Christopher Tran,
Julie Jiang,
Keith Burghardt,
Emilio Ferrara,
Elena Zheleva,
Kristina Lerman
Abstract:
The popularity of online gaming has grown dramatically, driven in part by streaming and the billion-dollar e-sports industry. Online games regularly update their software to fix bugs, add functionality that improves the game's look and feel, and change the game mechanics to keep the games fun and challenging. An open question, however, is the impact of these changes on player performance and game balance, as well as how players adapt to these sudden changes. To address these questions, we use causal inference to measure the impact of software patches to League of Legends, a popular team-based multiplayer online game. We show that game patches have substantially different impacts on players depending on their skill level and whether they take breaks between games. We find that the gap between good and bad players increases after a patch, despite efforts to make gameplay more equal. Moreover, longer between-game breaks tend to improve player performance after patches. Overall, our results highlight the utility of causal inference, and specifically heterogeneous treatment effect estimation, as a tool to quantify the complex mechanisms of game balance and its interplay with players' performance.
Submitted 27 October, 2021;
originally announced October 2021.
-
Parasocial diffusion: K-pop fandoms help drive COVID-19 public health messaging on social media
Authors:
Ho-Chun Herbert Chang,
Becky Pham,
Emilio Ferrara
Abstract:
We examine an unexpected but significant source of positive public health messaging during the COVID-19 pandemic: K-pop fandoms. Leveraging more than 7 million tweets related to mask-wearing and K-pop between March 2020 and December 2021, we analyzed the online spread of the hashtag \#WearAMask and vaccine-related tweets amid anti-mask sentiments and public health misinformation. Analyses reveal the South Korean boy band BTS as one of the most significant drivers of health discourse. Tweets from health agencies and prominent figures that mentioned K-pop generated 111 times more online responses than tweets that did not. These tweets also elicited strong responses from South America, Southeast Asia, and rural states, areas often neglected in Twitter-based messaging by mainstream social media campaigns. Network and temporal analyses show increased use by right-leaning elites over time. Mechanistically, strong levels of parasocial engagement and connectedness allow sustained activism in the community. Our results suggest that public health institutions may leverage pre-existing audience markets to synergistically diffuse messages to, and target, under-served communities both domestically and globally, especially during health crises such as COVID-19.
Submitted 7 October, 2023; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Multiwavelength Spectral Analysis and Neural Network Classification of Counterparts to 4FGL Unassociated Sources
Authors:
Stephen Kerby,
Amanpreet Kaur,
Abraham D. Falcone,
Ryan Eskenasy,
Fredric Hancock,
Michael C. Stroh,
Elizabeth C. Ferrara,
Paul S. Ray,
Jamie A. Kennea,
Eric Grove
Abstract:
The Fermi-LAT unassociated sources represent some of the most enigmatic gamma-ray sources in the sky. Observations with the Swift-XRT and -UVOT telescopes have identified hundreds of likely X-ray and UV/optical counterparts in the uncertainty ellipses of the unassociated sources. In this work we present spectral fitting results for 205 possible X-ray/UV/optical counterparts to 4FGL unassociated targets. Assuming that the unassociated sources contain mostly pulsars and blazars, we develop a neural network classifier approach that applies gamma-ray, X-ray, and UV/optical spectral parameters to yield descriptive classification of unassociated spectra into pulsars and blazars. From our primary sample of 174 Fermi sources with a single X-ray/UV/optical counterpart, we present 132 P_bzr > 0.99 likely blazars and 14 P_bzr < 0.01 likely pulsars, with 28 remaining ambiguous. These subsets of the unassociated sources suggest a systematic expansion to catalogs of gamma-ray pulsars and blazars. Compared to previous classification approaches our neural network classifier achieves significantly higher validation accuracy and returns more bifurcated P_bzr values, suggesting that multiwavelength analysis is a valuable tool for confident classification of Fermi unassociated sources.
Submitted 22 October, 2021; v1 submitted 8 October, 2021;
originally announced October 2021.
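The classifier outputs a blazar probability P_bzr, and the abstract labels sources using cuts at 0.99 and 0.01. A minimal sketch of that labeling step follows, together with a check that the quoted counts add up to the 174-source primary sample. The function name and label strings are hypothetical, and the neural network itself is not shown:

```python
def classify(p_bzr, blazar_cut=0.99, pulsar_cut=0.01):
    """Map a blazar probability to a label using the thresholds quoted in the abstract."""
    if p_bzr > blazar_cut:
        return "likely blazar"
    if p_bzr < pulsar_cut:
        return "likely pulsar"
    return "ambiguous"

# Consistency check of the quoted counts for the primary sample.
counts = {"likely blazar": 132, "likely pulsar": 14, "ambiguous": 28}
assert sum(counts.values()) == 174

labels = [classify(p) for p in (0.995, 0.5, 0.003)]
print(labels)  # ['likely blazar', 'ambiguous', 'likely pulsar']
```

A more "bifurcated" classifier pushes more sources past the two cuts and leaves fewer in the ambiguous middle band.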
-
FairFed: Enabling Group Fairness in Federated Learning
Authors:
Yahya H. Ezzeldin,
Shen Yan,
Chaoyang He,
Emilio Ferrara,
Salman Avestimehr
Abstract:
Training ML models that are fair across different demographic groups is of critical importance due to the increased integration of ML in crucial decision-making scenarios such as healthcare and recruitment. Federated learning has been viewed as a promising solution for collaboratively training machine learning models among multiple parties while maintaining the privacy of their local data. However, federated learning also poses new challenges in mitigating the potential bias against certain populations (e.g., demographic groups), as this typically requires centralized access to the sensitive information (e.g., race, gender) of each datapoint. Motivated by the importance and challenges of group fairness in federated learning, in this work, we propose FairFed, a novel algorithm for fairness-aware aggregation to enhance group fairness in federated learning. Our proposed approach is server-side and agnostic to the applied local debiasing, thus allowing flexible use of different local debiasing methods across clients. We evaluate FairFed empirically against common baselines for fair ML and federated learning, and demonstrate that it provides fairer models, particularly under highly heterogeneous data distributions across clients. We also demonstrate the benefits of FairFed in scenarios involving naturally distributed real-life data collected from different geographical locations or departments within an organization.
Submitted 23 November, 2022; v1 submitted 2 October, 2021;
originally announced October 2021.
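FairFed performs fairness-aware aggregation on the server, but the abstract does not give its update rule. The sketch below illustrates the general idea under an assumed rule, which should not be read as FairFed's exact algorithm. Each client's aggregation weight starts in proportion to its data size and is then shifted down when its local fairness gap deviates more than average from the global model's gap. The function names, `beta`, and all numbers are hypothetical:

```python
def fairness_aware_weights(sample_counts, local_gaps, global_gap, beta=0.5, prev_weights=None):
    """One round of illustrative fairness-aware aggregation weights.

    sample_counts: local example counts, used for the initial (FedAvg-style) weights.
    local_gaps: each client's local fairness metric (e.g., a demographic-parity gap).
    global_gap: the same metric evaluated for the current global model.
    """
    n = len(sample_counts)
    if prev_weights is None:
        total = sum(sample_counts)
        prev_weights = [c / total for c in sample_counts]
    deviations = [abs(g - global_gap) for g in local_gaps]
    mean_dev = sum(deviations) / n
    # Above-average deviation lowers a client's weight; below-average raises it.
    raw = [max(w - beta * (d - mean_dev), 0.0) for w, d in zip(prev_weights, deviations)]
    norm = sum(raw)
    if norm == 0:
        return list(prev_weights)
    return [r / norm for r in raw]

def aggregate(client_params, weights):
    """Weighted average of client parameter vectors."""
    dim = len(client_params[0])
    return [sum(w * p[i] for w, p in zip(weights, client_params)) for i in range(dim)]

weights = fairness_aware_weights([100, 100, 100], [0.05, 0.10, 0.30], global_gap=0.08)
global_params = aggregate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], weights)
print([round(w, 4) for w in weights])
```

Only scalar fairness metrics reach the server in this sketch, not sensitive attributes. This matches the abstract's point that the aggregation is server-side and independent of whichever local debiasing each client uses.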
-
The NANOGrav 12.5-year data set: Search for Non-Einsteinian Polarization Modes in the Gravitational-Wave Background
Authors:
Zaven Arzoumanian,
Paul T. Baker,
Harsha Blumer,
Bence Becsy,
Adam Brazier,
Paul R. Brook,
Sarah Burke-Spolaor,
Maria Charisi,
Shami Chatterjee,
Siyuan Chen,
James M. Cordes,
Neil J. Cornish,
Fronefield Crawford,
H. Thankful Cromartie,
Megan E. DeCesar,
Dallas M. DeGan,
Paul B. Demorest,
Timothy Dolch,
Brendan Drachler,
Justin A. Ellis,
Elizabeth C. Ferrara,
William Fiore,
Emmanuel Fonseca,
Nathan Garver-Daniels,
Peter A. Gentile
, et al. (46 additional authors not shown)
Abstract:
We search NANOGrav's 12.5-year data set for evidence of a gravitational wave background (GWB) with all the spatial correlations allowed by general metric theories of gravity. We find no substantial evidence in favor of the existence of such correlations in our data. We find that scalar-transverse (ST) correlations yield signal-to-noise ratios and Bayes factors that are higher than quadrupolar (tensor transverse, TT) correlations. Specifically, we find ST correlations with a signal-to-noise ratio of 2.8 that are preferred over TT correlations (Hellings and Downs correlations) with Bayesian odds of about 20:1. However, the significance of ST correlations is reduced dramatically when we include modeling of the Solar System ephemeris systematics and/or remove pulsar J0030$+$0451 entirely from consideration. Even taking the nominal signal-to-noise ratios at face value, analyses of simulated data sets show that such values are not extremely unlikely to be observed in cases where only the usual TT modes are present in the GWB. In the absence of a detection of any polarization mode of gravity, we place upper limits on their amplitudes for a spectral index of $γ= 5$ and a reference frequency of $f_\text{yr} = 1 \text{yr}^{-1}$. Among the upper limits for eight general families of metric theories of gravity, we find the values of $A^{95\%}_{TT} = (9.7 \pm 0.4)\times 10^{-16}$ and $A^{95\%}_{ST} = (1.4 \pm 0.03)\times 10^{-15}$ for the family of metric spacetime theories that contain both TT and ST modes.
Submitted 29 September, 2021;
originally announced September 2021.
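As a rough illustration of the two correlation patterns compared in this abstract, the sketch below evaluates the Hellings and Downs (TT) overlap reduction function and the scalar-transverse (ST) one as functions of the angle between two distinct pulsars. Both are normalized to 1/2 in the small-angle limit. The ST form (3 + cos ζ)/8 is the expression commonly quoted in the pulsar-timing literature and is not stated in the abstract itself. The function names are hypothetical, and this is not the paper's analysis code.

```python
import math


def hellings_downs(zeta: float) -> float:
    """TT (Hellings and Downs) overlap reduction function for two distinct
    pulsars separated by angle zeta in radians; equals 1/2 as zeta -> 0."""
    x = (1.0 - math.cos(zeta)) / 2.0
    if x == 0.0:
        # The limit of x * ln(x) as x -> 0 is 0, so only the constant survives.
        return 0.5
    return 0.5 - x / 4.0 + 1.5 * x * math.log(x)


def scalar_transverse(zeta: float) -> float:
    """ST overlap reduction function (assumed literature form), same normalization."""
    return (3.0 + math.cos(zeta)) / 8.0


if __name__ == "__main__":
    # Tabulate both curves: TT dips negative near 90 degrees, ST never does.
    for deg in (0, 30, 60, 90, 120, 150, 180):
        z = math.radians(deg)
        print(f"{deg:4d} deg  TT={hellings_downs(z):+.3f}  ST={scalar_transverse(z):+.3f}")
```

The contrast is the point: the TT curve changes sign at intermediate separations, while the ST curve stays positive and varies only gently with angle. The two curves coincide at 0 and 180 degrees, which helps explain why data with limited angular coverage can struggle to tell them apart.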
-
FAST discovery of an extremely radio-faint millisecond pulsar from the Fermi-LAT unassociated source 3FGL J0318.1+0252
Authors:
Pei Wang,
Di Li,
Colin J. Clark,
Pablo Saz Parkinson,
Xian Hou,
Weiwei Zhu,
Lei Qian,
Youling Yue,
Zhichen Pan,
Zhijie Liu,
Xuhong Yu,
Xiaoyao Xie,
Qijun Zhi,
Hui Zhang,
Jumei Yao,
Jun Yan,
Chengmin Zhang,
Paul S. Ray,
Matthew Kerr,
David A. Smith,
Peter F. Michelson,
Elizabeth C. Ferrara,
David J. Thompson,
Zhiqiang Shen,
Na Wang
, et al. (1 additional authors not shown)
Abstract:
High-sensitivity radio searches of unassociated $γ$-ray sources have proven to be an effective way of finding new pulsars. Using the Five-hundred-meter Aperture Spherical radio Telescope (FAST) during its commissioning phase, we have carried out a number of targeted deep searches of \textit{Fermi} Large Area Telescope (LAT) $γ$-ray sources. On Feb. 27$^{th}$, 2018 we discovered an isolated millisecond pulsar (MSP), PSR J0318+0253, coincident with the unassociated $γ$-ray source 3FGL J0318.1+0252. PSR J0318+0253 has a spin period of $5.19$ milliseconds, a dispersion measure (DM) of $26$ pc cm$^{-3}$ corresponding to a DM distance of about $1.3$ kpc, and a period-averaged flux density of $\sim$11 $\pm$ 2 $μ$Jy at L-band (1.05-1.45 GHz). Among all high-energy MSPs, PSR J0318+0253 is the faintest ever detected in radio bands, by a factor of at least $\sim$4 in terms of L-band fluxes. With the aid of the radio ephemeris, an analysis of 9.6 years of \textit{Fermi}-LAT data revealed that PSR J0318+0253 also displays strong $γ$-ray pulsations. Follow-up observations carried out by both Arecibo and FAST suggest a likely spectral turn-over around 350 MHz. This is the first result from the collaboration between the FAST and \textit{Fermi}-LAT teams as well as the first confirmed new MSP discovery by FAST, raising hopes for the detection of many more MSPs. Such discoveries will make a significant contribution to our understanding of the neutron star zoo while potentially contributing to the future detection of gravitational waves via pulsar timing array (PTA) experiments.
Submitted 3 September, 2021; v1 submitted 2 September, 2021;
originally announced September 2021.
-
Characterizing Online Engagement with Disinformation and Conspiracies in the 2020 U.S. Presidential Election
Authors:
Karishma Sharma,
Emilio Ferrara,
Yan Liu
Abstract:
Identifying and characterizing disinformation in political discourse on social media is critical to ensuring the integrity of elections and democratic processes around the world. Persistent manipulation of social media has increased concerns about the 2020 U.S. Presidential Election, due to its potential to influence individual opinions and social dynamics. In this work, we focus on the identification of distorted facts, in the form of unreliable and conspiratorial narratives in election-related tweets, to characterize discourse manipulation prior to the election. We apply a detection model to a dataset of 242 million election-related tweets to separate factual claims from unreliable (or conspiratorial) ones. The identified claims are used to investigate the topics targeted by disinformation and the conspiracy groups involved, most notably the far-right QAnon conspiracy group. Further, we characterize account engagements with unreliable and conspiracy tweets, and with the QAnon conspiracy group, by political leaning and tweet type. Finally, using a regression discontinuity design, we investigate whether Twitter's actions to curb QAnon activity on the platform were effective, and how QAnon accounts adapted to Twitter's restrictions.
Submitted 20 October, 2021; v1 submitted 17 July, 2021;
originally announced July 2021.