Search | arXiv e-print repository

RIP Twitter API: A eulogy to its vast research contributions

Authors: Ryan Murtfeldt, Naomi Alterman, Ihsan Kahveci, Jevin D. West

Abstract: Since 2006, Twitter's Application Programming Interface (API) has been a treasure trove of high-quality data for researchers studying everything from the spread of misinformation, to social psychology and emergency management. However, in the spring of 2023, Twitter (now called X) began charging $42,000/month for its Enterprise access level, an essential death knell for researcher use. Lacking sufficient funds to pay this monthly fee, academics are now scrambling to continue their research without this important data source. This study collects and tabulates the number of studies, number of citations, dates, major disciplines, and major topic areas of studies that used Twitter data between 2006 and 2023. While we cannot know for certain what will be lost now that Twitter data is cost prohibitive, we can illustrate its research value during the time it was available. A search of 8 databases and 3 related APIs found that since 2006, a total of 27,453 studies have been published in 7,432 publication venues, with 1,303,142 citations, across 14 disciplines. Major disciplines include: computational social science, engineering, data science, social media studies, public health, and medicine. Major topics include: information dissemination, assessing the credibility of tweets, strategies for conducting data research, detecting and analyzing major events, and studying human behavior.
Twitter data studies have increased every year since 2006, but following Twitter's decision to begin charging for data in the spring of 2023, the number of studies published in 2023 decreased by 13% compared to 2022. We assume that much of the data used for studies published in 2023 were collected prior to Twitter's shutdown, and thus the number of new studies is likely to decline further in subsequent years.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 16 pages, 4 figures, 4 appendices

ACM Class: J.4; K.4

arXiv:2404.06422

Echo Chambers in the Age of Algorithms: An Audit of Twitter's Friend Recommender System

Authors: Kayla Duskin, Joseph S. Schafer, Jevin D. West, Emma S. Spiro

Abstract: The presence of political misinformation and ideological echo chambers on social media platforms is concerning given the important role that these sites play in the public's exposure to news and current events. Algorithmic systems employed on these platforms are presumed to play a role in these phenomena, but little is known about their mechanisms and effects. In this work, we conduct an algorithmic audit of Twitter's Who-To-Follow friend recommendation system, the first empirical audit that investigates the impact of this algorithm in-situ. We create automated Twitter accounts that initially follow left and right affiliated U.S. politicians during the 2022 U.S. midterm elections and then grow their information networks using the platform's recommender system. We pair the experiment with an observational study of Twitter users who already follow the same politicians. Broadly, we find that while following the recommendation algorithm leads accounts into dense and reciprocal neighborhoods that structurally resemble echo chambers, the recommender also results in less political homogeneity of a user's network compared to accounts growing their networks through social endorsement. Furthermore, accounts that exclusively followed users recommended by the algorithm had fewer opportunities to encounter content centered on false or misleading election narratives compared to choosing friends based on social endorsement.

Submitted 9 April, 2024; originally announced April 2024.

Comments: To be published in ACM Web Science Conference (Websci 24). 11 pages, 6 figures

arXiv:2403.19717

A Picture is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok

Authors: Jack West, Lea Thiemt, Shimaa Ahmed, Maggie Bartig, Kassem Fawaz, Suman Banerjee

Abstract: Mobile apps have embraced user privacy by moving their data processing to the user's smartphone. Advanced machine learning (ML) models, such as vision models, can now locally analyze user images to extract insights that drive several functionalities. Capitalizing on this new processing model of locally analyzing user images, we analyze two popular social media apps, TikTok and Instagram, to reveal (1) what insights vision models in both apps infer about users from their image and video data and (2) whether these models exhibit performance disparities with respect to demographics. As vision models provide signals for sensitive technologies like age verification and facial recognition, understanding potential biases in these models is crucial for ensuring that users receive equitable and accurate services. We develop a novel method for capturing and evaluating ML tasks in mobile apps, overcoming challenges like code obfuscation, native code execution, and scalability. Our method comprises ML task detection, ML pipeline reconstruction, and ML performance assessment, specifically focusing on demographic disparities. We apply our methodology to TikTok and Instagram, revealing significant insights. For TikTok, we find issues in age and gender prediction accuracy, particularly for minors and Black individuals. In Instagram, our analysis uncovers demographic disparities in the extraction of over 500 visual concepts from images, with evidence of spurious correlations between demographic features and certain concepts.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 18 pages, 13 figures, to appear at IEEE Symposium on Security and Privacy 2024

ACM Class: K.4.2; C.4; D.2.2

arXiv:2403.14018

A Signal Injection Attack Against Zero Involvement Pairing and Authentication for the Internet of Things

Authors: Isaac Ahlgren, Jack West, Kyuin Lee, George Thiruvathukal, Neil Klingensmith

Abstract: Zero Involvement Pairing and Authentication (ZIPA) is a promising technique for autoprovisioning large networks of Internet-of-Things (IoT) devices. In this work, we present the first successful signal injection attack on a ZIPA system. Most existing ZIPA systems assume there is a negligible amount of influence from the unsecured outside space on the secured inside space. In reality, environmental signals do leak from adjacent unsecured spaces and influence the environment of the secured space. Our attack takes advantage of this fact to perform a signal injection attack on the popular Schurmann & Sigg algorithm. The keys generated by the adversary with a signal injection attack at 95 dBA are within the standard error of the legitimate device.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2402.11707

Search Engines Post-ChatGPT: How Generative Artificial Intelligence Could Make Search Less Reliable

Authors: Shahan Ali Memon, Jevin D. West

Abstract: In this commentary, we discuss the evolving nature of search engines, as they begin to generate, index, and distribute content created by generative artificial intelligence (GenAI). Our discussion highlights challenges in the early stages of GenAI integration, particularly around factual inconsistencies and biases. We discuss how output from GenAI carries an unwarranted sense of credibility, while decreasing transparency and sourcing ability. Furthermore, search engines are already answering queries with error-laden, generated content, further blurring the provenance of information and impacting the integrity of the information ecosystem. We argue how all these factors could reduce the reliability of search engines. Finally, we summarize some of the active research directions and open questions.

Submitted 18 February, 2024; originally announced February 2024.

Comments: 9 pages, 5 figures

arXiv:2312.03759

How should the advent of large language models affect the practice of science?

Authors: Marcel Binz, Stephan Alaniz, Adina Roskies, Balazs Aczel, Carl T. Bergstrom, Colin Allen, Daniel Schad, Dirk Wulff, Jevin D. West, Qiong Zhang, Richard M. Shiffrin, Samuel J. Gershman, Ven Popov, Emily M. Bender, Marco Marelli, Matthew M. Botvinick, Zeynep Akata, Eric Schulz

Abstract: Large language models (LLMs) are being increasingly incorporated into scientific workflows. However, we have yet to fully grasp the implications of this integration. How should the advent of large language models affect the practice of science? For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate. Schulz et al. make the argument that working with LLMs is not fundamentally different from working with human collaborators, while Bender et al. argue that LLMs are often misused and over-hyped, and that their limitations warrant a focus on more specialized, easily interpretable tools. Marelli et al. emphasize the importance of transparent attribution and responsible use of LLMs. Finally, Botvinick and Gershman advocate that humans should retain responsibility for determining the scientific roadmap. To facilitate the discussion, the four perspectives are complemented with a response from each group. By putting these different perspectives in conversation, we aim to bring attention to important considerations within the academic community regarding the adoption of LLMs and their impact on both current and future scientific practices.

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.04433

SyncBleed: A Realistic Threat Model and Mitigation Strategy for Zero-Involvement Pairing and Authentication (ZIPA)

Authors: Isaac Ahlgren, Jack West, Kyuin Lee, George K. Thiruvathukal, Neil Klingensmith

Abstract: Zero Involvement Pairing and Authentication (ZIPA) is a promising technique for auto-provisioning large networks of Internet-of-Things (IoT) devices. Presently, these networks use password-based authentication, which is difficult to scale to more than a handful of devices. To deal with this challenge, ZIPA-enabled devices autonomously extract identical authentication or encryption keys from ambient environmental signals. However, during the key negotiation process, existing ZIPA systems leak information on a public wireless channel, which can allow adversaries to learn the key. We demonstrate a passive attack called SyncBleed, which uses leaked information to reconstruct keys generated by ZIPA systems. To mitigate SyncBleed, we present TREVOR, an improved key generation technique that produces nearly identical bit sequences from environmental signals without leaking information. We demonstrate that TREVOR can generate keys from a variety of environmental signal types in under 4 seconds, consistently achieving a 90-95% bit agreement rate across devices within various environmental sources.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2308.16118

Response: Emergent analogical reasoning in large language models

Authors: Damian Hodel, Jevin West

Abstract: In their recent Nature Human Behaviour paper, "Emergent analogical reasoning in large language models," (Webb, Holyoak, and Lu, 2023) the authors argue that "large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems." In this response, we provide counterexamples of the letter string analogies. In our tests, GPT-3 fails to solve the simplest variations of the original tasks, whereas human performance remains consistently high across all modified versions. Zero-shot reasoning is an extraordinary claim that requires extraordinary evidence. We do not see that evidence in our experiments. To strengthen claims of humanlike reasoning such as zero-shot reasoning, it is important that the field develop approaches that rule out data memorization.

Submitted 30 April, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Response to publication in Nature Human Behaviour titled "Emergent analogical reasoning in large language models," (Webb, Holyoak, and Lu, 2023, arXiv:2212.09196). 14 pages

arXiv:2204.06128

Are You Really Muted?: A Privacy Analysis of Mute Buttons in Video Conferencing Apps

Authors: Yucheng Yang, Jack West, George K. Thiruvathukal, Neil Klingensmith, Kassem Fawaz

Abstract: Video conferencing apps (VCAs) make it possible to turn previously private spaces -- bedrooms, living rooms, and kitchens -- into semi-public extensions of the office. For the most part, users have accepted these apps in their personal space without much thought about the permission models that govern the use of their private data during meetings. While access to a device's video camera is carefully controlled, little has been done to ensure the same level of privacy for accessing the microphone. In this work, we ask the question: what happens to the microphone data when a user clicks the mute button in a VCA? We first conduct a user study to analyze users' understanding of the permission model of the mute button. Then, using runtime binary analysis tools, we trace raw audio flow in many popular VCAs as it traverses the app from the audio driver to the network. We find fragmented policies for dealing with microphone data among VCAs -- some continuously monitor the microphone input during mute, and others do so periodically. One app transmits statistics of the audio to its telemetry servers while the app is muted. Using network traffic that we intercept en route to the telemetry server, we implement a proof-of-concept background activity classifier and demonstrate the feasibility of inferring the ongoing background activity during a meeting -- cooking, cleaning, typing, etc. We achieved 81.9% macro accuracy on identifying six common background activities using intercepted outgoing telemetry packets when a user is muted.

Submitted 12 April, 2022; originally announced April 2022.

Comments: to be published in the 22nd Privacy Enhancing Technologies Symposium (PETS 2022)

arXiv:2108.05669

Bursting Scientific Filter Bubbles: Boosting Innovation via Novel Author Discovery

Authors: Jason Portenoy, Marissa Radensky, Jevin West, Eric Horvitz, Daniel Weld, Tom Hope

Abstract: Isolated silos of scientific research and the growing challenge of information overload limit awareness across the literature and hinder innovation. Algorithmic curation and recommendation, which often prioritize relevance, can further reinforce these informational "filter bubbles." In response, we describe Bridger, a system for facilitating discovery of scholars and their work. We construct a faceted representation of authors with information gleaned from their papers and inferred author personas, and use it to develop an approach that locates commonalities and contrasts between scientists to balance relevance and novelty. In studies with computer science researchers, this approach helps users discover authors considered useful for generating novel research directions. We also demonstrate an approach for displaying information about authors, boosting the ability to understand the work of new, unfamiliar scholars. Our analysis reveals that Bridger connects authors who have different citation profiles and publish in different venues, raising the prospect of bridging diverse scientific communities.

Submitted 31 January, 2022; v1 submitted 12 August, 2021; originally announced August 2021.

Comments: CHI 2022

arXiv:2104.14618

Moonshine: An Online Randomness Distiller for Zero-Involvement Authentication

Authors: Jack West, Kyuin Lee, Suman Banerjee, Younghyun Kim, George K. Thiruvathukal, Neil Klingensmith

Abstract: Context-based authentication is a method for transparently validating another device's legitimacy to join a network based on location. Devices can pair with one another by continuously harvesting environmental noise to generate a random key with no user involvement. However, there are gaps in our understanding of the theoretical limitations of environmental noise harvesting, making it difficult for researchers to build efficient algorithms for sampling environmental noise and distilling keys from that noise. This work explores the information-theoretic capacity of context-based authentication mechanisms to generate random bit strings from environmental noise sources with known properties. Using only mild assumptions about the source process's characteristics, we demonstrate that commonly-used bit extraction algorithms extract only about 10% of the available randomness from a source noise process. We present an efficient algorithm to improve the quality of keys generated by context-based methods and evaluate it on real key extraction hardware. Moonshine is a randomness distiller which is more efficient at extracting bits from an environmental entropy source than existing methods. Our techniques nearly double the quality of keys as measured by the NIST test suite, producing keys that can be used in real-world authentication scenarios.

Submitted 29 April, 2021; originally announced April 2021.

Comments: 16 pages, 5 figures, IPSN 2021

arXiv:2101.02286

Scalable Parallel Linear Solver for Compact Banded Systems on Heterogeneous Architectures

Authors: Hang Song, Kristen V. Matsuno, Jacob R. West, Akshay Subramaniam, Aditya S. Ghate, Sanjiva K. Lele

Abstract: A scalable algorithm for solving compact banded linear systems on distributed memory architectures is presented. The proposed method factorizes the original system into two levels of memory hierarchies, and solves it using parallel cyclic reduction on both distributed and shared memory. This method has a lower communication footprint across distributed memory partitions compared to conventional algorithms involving data transpose or re-partitioning. The algorithm developed in this work is generalized to cyclic compact banded systems with flexible data decompositions. For cyclic compact banded systems, the method is a direct solver with deterministic operation and communication counts depending on the matrix size, its bandwidth, and the partition strategy. The implementation and runtime configuration details are discussed for performance optimization. Scalability is demonstrated on the linear solver as well as on a representative fluid mechanics application problem, in which the dominant computational cost is solving the cyclic tridiagonal linear systems of compact numerical schemes on a 3D periodic domain. The algorithm is particularly useful for solving the linear systems arising from the application of compact finite difference operators to a wide range of partial differential equation problems, such as but not limited to the numerical simulations of compressible turbulent flows, aeroacoustics, elastic-plastic wave propagation, and electromagnetics.
It alleviates obstacles to their use on modern high performance computing hardware, where memory and computational power are distributed across nodes with multi-threaded processing units.

Submitted 3 February, 2021; v1 submitted 29 December, 2020; originally announced January 2021.

arXiv:2012.11055

Social Media COVID-19 Misinformation Interventions Viewed Positively, But Have Limited Impact

Authors: Christine Geeng, Tiona Francisco, Jevin West, Franziska Roesner

Abstract: Amidst COVID-19 misinformation spreading, social media platforms like Facebook and Twitter rolled out design interventions, including banners linking to authoritative resources and more specific "false information" labels. In late March 2020, shortly after these interventions began to appear, we conducted an exploratory mixed-methods survey (N = 311) to learn: what are social media users' attitudes towards these interventions, and to what extent do they self-report effectiveness? We found that most participants indicated a positive attitude towards interventions, particularly post-specific labels for misinformation. Still, the majority of participants discovered or corrected misinformation through other means, most commonly web searches, suggesting room for platforms to do more to stem the spread of COVID-19 misinformation.

Submitted 20 December, 2020; originally announced December 2020.

arXiv:2005.12668

SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search

Authors: Tom Hope, Jason Portenoy, Kishore Vasan, Jonathan Borchardt, Eric Horvitz, Daniel S. Weld, Marti A. Hearst, Jevin West

Abstract: The COVID-19 pandemic has sparked unprecedented mobilization of scientists, generating a deluge of papers that makes it hard for researchers to keep track and explore new directions. Search engines are designed for targeted queries, not for discovery of connections across a corpus. In this paper, we present SciSight, a system for exploratory search of COVID-19 research integrating two key capabilities: first, exploring associations between biomedical facets automatically extracted from papers (e.g., genes, drugs, diseases, patient outcomes); second, combining textual and network information to search and visualize groups of researchers and their ties. SciSight has so far served over $15K$ users with over $42K$ page views and $13\%$ returns.

Submitted 20 September, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

Comments: Accepted to EMNLP 2020

arXiv:2004.00092

VoltKey: Using Power Line Noise for Zero-Involvement Pairing and Authentication (Demo Abstract)

Authors: Jack West, Tien VoNguyen, Isaac Ahlgren, Iryna Motyashok, George K. Thiruvathukal, Neil Klingensmith

Abstract: We present VoltKey, a method that transparently generates secret keys for colocated devices, leveraging spatiotemporally unique noise contexts observed in commercial power line infrastructure. VoltKey extracts randomness from power line noise and securely converts it into an authentication token. Nearby devices which observe the same noise patterns on the powerline generate identical keys. The unique noise pattern observed only by trusted devices connected to a local power line prevents malicious devices without physical access from obtaining unauthorized access to the network. VoltKey is implemented inside of a standard USB power supply as a platform-agnostic bolt-on addition to any IoT or mobile device or any wireless access point that is connected to the power outlet.

Submitted 31 March, 2020; originally announced April 2020.

Comments: Tools Demo: Accepted at Information Processing in Sensor Networks 2020

arXiv:2003.12452

FLIC: A Distributed Fog Cache for City-Scale Applications

Authors: Jack West, Neil Klingensmith, George K. Thiruvathukal

Abstract: We present FLIC, a distributed software data caching framework for fogs that reduces network traffic and latency. FLIC is targeted toward city-scale deployments of cooperative IoT devices in which each node gathers and shares data with surrounding devices. As machine learning and other data processing techniques that require large volumes of training data are ported to low-cost and low-power IoT systems, we expect that data analysis will be moved away from the cloud. Separation from the cloud will reduce reliance on power-hungry centralized cloud-based infrastructure. However, city-scale deployments of cooperative IoT devices often connect to the Internet with cellular service, in which service charges are proportional to network usage. IoT system architects must be clever in order to keep costs down in these scenarios. To reduce the network bandwidth required to operate city-scale deployments of cooperative IoT systems, FLIC implements a distributed cache on the IoT nodes in the fog. FLIC allows the IoT network to share its data without repeatedly interacting with a simple cloud storage service, reducing calls out to a backing store. Our results show a miss rate of less than 2% on reads, with only 5% of requests needing the backing store. We also achieved a reduction of more than 50% in bytes transmitted per second.

Submitted 24 March, 2020; originally announced March 2020.

Comments: Accepted at 2020 IEEE International Conference on Fog Computing

arXiv:2001.00907

Selfish Algorithm and Emergence of Collective Intelligence

Authors: Korosh Mahmoodi, Bruce J. West, Cleotilde Gonzalez

Abstract: We propose a model for demonstrating spontaneous emergence of collective intelligent behavior from selfish individual agents. Agents' behavior is modeled using our proposed selfish algorithm ($SA$) with three learning mechanisms: reinforced learning ($SAL$), trust ($SAT$) and connection ($SAC$). Each of these mechanisms provides a distinctly different way an agent can increase the individual benefit accrued through playing the prisoner's dilemma game ($PDG$) with other agents. The $SA$ provides a generalization of the self-organized temporal criticality ($SOTC$) model and shows that self-interested individuals can simultaneously produce maximum social benefit from their decisions. The mechanisms in the $SA$ are self-tuned by the internal dynamics and without having a pre-established network structure. Our results demonstrate emergence of mutual cooperation, emergence of dynamic networks, and adaptation and resilience of social systems after perturbations. The implications and applications of the $SA$ are discussed.

Submitted 3 January, 2020; originally announced January 2020.

arXiv:1908.07465 [pdf, other]

Delineating Knowledge Domains in the Scientific Literature Using Visual Information

Authors: Sean Yang, Po-shen Lee, Jevin D. West, Bill Howe

Abstract: Figures are an important channel for scientific communication, used to express complex ideas, models and data in ways that words cannot. However, this visual information is mostly ignored in analyses of the scientific literature. In this paper, we demonstrate the utility of using scientific figures as markers of knowledge domains in science, which can be used for classification, recommender systems, and studies of scientific information exchange. We encode sets of images into a visual signature, then use distances between these signatures to understand how patterns of visual communication compare with patterns of jargon and citation structures. We find that figures can be as effective for differentiating communities of practice as text or citation patterns. We then consider where these metrics disagree to understand how different disciplines use visualization to express ideas. Finally, we further consider how specific figure types propagate through the literature, suggesting a new mechanism for understanding the flow of ideas apart from conventional channels of text and citations. Our ultimate aim is to better leverage these information-dense objects to improve scientific communication across disciplinary boundaries.

Submitted 12 August, 2019; originally announced August 2019.

arXiv:1907.13414 [pdf, other]

The demography of the peripatetic researcher: Evidence on highly mobile scholars from the Web of Science

Authors: Samin Aref, Emilio Zagheni, Jevin West

Abstract: The policy debate around researchers' geographic mobility has been moving away from a theorized zero-sum game in which countries can be winners (brain gain) or losers (brain drain), and toward the concept of brain circulation, which implies that researchers move in and out of countries and everyone benefits. Quantifying trends in researchers' movements is key to understanding the drivers of the mobility of talent, as well as the implications of these patterns for the global system of science, and for the competitive advantages of individual countries. Existing studies have investigated bilateral flows of researchers. However, in order to understand migration systems, determining the extent to which researchers have worked in more than two countries is essential. This study focuses on the subgroup of highly mobile researchers whom we refer to as peripatetic researchers or super-movers. More specifically, our aim is to track the international movements of researchers who have published in more than two countries through changes in the main affiliation addresses of researchers in over 62 million publications indexed in the Web of Science database over the 1956-2016 period. Using this approach, we have established a longitudinal dataset on the international movements of highly mobile researchers across all subject categories, and in all disciplines of scholarship.
This article contributes to the literature by offering for the first time a snapshot of the key features of highly mobile researchers, including their patterns of migration and return migration by academic age, the relative frequency of their disciplines, and the relative frequency of their countries of origin and destination. Among other findings, the results point to the emergence of a global system that includes the USA and China as two large hubs, and England and Germany as two smaller hubs for highly mobile researchers.

Submitted 31 July, 2019; originally announced July 2019.

Comments: Accepted author copy, 16 pages, 10 figures. Lecture Notes in Computer Science, (2019), Proceedings of the 11th International Conference on Social Informatics, Doha, Qatar

arXiv:1903.12328 [pdf, other]

Improved Reinforcement Learning with Curriculum

Authors: Joseph West, Frederic Maire, Cameron Browne, Simon Denman

Abstract: Humans tend to learn complex abstract concepts faster if examples are presented in a structured manner. For instance, when learning how to play a board game, usually one of the first concepts learned is how the game ends, i.e. the actions that lead to a terminal state (win, lose or draw). The advantage of learning end-games first is that once the actions which lead to a terminal state are understood, it becomes possible to incrementally learn the consequences of actions that are further away from a terminal state - we call this an end-game-first curriculum. Currently the state-of-the-art machine learning player for general board games, AlphaZero by Google DeepMind, does not employ a structured training curriculum; instead learning from the entire game at all times. By employing an end-game-first training curriculum to train an AlphaZero inspired player, we empirically show that the rate of learning of an artificial player can be improved during the early stages of training when compared to a player not using a training curriculum.

Submitted 10 June, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

Comments: Draft prior to submission to IEEE Trans on Games. Changed paper slightly

arXiv:1812.03249 [pdf, ps, other]

Measuring scientific buzz

Authors: Kishore Vasan, Jevin West

Abstract: Keywords are important for information retrieval. They are used to classify and sort papers. However, these terms can also be used to study trends within and across fields. We want to explore the lifecycle of new keywords. How often do new terms come into existence and how long till they fade out? In this paper, we present our preliminary analysis where we measure the burstiness of keywords within the field of AI. We examine 150k keywords in approximately 100k journal and conference papers. We find that nearly 80% of the keywords die off before year one for both journals and conferences but that terms last longer in journals versus conferences. We also observe time periods of thematic bursts in AI -- one where the terms are more neuroscience inspired and one more oriented to computational optimization. This work shows promise of using author keywords to better understand dynamics of buzz within science.

Submitted 7 December, 2018; originally announced December 2018.

Comments: iConference 2019 poster

arXiv:1809.09328 [pdf, other]

Why scatter plots suggest causality, and what we can do about it

Authors: Carl T. Bergstrom, Jevin D. West

Abstract: Scatter plots carry an implicit if subtle message about causality. Whether we look at functions of one variable in pure mathematics, plots of experimental measurements as a function of the experimental conditions, or scatter plots of predictor and response variables, the value plotted on the vertical axis is by convention assumed to be determined or influenced by the value on the horizontal axis. This is a problem for the public understanding of scientific results and perhaps also for professional scientists' interpretations of scatter plots. To avoid suggesting a causal relationship between the x and y values in a scatter plot, we propose a new type of data visualization, the diamond plot. Diamond plots are essentially 45 degree rotations of ordinary scatter plots; by visually jarring the viewer they clearly indicate that she should not draw the usual distinction between independent/predictor variable and dependent/response variable. Instead, she should see the relationship as purely correlative.

Submitted 25 September, 2018; originally announced September 2018.

arXiv:1809.04093 [pdf, other]

Is together better? Examining scientific collaborations across multiple authors, institutions, and departments

Authors: Lovenoor Aulck, Kishore Vasan, Jevin West

Abstract: Collaborations are an integral part of scientific research and publishing. In the past, access to large-scale corpora has limited the ways in which questions about collaborations could be investigated. However, with improvements in data/metadata quality and access, it is possible to explore the idea of research collaboration in ways beyond the traditional definition of multiple authorship. In this paper, we examine scientific works through three different lenses of collaboration: across multiple authors, multiple institutions, and multiple departments. We believe this to be a first look at multiple departmental collaborations as we employ extensive data curation to disambiguate authors' departmental affiliations for nearly 70,000 scientific papers. We then compare citation metrics across the different definitions of collaboration and find that papers defined as being collaborative were more frequently cited than their non-collaborative counterparts, regardless of the definition of collaboration used. We also share preliminary results from examining the relationship between co-citation and co-authorship by analyzing the extent to which similar fields (as determined by co-citation) are collaborating on works (as determined by co-authorship). These preliminary results reveal trends of compartmentalization with respect to intra-institutional collaboration and show promise in being expanded.

Submitted 11 September, 2018; originally announced September 2018.

Journal ref: KDD: BigScholar 2018

arXiv:1712.08980 [pdf]

The Internet of Battle Things

Authors: Alexander Kott, Ananthram Swami, Bruce J West

Abstract: The battlefield of the future will be densely populated by a variety of entities ("things") -- some intelligent and some only marginally so -- performing a broad range of tasks: sensing, communicating, acting, and collaborating with each other and human warfighters. We call this the Internet of Battle Things, IoBT. In some ways, IoBT is already becoming a reality, but 20-30 years from now it is likely to become a dominant presence in warfare. To become a reality, however, this bold vision will have to overcome a number of major challenges. As one example of such a challenge, the communications among things will have to be flexible and adaptive to rapidly changing situations and military missions. In this paper, we explore this and several other major challenges of IoBT, and outline key research directions and approaches towards solving these challenges.

Submitted 24 December, 2017; originally announced December 2017.

Comments: This is a version of the article that appears in IEEE Computer as: Kott, Alexander, Ananthram Swami, and Bruce J. West. "The Internet of Battle Things." Computer 49.12 (2016): 70-75

Journal ref: Computer 49.12 (2016): 70-75

arXiv:1708.09344 [pdf, other]

Stem-ming the Tide: Predicting STEM attrition using student transcript data

Authors: Lovenoor Aulck, Rohan Aras, Lysia Li, Coulter L'Heureux, Peter Lu, Jevin West

Abstract: Science, technology, engineering, and math (STEM) fields play growing roles in national and international economies by driving innovation and generating high salary jobs. Yet, the US is lagging behind other highly industrialized nations in terms of STEM education and training. Furthermore, many economic forecasts predict a rising shortage of domestic STEM-trained professions in the US for years to come. One potential solution to this deficit is to decrease the rates at which students leave STEM-related fields in higher education, as currently over half of all students intending to graduate with a STEM degree eventually attrite. However, little quantitative research at scale has looked at causes of STEM attrition, let alone the use of machine learning to examine how well this phenomenon can be predicted. In this paper, we detail our efforts to model and predict dropout from STEM fields using one of the largest known datasets used for research on students at a traditional campus setting. Our results suggest that attrition from STEM fields can be accurately predicted with data that is routinely collected at universities using only information on students' first academic year. We also propose a method to model student STEM intentions for each academic term to better understand the timing of STEM attrition events. We believe these results show great promise in using machine learning to improve STEM retention in traditional and non-traditional campus settings.

Submitted 28 August, 2017; originally announced August 2017.

arXiv:1611.07135 [pdf, other]

doi 10.3389/frma.2017.00008

Leveraging Citation Networks to Visualize Scholarly Influence Over Time

Authors: Jason Portenoy, Jessica Hullman, Jevin D. West

Abstract: Assessing the influence of a scholar's work is an important task for funding organizations, academic departments, and researchers. Common methods, such as measures of citation counts, can ignore much of the nuance and multidimensionality of scholarly influence. We present an approach for generating dynamic visualizations of scholars' careers. This approach uses an animated node-link diagram showing the citation network accumulated around the researcher over the course of the career in concert with key indicators, highlighting influence both within and across fields. We developed our design in collaboration with one funding organization---the Pew Biomedical Scholars program---but the methods are generalizable to visualizations of scholarly influence. We applied the design method to the Microsoft Academic Graph, which includes more than 120 million publications. We validate our abstractions throughout the process through collaboration with the Pew Biomedical Scholars program officers and summative evaluations with their scholars.

Submitted 5 December, 2016; v1 submitted 21 November, 2016; originally announced November 2016.

ACM Class: H.5.2

arXiv:1607.00376 [pdf]

doi 10.1177/2378023117738903

Men Set Their Own Cites High: Gender and Self-citation across Fields and over Time

Authors: Molly M. King, Carl T. Bergstrom, Shelley J. Correll, Jennifer Jacquet, Jevin D. West

Abstract: How common is self-citation in scholarly publication, and does the practice vary by gender? Using novel methods and a data set of 1.5 million research papers in the scholarly database JSTOR published between 1779 and 2011, the authors find that nearly 10 percent of references are self-citations by a paper's authors. The findings also show that between 1779 and 2011, men cited their own papers 56 percent more than did women. In the last two decades of data, men self-cited 70 percent more than women. Women are also more than 10 percentage points more likely than men to not cite their own previous work at all. While these patterns could result from differences in the number of papers that men and women authors have published rather than gender-specific patterns of self-citation behavior, this gender gap in self-citation rates has remained stable over the last 50 years, despite increased representation of women in academia. The authors break down self-citation patterns by academic field and number of authors and comment on potential mechanisms behind these observations. These findings have important implications for scholarly visibility and cumulative advantage in academic careers.

Submitted 12 December, 2017; v1 submitted 30 June, 2016; originally announced July 2016.

Comments: final published article

Journal ref: Socius 3: 1-22 (2017)

arXiv:1606.08534 [pdf, other]

Static Ranking of Scholarly Papers using Article-Level Eigenfactor (ALEF)

Authors: Ian Wesley-Smith, Carl T. Bergstrom, Jevin D. West

Abstract: Microsoft Research hosted the 2016 WSDM Cup Challenge based on the Microsoft Academic Graph. The goal was to provide static rankings for the articles that make up the graph, with the rankings to be evaluated against those of human judges. While the Microsoft Academic Graph provided metadata about many aspects of each scholarly document, we focused more narrowly on citation data and used this contest as an opportunity to test the Article Level Eigenfactor (ALEF), a novel citation-based ranking algorithm, and evaluate its performance against competing algorithms that drew upon multiple facets of the data from a large, real world dataset (122M papers and 757M citations). Our final submission to this contest was scored at 0.676, earning second place.

Submitted 27 June, 2016; originally announced June 2016.

arXiv:1606.06364 [pdf, other]

Predicting Student Dropout in Higher Education

Authors: Lovenoor Aulck, Nishant Velagapudi, Joshua Blumenstock, Jevin West

Abstract: Each year, roughly 30% of first-year students at US baccalaureate institutions do not return for their second year and over $9 billion is spent educating these students. Yet, little quantitative research has analyzed the causes and possible remedies for student attrition. Here, we describe initial efforts to model student dropout using the largest known dataset on higher education attrition, which tracks over 32,500 students' demographics and transcript records at one of the nation's largest public universities. Our results highlight several early indicators of student attrition and show that dropout can be accurately predicted even when predictions are based on a single term of academic transcript data. These results highlight the potential for machine learning to have an impact on student retention and success while pointing to several promising directions for future work.

Submitted 7 March, 2017; v1 submitted 20 June, 2016; originally announced June 2016.

Comments: Presented at 2016 ICML Workshop on #Data4Good: Machine Learning in Social Good Applications, New York, NY

arXiv:1605.04951 [pdf, other]

Viziometrics: Analyzing Visual Information in the Scientific Literature

Authors: Po-shen Lee, Jevin D. West, Bill Howe

Abstract: Scientific results are communicated visually in the literature through diagrams, visualizations, and photographs. These information-dense objects have been largely ignored in bibliometrics and scientometrics studies when compared to citations and text. In this paper, we use techniques from computer vision and machine learning to classify more than 8 million figures from PubMed into 5 figure types and study the resulting patterns of visual information as they relate to impact. We find that the distribution of figures and figure types in the literature has remained relatively constant over time, but can vary widely across field and topic. Remarkably, we find a significant correlation between scientific impact and the use of visual information, where higher impact papers tend to include more diagrams, and to a lesser extent more plots and photographs. To explore these results and other ways of extracting this visual information, we have built a visual browser to illustrate the concept and explore design alternatives for supporting viziometric analysis and organizing visual information. We use these results to articulate a new research agenda -- viziometrics -- to study the organization and presentation of visual information in the scientific literature.

Submitted 27 May, 2016; v1 submitted 16 May, 2016; originally announced May 2016.

arXiv:1601.01228 [pdf]

Some Experimental Issues in Financial Fraud Detection: An Investigation

Authors: J. West, Maumita Bhattacharya

Abstract: Financial fraud detection is an important problem with a number of design aspects to consider. Issues such as algorithm selection and performance analysis will affect the perceived ability of proposed solutions, so for auditors and researchers to be able to sufficiently detect financial fraud it is necessary that these issues be thoroughly explored. In this paper we will revisit the key performance metrics used for financial fraud detection with a focus on credit card fraud, critiquing the prevailing ideas and offering our own understandings. There are many different performance metrics that have been employed in prior financial fraud detection research. We will analyse several of the popular metrics and compare their effectiveness at measuring the ability of detection mechanisms. We further investigated the performance of a range of computational intelligence techniques when applied to this problem domain, and explored the efficacy of several binary classification methods.

Submitted 6 January, 2016; originally announced January 2016.

Comments: J. West and Maumita Bhattacharya. "Some Experimental Issues in Financial Fraud Detection: An Investigation", In the Proceedings of The 5th International Symposium on Cloud and Service Computing (SC2 2015), IEEE CS Press

MSC Class: 68Txx

arXiv:1510.07167 [pdf]

Mining Financial Statement Fraud: An Analysis of Some Experimental Issues

Authors: J. West, Maumita Bhattacharya

Abstract: Financial statement fraud detection is an important problem with a number of design aspects to consider. Issues such as (i) problem representation, (ii) feature selection, and (iii) choice of performance metrics all influence the perceived performance of detection algorithms. Efficient implementation of financial fraud detection methods relies on a clear understanding of these issues. In this paper we present an analysis of the three key experimental issues associated with financial statement fraud detection, critiquing the prevailing ideas and providing new understandings.

Submitted 24 October, 2015; originally announced October 2015.

Comments: Proceedings of The 10th IEEE Conference on Industrial Electronics and Applications (ICIEA 2015), IEEE Press

MSC Class: 68U99

arXiv:1510.07165 [pdf]

Intelligent Financial Fraud Detection Practices: An Investigation

Authors: J. West, Maumita Bhattacharya, R. Islam

Abstract: Financial fraud is an issue with far reaching consequences in the finance industry, government, corporate sectors, and for ordinary consumers. Increasing dependence on new technologies such as cloud and mobile computing in recent years has compounded the problem. Traditional methods of detection involve extensive use of auditing, where a trained individual manually observes reports or transactions in an attempt to discover fraudulent behaviour. This method is not only time consuming, expensive and inaccurate, but in the age of big data it is also impractical. Not surprisingly, financial institutions have turned to automated processes using statistical and computational methods. This paper presents a comprehensive investigation on financial fraud detection practices using such data mining methods, with a particular focus on computational intelligence-based techniques. Classification of the practices based on key aspects such as detection algorithm used, fraud type investigated, and success rate have been covered. Issues and challenges associated with the current practices and potential future direction of research have also been identified.

Submitted 24 October, 2015; originally announced October 2015.

Comments: Proceedings of the 10th International Conference on Security and Privacy in Communication Networks (SecureComm 2014)

MSC Class: 68U01

arXiv:1506.04326 [pdf, other]

doi 10.1209/0295-5075/111/58003

The Value of Conflict in Stable Social Networks

Authors: Pensri Pramukkul, Adam Svenkeson, Bruce J. West, Paolo Grigolini

Abstract: A cooperative network model of sociological interest is examined to determine the sensitivity of the global dynamics to having a fraction of the members behaving uncooperatively, that is, being in conflict with the majority. We study a condition where in the absence of these uncooperative individuals, the contrarians, the control parameter exceeds a critical value and the network is frozen in a state of consensus. The network dynamics change with variations in the percentage of contrarians, resulting in a balance between the value of the control parameter and the percentage of those in conflict with the majority. We show that the transmission of information from a network $B$ to a network $A$, with a small fraction of lookout members in $A$ who adopt the behavior of $B$, becomes maximal when both networks are assigned the same critical percentage of contrarians.

Submitted 13 June, 2015; originally announced June 2015.

Comments: 5 pages, 3 figures, 1 supplement

arXiv:1305.4807 [pdf, other]

doi 10.1038/ncomms5630

Memory in network flows and its effects on spreading dynamics and community detection

Authors: Martin Rosvall, Alcides V. Esquivel, Andrea Lancichinetti, Jevin D. West, Renaud Lambiotte

Abstract: Random walks on networks is the standard tool for modelling spreading processes in social and biological systems. This first-order Markov approach is used in conventional community detection, ranking, and spreading analysis although it ignores a potentially important feature of the dynamics: where flow moves to may depend on where it comes from. Here we analyse pathways from different systems, and while we only observe marginal consequences for disease spreading, we show that ignoring the effects of second-order Markov dynamics has important consequences for community detection, ranking, and information spreading. For example, capturing dynamics with a second-order Markov model allows us to reveal actual travel patterns in air traffic and to uncover multidisciplinary journals in scientific communication. These findings were achieved only by using more available data and making no additional assumptions, and therefore suggest that accounting for higher-order memory in network flows can help us better understand how real systems are organized and function.

Submitted 12 August, 2014; v1 submitted 21 May, 2013; originally announced May 2013.

Comments: 23 pages and 16 figures

Journal ref: Nature Communications 5, 4630 (2014)

arXiv:1211.1759

doi 10.1371/journal.pone.0066212

The role of gender in scholarly authorship

Authors: Jevin D. West, Jennifer Jacquet, Molly M. King, Shelley J. Correll, Carl T. Bergstrom

Abstract: Gender disparities appear to be decreasing in academia according to a number of metrics, such as grant funding, hiring, acceptance at scholarly journals, and productivity, and it might be tempting to think that gender inequity will soon be a problem of the past. However, a large-scale analysis based on over eight million papers across the natural sciences, social sciences, and humanities reveals a number of understated and persistent ways in which gender inequities remain. For instance, even where raw publication counts seem to be equal between genders, close inspection reveals that, in certain fields, men predominate in the prestigious first and last author positions. Moreover, women are significantly underrepresented as authors of single-authored papers. Academics should be aware of the subtle ways that gender disparities can appear in scholarly authorship.

Submitted 7 November, 2012; originally announced November 2012.

arXiv:1207.1748

Role of Committed Minorities in Times of Crisis

Authors: Malgorzata Turalska, Bruce J. West, Paolo Grigolini

Abstract: We use a Cooperative Decision Making (CDM) model to study the effect of committed minorities on group behavior in time of crisis. The CDM model has been shown to generate consensus through a phase-transition process that at criticality establishes long-range correlations among the individuals within a model society. In a condition of high consensus, the correlation function vanishes, thereby making the network recover the ordinary locality condition. However, this state is not permanent and times of crisis occur when there is an ambiguity concerning a given social issue. The correlation function within the cooperative system becomes similarly extended as it is observed at criticality. This combination of independence (free will) and long-range correlation makes it possible for very small but committed minorities to produce substantial changes in social consensus.

Submitted 6 July, 2012; originally announced July 2012.

arXiv:0911.1807

Big Macs and Eigenfactor Scores: Don't Let Correlation Coefficients Fool You

Authors: Jevin West, Theodore Bergstrom, Carl Bergstrom

Abstract: The Eigenfactor Metrics provide an alternative way of evaluating scholarly journals based on an iterative ranking procedure analogous to Google's PageRank algorithm. These metrics have recently been adopted by Thomson-Reuters and are listed alongside the Impact Factor in the Journal Citation Reports. But do these metrics differ sufficiently so as to be a useful addition to the bibliometric toolbox? Davis (2008) has argued otherwise, based on his finding of a 0.95 correlation coefficient between Eigenfactor score and Total Citations for a sample of journals in the field of medicine. This conclusion is mistaken; here we illustrate the basic statistical fallacy to which Davis succumbed. We provide a complete analysis of the 2006 Journal Citation Reports and demonstrate that there are statistically and economically significant differences between the information provided by the Eigenfactor Metrics and that provided by Impact Factor and Total Citations.

Submitted 29 April, 2010; v1 submitted 9 November, 2009; originally announced November 2009.

Comments: Version 2. This is a response to Phil Davis's 2008 paper (arXiv:0807.2678)

Showing 1–38 of 38 results for author: West, J