arXiv:2405.19187

Algorithmic Transparency and Participation through the Handoff Lens: Lessons Learned from the U.S. Census Bureau's Adoption of Differential Privacy

Authors: Amina A. Abdu, Lauren M. Chambers, Deirdre K. Mulligan, Abigail Z. Jacobs

Abstract: Emerging discussions on the responsible government use of algorithmic technologies propose transparency and public participation as key mechanisms for preserving accountability and trust. But in practice, the adoption and use of any technology shifts the social, organizational, and political context in which it is embedded. Therefore translating transparency and participation efforts into meaningful, effective accountability must take into account these shifts. We adopt two theoretical frames, Mulligan and Nissenbaum's handoff model and Star and Griesemer's boundary objects, to reveal such shifts during the U.S. Census Bureau's adoption of differential privacy (DP) in its updated disclosure avoidance system (DAS) for the 2020 census. This update preserved (and arguably strengthened) the confidentiality protections that the Bureau is mandated to uphold, and the Bureau engaged in a range of activities to facilitate public understanding of and participation in the system design process. Using publicly available documents concerning the Census' implementation of DP, this case study seeks to expand our understanding of how technical shifts implicate values, how such shifts can afford (or fail to afford) greater transparency and participation in system design, and the importance of localized expertise throughout.
We present three lessons from this case study toward grounding understandings of algorithmic transparency and participation: (1) efforts towards transparency and participation in algorithmic governance must center values and policy decisions, not just technical design decisions; (2) the handoff model is a useful tool for revealing how such values may be cloaked beneath technical decisions; and (3) boundary objects alone cannot bridge distant communities without trusted experts traveling alongside to broker their adoption.
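Lesson (1) can be made concrete. In DP, the privacy-loss parameter ε trades accuracy against confidentiality, so choosing ε is a policy decision as much as a technical one.

The block below is a minimal sketch of the textbook Laplace mechanism for a single count. It is an illustrative assumption, not the Bureau's production DAS, which used a more elaborate algorithm (the TopDown Algorithm). The function names `laplace_noise`, `noisy_count`, and `mean_abs_error` are hypothetical.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Clamp the log argument away from 0 in case u == -0.5 exactly.
    return -scale * math.copysign(1.0, u) * math.log(max(1e-300, 1.0 - 2.0 * abs(u)))


def noisy_count(true_count: int, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    """Laplace mechanism for one count: add noise with scale sensitivity / epsilon.

    Adding or removing one person changes a count by at most 1, so sensitivity = 1.
    Smaller epsilon means stronger privacy and therefore more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(epsilon: float, trials: int = 2000, seed: int = 0) -> float:
    """Average absolute error of noisy_count; for Laplace noise this is about 1 / epsilon."""
    rng = random.Random(seed)
    return sum(abs(noisy_count(0, epsilon, rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: mean absolute error ~ {mean_abs_error(eps):.2f}")
```

The mechanism itself is fixed. What it reveals is that the error every data user inherits is set by the single number ε, which is the kind of value-laden choice the handoff lens surfaces.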

Submitted 29 May, 2024; originally announced May 2024.

Comments: 21 pages, FAccT '24

arXiv:2401.10877

The Cadaver in the Machine: The Social Practices of Measurement and Validation in Motion Capture Technology

Authors: Emma Harvey, Hauke Sandhaus, Abigail Z. Jacobs, Emanuel Moss, Mona Sloane

Abstract: Motion capture systems, used across various domains, make body representations concrete through technical processes. We argue that the measurement of bodies and the validation of measurements for motion capture systems can be understood as social practices. By analyzing the findings of a systematic literature review (N=278) through the lens of social practice theory, we show how these practices, and their varying attention to errors, become ingrained in motion capture design and innovation over time. Moreover, we show how contemporary motion capture systems perpetuate assumptions about human bodies and their movements. We suggest that social practices of measurement and validation are ubiquitous in the development of data- and sensor-driven systems more broadly, and provide this work as a basis for investigating hidden design assumptions and their potential negative consequences in human-computer interaction.

Submitted 19 January, 2024; originally announced January 2024.

Comments: 34 pages, 9 figures. To appear in the 2024 ACM CHI Conference on Human Factors in Computing Systems (CHI '24)

arXiv:2311.06477

Report of the 1st Workshop on Generative AI and Law

Authors: A. Feder Cooper, Katherine Lee, James Grimmelmann, Daphne Ippolito, Christopher Callison-Burch, Christopher A. Choquette-Choo, Niloofar Mireshghallah, Miles Brundage, David Mimno, Madiha Zahrah Choksi, Jack M. Balkin, Nicholas Carlini, Christopher De Sa, Jonathan Frankle, Deep Ganguli, Bryant Gipson, Andres Guadamuz, Swee Leng Harris, Abigail Z. Jacobs, Elizabeth Joh, Gautam Kamath, Mark Lemley, Cass Matthews, Christine McLeavey, Corynne McSherry, et al. (10 additional authors not shown)

Abstract: This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report with a high-level statement about why Generative AI is both immensely significant and immensely challenging for law. To meet these challenges, we conclude that there is an essential need for 1) a shared knowledge base that provides a common conceptual language for experts across disciplines; 2) clarification of the distinctive technical capabilities of generative-AI systems, as compared and contrasted to other computer and AI systems; 3) a logical taxonomy of the legal issues these systems raise; and, 4) a concrete research agenda to promote collaboration and knowledge-sharing on emerging issues at the intersection of Generative AI and law. In this report, we synthesize the key takeaways from the GenLaw workshop that begin to address these needs. All of the listed authors contributed to the workshop upon which this report is based, but they and their organizations do not necessarily endorse all of the specific claims in this report.

Submitted 2 December, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

arXiv:2309.06607

doi 10.1145/3593013.3594083

An Empirical Analysis of Racial Categories in the Algorithmic Fairness Literature

Authors: Amina A. Abdu, Irene V. Pasquetto, Abigail Z. Jacobs

Abstract: Recent work in algorithmic fairness has highlighted the challenge of defining racial categories for the purposes of anti-discrimination. These challenges are not new but have previously fallen to the state, which enacts race through government statistics, policies, and evidentiary standards in anti-discrimination law. Drawing on the history of state race-making, we examine how longstanding questions about the nature of race and discrimination appear within the algorithmic fairness literature. Through a content analysis of 60 papers published at FAccT between 2018 and 2020, we analyze how race is conceptualized and formalized in algorithmic fairness frameworks. We note that differing notions of race are adopted inconsistently, at times even within a single analysis. We also explore the institutional influences and values associated with these choices. While we find that categories used in algorithmic fairness work often echo legal frameworks, we demonstrate that values from academic computer science play an equally important role in the construction of racial categories. Finally, we examine the reasoning behind different operationalizations of race, finding that few papers explicitly describe their choices and even fewer justify them. We argue that the construction of racial categories is a value-laden process with significant social and political consequences for the project of algorithmic fairness.
The widespread lack of justification around the operationalization of race reflects institutional norms that allow these political decisions to remain obscured within the backstage of knowledge production.

Submitted 12 September, 2023; originally announced September 2023.

Comments: 13 pages, 2 figures, FAccT '23

Journal ref: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (pp. 1324-1333)

arXiv:2305.05608

doi 10.1145/3539618.3591933

The Role of Relevance in Fair Ranking

Authors: Aparna Balagopalan, Abigail Z. Jacobs, Asia Biega

Abstract: Online platforms mediate access to opportunity: relevance-based rankings create and constrain options by allocating exposure to job openings and job candidates in hiring platforms, or sellers in a marketplace. In order to do so responsibly, these socially consequential systems employ various fairness measures and interventions, many of which seek to allocate exposure based on worthiness. Because these constructs are typically not directly observable, platforms must instead resort to using proxy scores such as relevance and infer them from behavioral signals such as searcher clicks. Yet, it remains an open question whether relevance fulfills its role as such a worthiness score in high-stakes fair rankings. In this paper, we combine perspectives and tools from the social sciences, information retrieval, and fairness in machine learning to derive a set of desired criteria that relevance scores should satisfy in order to meaningfully guide fairness interventions. We then empirically show that not all of these criteria are met in a case study of relevance inferred from biased user click data. We assess the impact of these violations on the estimated system fairness and analyze whether existing fairness interventions may mitigate the identified issues. Our analyses and results surface the pressing need for new approaches to relevance collection and generation that are suitable for use in fair ranking.
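Many exposure-based fairness measures compare the exposure a group receives with the relevance it is credited with. That is why the validity of relevance, as a proxy for worthiness, matters.

The block below is a minimal sketch under two stated assumptions: a logarithmic position-bias model (exposure at rank k is 1/log2(k+1)) and a simple per-group exposure-to-relevance ratio. This is one common style of measure, not necessarily any measure evaluated in the paper. The names `position_exposure` and `exposure_to_relevance` are hypothetical.

```python
import math
from collections import defaultdict


def position_exposure(rank: int) -> float:
    """Assumed position-bias model: exposure at 1-based rank k is 1 / log2(k + 1)."""
    return 1.0 / math.log2(rank + 1)


def exposure_to_relevance(ranking, relevance, group):
    """Per group: (share of total exposure) / (share of total relevance).

    ranking: item ids, best first.
    relevance: item -> non-negative proxy 'worthiness' score, e.g. inferred from clicks.
    group: item -> group label.
    A ratio of 1.0 means the group's exposure is proportional to its relevance.
    """
    if not ranking:
        raise ValueError("ranking must be non-empty")
    exp_by_group = defaultdict(float)
    rel_by_group = defaultdict(float)
    for k, item in enumerate(ranking, start=1):
        exp_by_group[group[item]] += position_exposure(k)
        rel_by_group[group[item]] += relevance[item]
    total_exp = sum(exp_by_group.values())
    total_rel = sum(rel_by_group.values())
    if total_rel <= 0:
        raise ValueError("total relevance must be positive")
    ratios = {}
    for g, exp in exp_by_group.items():
        rel_share = rel_by_group[g] / total_rel
        ratios[g] = (exp / total_exp) / rel_share if rel_share > 0 else float("inf")
    return ratios


if __name__ == "__main__":
    # Hypothetical relevance scores for two groups, ranked by relevance.
    rel = {"a1": 0.9, "b1": 0.85, "a2": 0.8, "b2": 0.75}
    grp = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
    print(exposure_to_relevance(["a1", "b1", "a2", "b2"], rel, grp))
```

In this toy ranking, items are sorted by relevance, yet group A ends up with a ratio above 1 and group B below 1, because the logarithmic discount favors the top slot. Every such ratio is also only as meaningful as the relevance scores fed into it.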

Submitted 6 June, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: Published in SIGIR 2023

arXiv:2109.05658

Measurement as governance in and for responsible AI

Authors: Abigail Z. Jacobs

Abstract: Measurement of social phenomena is everywhere, unavoidably, in sociotechnical systems. This is not (only) an academic point: Fairness-related harms emerge when there is a mismatch in the measurement process between the thing we purport to be measuring and the thing we actually measure. However, the measurement process (where social, cultural, and political values are implicitly encoded in sociotechnical systems) is almost always obscured. Furthermore, this obscured process is where important governance decisions are encoded: governance about which systems are fair, which individuals belong in which categories, and so on. We can then use the language of measurement, and the tools of construct validity and reliability, to uncover hidden governance decisions. In particular, we highlight two types of construct validity, content validity and consequential validity, that are useful to elicit and characterize the feedback loops between the measurement, social construction, and enforcement of social categories. We then explore the constructs of fairness, robustness, and responsibility in the context of governance in and for responsible AI. Together, these perspectives help us unpack how measurement acts as a hidden governance process in sociotechnical systems. Understanding measurement as governance supports a richer understanding of the governance processes already happening in AI (responsible or otherwise), revealing paths to more effective interventions.

Submitted 12 September, 2021; originally announced September 2021.

Comments: 5 pages, 1 figure; KDD Workshop on Responsible AI 2021

arXiv:2004.12207

Internet-human infrastructures: Lessons from Havana's StreetNet

Authors: Abigail Z. Jacobs, Michaelanne Dye

Abstract: We propose a mixed-methods approach to understanding the human infrastructure underlying StreetNet (SNET), a distributed, community-run intranet that serves as the primary 'Internet' in Havana, Cuba. We bridge ethnographic studies and the study of social networks and organizations to understand the way that power is embedded in the structure of Havana's SNET. By quantitatively and qualitatively unpacking the human infrastructure of SNET, this work reveals how distributed infrastructure necessarily embeds the structural aspects of inequality distributed within that infrastructure. While traditional technical measurements of networks reflect the social, organizational, spatial, and technical constraints that shape the resulting network, ethnographies can help uncover the texture and role of these hidden supporting relationships. By merging these perspectives, this work contributes to our understanding of network roles in growing and maintaining distributed infrastructures, revealing new approaches to understanding larger, more complex Internet-human infrastructures, including the Internet and the WWW.

Submitted 25 April, 2020; originally announced April 2020.

Comments: 5 pages, 1 figure. WebConf Workshop on Innovative Ideas in Data Science (April 2020)

arXiv:1912.05511

doi 10.1145/3442188.3445901

Measurement and Fairness

Authors: Abigail Z. Jacobs, Hanna Wallach

Abstract: We propose measurement modeling from the quantitative social sciences as a framework for understanding fairness in computational systems. Computational systems often involve unobservable theoretical constructs, such as socioeconomic status, teacher effectiveness, and risk of recidivism. Such constructs cannot be measured directly and must instead be inferred from measurements of observable properties (and other unobservable theoretical constructs) thought to be related to them, i.e., operationalized via a measurement model. This process, which necessarily involves making assumptions, introduces the potential for mismatches between the theoretical understanding of the construct purported to be measured and its operationalization. We argue that many of the harms discussed in the literature on fairness in computational systems are direct results of such mismatches. We show how some of these harms could have been anticipated and, in some cases, mitigated if viewed through the lens of measurement modeling. To do this, we contribute fairness-oriented conceptualizations of construct reliability and construct validity that unite traditions from political science, education, and psychology and provide a set of tools for making explicit and testing assumptions about constructs and their operationalizations. We then turn to fairness itself, an essentially contested construct that has different theoretical understandings in different contexts.
We argue that this contestedness underlies recent debates about fairness definitions: although these debates appear to be about different operationalizations, they are, in fact, debates about different theoretical understandings of fairness. We show how measurement modeling can provide a framework for getting to the core of these debates.
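One of the tools mentioned above, construct reliability, can be checked in its simplest form as test-retest reliability. Apply the same operationalization twice to the same units and ask whether the scores agree.

The block below is a minimal sketch that assumes Pearson correlation as the agreement statistic, which is one of several options. The names `pearson` and `retest_reliability` and the example scores are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def retest_reliability(first_round, second_round):
    """Test-retest reliability: agreement between two applications of the same
    operationalization to the same units (here, Pearson correlation)."""
    return pearson(first_round, second_round)


if __name__ == "__main__":
    # Hypothetical scores for five units measured twice with the same instrument.
    week1 = [0.2, 0.4, 0.5, 0.7, 0.9]
    week2 = [0.25, 0.35, 0.55, 0.65, 0.95]
    print(f"test-retest reliability: {retest_reliability(week1, week2):.3f}")
```

A high value says the operationalization is stable. It says nothing about whether the operationalization captures the intended construct, which is the separate question of construct validity.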

Submitted 12 March, 2021; v1 submitted 11 December, 2019; originally announced December 2019.

Comments: 11 pages, 1 figure. To be published in the proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAccT '21)

arXiv:1811.04344

Discovering heterogeneous subpopulations for fine-grained analysis of opioid use and opioid use disorders

Authors: Jen J. Gong, Abigail Z. Jacobs, Toby E. Stuart, Mathijs de Vaan

Abstract: The opioid epidemic in the United States claims over 40,000 lives per year, and it is estimated that well over two million Americans have an opioid use disorder. Over-prescription and misuse of prescription opioids play an important role in the epidemic. Individuals who are prescribed opioids, and who are diagnosed with opioid use disorder, have diverse underlying health states. Policy interventions targeting prescription opioid use, opioid use disorder, and overdose often fail to account for this variation. To identify latent health states, or phenotypes, pertinent to opioid use and opioid use disorders, we use probabilistic topic modeling with medical diagnosis histories from a statewide population of individuals who were prescribed opioids. We demonstrate that our learned phenotypes are predictive of future opioid use-related outcomes. In addition, we show how the learned phenotypes can provide important context for variability in opioid prescriptions. Understanding the heterogeneity in individual health states and in prescription opioid use can help identify policy interventions to address this public health crisis.

Submitted 1 May, 2019; v1 submitted 10 November, 2018; originally announced November 2018.

Comments: Withdrawn pending data use agreement clarification

arXiv:1811.01452

Assembly in populations of social networks

Authors: Abigail Z. Jacobs

Abstract: In-depth studies of sociotechnical systems are largely limited to single instances. Network surveys are expensive, and platforms vary in important ways, from interface design, to social norms, to historical contingencies. With single examples, we cannot, in general, know how much of observed network structure is explained by historical accidents, random noise, or meaningful social processes, nor can we claim that network structure predicts outcomes, such as organization success or ecosystem health. Here, I show how we can adopt a comparative approach for settings where we have, or can cleverly construct, multiple instances of a network to estimate the natural variability in social systems. The comparative approach makes previously untested theories testable. Drawing on examples from the social networks literature, I discuss emerging directions in the study of populations of sociotechnical systems using insights from organization theory and ecology.
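The comparative logic works with any network statistic. Compute the statistic for every instance in a population of networks, then ask whether a focal network is unusual relative to that natural variability.

The block below is a minimal sketch that assumes edge density as the statistic and a z-score as the comparison. Both are illustrative choices, not the paper's. The names `density` and `zscore_against_population` and the example values are hypothetical.

```python
import statistics


def density(n_nodes: int, edges) -> float:
    """Density of a simple undirected graph: distinct non-loop edges / possible edges."""
    if n_nodes < 2:
        raise ValueError("density needs at least two nodes")
    distinct = {frozenset(e) for e in edges if e[0] != e[1]}
    return len(distinct) / (n_nodes * (n_nodes - 1) / 2)


def zscore_against_population(focal: float, population) -> float:
    """Standard deviations between a focal network's statistic and the population mean."""
    mu = statistics.mean(population)
    sd = statistics.stdev(population)  # sample standard deviation; needs >= 2 values
    if sd == 0:
        raise ValueError("population shows no variation")
    return (focal - mu) / sd


if __name__ == "__main__":
    # Hypothetical densities for five comparable networks, plus one focal network.
    population = [0.10, 0.12, 0.11, 0.09, 0.13]
    print(f"z = {zscore_against_population(0.20, population):.2f}")
```

With a single network, there is no population to take a standard deviation over. This is the sense in which the comparative approach turns otherwise untestable claims into testable ones.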

Submitted 4 November, 2018; originally announced November 2018.

Comments: 4 pages, 1 figure. Position paper for CSCW Workshop on Navigating the Challenges of Multi-Site Research

ACM Class: J.4; H.5.3

arXiv:1505.04741

Untangling the roles of parasites in food webs with generative network models

Authors: Abigail Z. Jacobs, Jennifer A. Dunne, Cristopher Moore, Aaron Clauset

Abstract: Food webs represent the set of consumer-resource interactions among a set of species that co-occur in a habitat, but most food web studies have omitted parasites and their interactions. Recent studies have provided conflicting evidence on whether including parasites changes food web structure, with some suggesting that parasitic interactions are structurally distinct from those among free-living species while others claim the opposite. Here, we describe a principled method for understanding food web structure that combines an efficient optimization algorithm from statistical physics called parallel tempering with a probabilistic generalization of the empirically well-supported food web niche model. This generative model approach allows us to rigorously estimate the degree to which interactions that involve parasites are statistically distinguishable from interactions among free-living species, whether parasite niches behave similarly to free-living niches, and the degree to which existing hypotheses about food web structure are naturally recovered. We apply this method to the well-studied Flensburg Fjord food web and show that while predation on parasites, concomitant predation of parasites, and parasitic intraguild trophic interactions are largely indistinguishable from free-living predation interactions, parasite-host interactions are different. These results provide a powerful new tool for evaluating the impact of classes of species and interactions on food web structure to shed new light on the roles of parasites in food webs.
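For readers unfamiliar with the niche model, here is a minimal sketch of its classic, deterministic-feeding form (Williams and Martinez). Each species receives a niche value, a feeding range, and a range center, and it eats every species whose niche value falls inside that range.

The paper uses a probabilistic generalization fitted with parallel tempering. This sketch covers only the classic generative step and omits refinements such as forcing the lowest-niche species to be basal. The name `niche_model` is hypothetical.

```python
import random


def niche_model(num_species: int, connectance: float, seed: int = 0):
    """Sample one food web from the classic niche model.

    Returns (niche, edges): niche[i] = (n_i, r_i, c_i) for species i, and edges is
    a set of (consumer, resource) index pairs.
    """
    if not 0 < connectance < 0.5:
        raise ValueError("connectance must lie strictly between 0 and 0.5")
    rng = random.Random(seed)
    # With x ~ Beta(1, beta), E[x] = 1 / (1 + beta) = 2 * connectance, so the
    # expected feeding range E[n * x] = 0.5 * 2 * connectance = connectance.
    beta = 1.0 / (2.0 * connectance) - 1.0
    niche = []
    for _ in range(num_species):
        n = rng.random()                    # niche value on [0, 1)
        r = n * rng.betavariate(1.0, beta)  # feeding range width
        c = rng.uniform(r / 2.0, n)         # range center, drawn between r/2 and n
        niche.append((n, r, c))
    # Species i eats every species j whose niche value lies in i's feeding range.
    edges = {
        (i, j)
        for i, (_, r_i, c_i) in enumerate(niche)
        for j, (n_j, _, _) in enumerate(niche)
        if c_i - r_i / 2.0 <= n_j <= c_i + r_i / 2.0
    }
    return niche, edges


if __name__ == "__main__":
    niche, edges = niche_model(30, 0.15, seed=1)
    # Realized connectance L / S^2 varies from draw to draw around the target.
    print(f"{len(edges)} links, realized connectance {len(edges) / 30 ** 2:.3f}")
```

A probabilistic generalization, as in the paper, replaces the hard "inside the range" rule with a feeding probability. That change is what lets a likelihood be defined and compared across classes of interactions.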

Submitted 18 May, 2015; originally announced May 2015.

Comments: 17 pages, 7 figures

arXiv:1503.06772

doi 10.1145/2786451.2786477

Assembling thefacebook: Using heterogeneity to understand online social network assembly

Authors: Abigail Z. Jacobs, Samuel F. Way, Johan Ugander, Aaron Clauset

Abstract: Online social networks represent a popular and diverse class of social media systems. Despite this variety, each of these systems undergoes a general process of online social network assembly, which represents the complicated and heterogeneous changes that transform newly born systems into mature platforms. However, little is known about this process. For example, how much of a network's assembly is driven by simple growth? How does a network's structure change as it matures? How does network structure vary with adoption rates and user heterogeneity, and do these properties play different roles at different points in the assembly? We investigate these and other questions using a unique dataset of online connections among the roughly one million users at the first 100 colleges admitted to Facebook, captured just 20 months after its launch. We first show that different vintages and adoption rates across this population of networks reveal temporal dynamics of the assembly process, and that assembly is only loosely related to network growth. We then exploit natural experiments embedded in this dataset and complementary data obtained via Internet archaeology to show that different subnetworks matured at different rates toward similar end states. These results shed light on the processes and patterns of online social network assembly, and may facilitate more effective design for online social systems.

Submitted 31 May, 2015; v1 submitted 23 March, 2015; originally announced March 2015.

Comments: 13 pages, 11 figures, Proceedings of the 7th Annual ACM Web Science Conference (WebSci), 2015

arXiv:1411.4070

A unified view of generative models for networks: models, methods, opportunities, and challenges

Authors: Abigail Z. Jacobs, Aaron Clauset

Abstract: Research on probabilistic models of networks now spans a wide variety of fields, including physics, sociology, biology, statistics, and machine learning. These efforts have produced a diverse ecology of models and methods. Despite this diversity, many of these models share a common underlying structure: pairwise interactions (edges) are generated with probability conditional on latent vertex attributes. Differences between models generally stem from different philosophical choices about how to learn from data or different empirically-motivated goals. The highly interdisciplinary nature of work on these generative models, however, has inhibited the development of a unified view of their similarities and differences. For instance, novel theoretical models and optimization techniques developed in machine learning are largely unknown within the social and biological sciences, which have instead emphasized model interpretability. Here, we describe a unified view of generative models for networks that draws together many of these disparate threads and highlights the fundamental similarities and differences that span these fields. We then describe a number of opportunities and challenges for future work that are revealed by this view.
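The shared structure described above, edges generated with probability conditional on latent vertex attributes, can be made concrete with the simplest member of the family. In the stochastic block model, the latent attribute is a discrete group label.

The block below is a minimal sketch. Other models in the family swap in continuous latent positions or different link functions. The name `sample_sbm` is hypothetical.

```python
import random


def sample_sbm(groups, omega, seed=0):
    """Sample an undirected graph whose edge probabilities depend only on latent labels.

    groups: groups[i] is the latent group of vertex i.
    omega: omega[a][b] is the probability of an edge between a vertex in group a
           and a vertex in group b (symmetric for an undirected graph).
    Returns a set of (i, j) pairs with i < j.
    """
    rng = random.Random(seed)
    n = len(groups)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Each pair is an independent Bernoulli draw given the two latent labels.
            if rng.random() < omega[groups[i]][groups[j]]:
                edges.add((i, j))
    return edges


if __name__ == "__main__":
    labels = [0] * 5 + [1] * 5
    affinity = [[0.8, 0.05], [0.05, 0.8]]  # assortative: dense within groups
    print(sorted(sample_sbm(labels, affinity, seed=42)))
```

Replacing the table lookup `omega[...]` with, say, a decreasing function of the distance between continuous latent coordinates gives a latent-space model with the same overall shape. That substitution is the kind of similarity a unified view draws out.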

Submitted 14 November, 2014; originally announced November 2014.

Comments: 10 pages. To appear at the NIPS 2014 Workshop on Networks: From Graphs to Rich Data

arXiv:1404.0431

doi 10.1093/comnet/cnu026

Learning Latent Block Structure in Weighted Networks

Authors: Christopher Aicher, Abigail Z. Jacobs, Aaron Clauset

Abstract: Community detection is an important task in network analysis, in which we aim to learn a network partition that groups together vertices with similar community-level connectivity patterns. By finding such groups of vertices with similar structural roles, we extract a compact representation of the network's large-scale structure, which can facilitate its scientific interpretation and the prediction of unknown or future interactions. Popular approaches, including the stochastic block model, assume edges are unweighted, which limits their utility by throwing away potentially useful information. We introduce the 'weighted stochastic block model' (WSBM), which generalizes the stochastic block model to networks with edge weights drawn from any exponential family distribution. This model learns from both the presence and weight of edges, allowing it to discover structure that would otherwise be hidden when weights are discarded or thresholded. We describe a Bayesian variational algorithm for efficiently approximating this model's posterior distribution over latent block structures. We then evaluate the WSBM's performance on both edge-existence and edge-weight prediction tasks for a set of real-world weighted networks. In all cases, the WSBM performs as well as or better than the best alternatives on these tasks.
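The claim that edge weights can reveal structure hidden from an unweighted model can be illustrated by scoring fixed partitions under a weight likelihood.

The block below is a minimal sketch with several assumptions:
- Edge weights are Normal (one member of the exponential family), with a fixed, shared standard deviation.
- Each block pair's mean is set to its maximum-likelihood value.
- Only the weights of observed edges are scored; edge existence is ignored.

It does not implement the paper's Bayesian variational algorithm. The name `normal_block_loglik` is hypothetical.

```python
import math
from collections import defaultdict


def normal_block_loglik(weighted_edges, labels, sigma=1.0):
    """Profile log-likelihood of edge weights under a block model with Normal weights.

    weighted_edges: list of (i, j, w) for observed undirected edges.
    labels: labels[i] is the block of vertex i.
    Each block pair gets its own mean, set to the sample mean of its weights
    (the maximum-likelihood value); sigma is fixed and shared by all pairs.
    """
    by_pair = defaultdict(list)
    for i, j, w in weighted_edges:
        a, b = sorted((labels[i], labels[j]))  # undirected: (a, b) same as (b, a)
        by_pair[(a, b)].append(w)
    loglik = 0.0
    for weights in by_pair.values():
        mu = sum(weights) / len(weights)
        for w in weights:
            loglik += -0.5 * math.log(2 * math.pi * sigma ** 2) - (w - mu) ** 2 / (2 * sigma ** 2)
    return loglik


if __name__ == "__main__":
    # Every pair of the 4 vertices is connected, so an unweighted model sees no
    # structure; only the weights distinguish {0, 1} / {2, 3}.
    edges = [(0, 1, 5.0), (2, 3, 5.0), (0, 2, 1.0), (0, 3, 1.0), (1, 2, 1.0), (1, 3, 1.0)]
    print(normal_block_loglik(edges, [0, 0, 1, 1]))  # planted partition
    print(normal_block_loglik(edges, [0, 1, 0, 1]))  # mismatched partition
```

In this toy graph, the planted partition scores 8 log-units higher than the mismatched one. Thresholding all weights into a single binary graph would erase that difference.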

Submitted 3 June, 2014; v1 submitted 1 April, 2014; originally announced April 2014.

Comments: 28 pages

Journal ref: Journal of Complex Networks (2015) 3 (2): 221-248

arXiv:1403.2933

doi 10.1103/PhysRevE.90.012805

Efficiently inferring community structure in bipartite networks

Authors: Daniel B. Larremore, Aaron Clauset, Abigail Z. Jacobs

Abstract: Bipartite networks are a common type of network data in which there are two types of vertices, and only vertices of different types can be connected. While bipartite networks exhibit community structure like their unipartite counterparts, existing approaches to bipartite community detection have drawbacks, including implicit parameter choices, loss of information through one-mode projections, and lack of interpretability. Here we solve the community detection problem for bipartite networks by formulating a bipartite stochastic block model, which explicitly includes vertex type information and may be trivially extended to k-partite networks. This bipartite stochastic block model yields a projection-free and statistically principled method for community detection that makes clear assumptions and parameter choices and yields interpretable results. We demonstrate this model's ability to efficiently and accurately find community structure in synthetic bipartite networks with known structure and in real-world bipartite networks with unknown structure, and we characterize its performance in practical contexts.
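One way to express "explicitly includes vertex type information" is to keep the two vertex types separate. Edges then only run between types, and the block-affinity matrix is rectangular (left-type blocks by right-type blocks), so no one-mode projection is needed.

The block below is a minimal generative sketch of that constraint, not the paper's inference procedure. The name `sample_bipartite_sbm` is hypothetical.

```python
import random


def sample_bipartite_sbm(left_groups, right_groups, omega, seed=0):
    """Sample a bipartite graph from a block model that respects vertex types.

    left_groups[i]: block of left-type vertex i.
    right_groups[j]: block of right-type vertex j.
    omega[a][b]: edge probability between left block a and right block b
                 (a rectangular table: number of left blocks x number of right blocks).
    Edges are (left_index, right_index) pairs, so same-type edges cannot occur.
    """
    rng = random.Random(seed)
    return {
        (i, j)
        for i, a in enumerate(left_groups)
        for j, b in enumerate(right_groups)
        if rng.random() < omega[a][b]
    }


if __name__ == "__main__":
    # Hypothetical example: 2 left blocks, 3 right blocks.
    omega = [[0.9, 0.1, 0.1], [0.1, 0.1, 0.9]]
    print(sorted(sample_bipartite_sbm([0, 0, 1, 1], [0, 1, 2, 2], omega, seed=3)))
```

Because the type constraint is built into the model rather than imposed by projecting onto one vertex type, the block assignments for both types stay directly interpretable.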

Submitted 10 July, 2014; v1 submitted 12 March, 2014; originally announced March 2014.

Comments: 12 pages, 9 figures

Journal ref: Physical Review E 90(1): 012805 (2014)
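To illustrate what "explicitly includes vertex type information" can mean in practice, the sketch below scores a partition of a bipartite graph while requiring every group to contain vertices of only one type. The objective used is the degree-corrected Poisson SBM form of Karrer and Newman; that choice, the name `bipartite_dcsbm_score`, and the toy graph are assumptions for illustration, since the abstract does not state the paper's exact likelihood or fitting procedure.

```python
import math
from collections import Counter


def bipartite_dcsbm_score(edges, labels, vtype):
    """Degree-corrected SBM objective for a bipartite graph (higher is better).

    edges:  list of (a, b) pairs; every edge must join vertices of different types.
    labels: dict vertex -> group id.
    vtype:  dict vertex -> vertex type (0 or 1).
    Score = sum over group pairs of m_rs * log(m_rs / (kappa_r * kappa_s)),
    where m_rs counts edges between groups r and s and kappa_r is the total
    degree of group r (Karrer-Newman form; assumed, not from the abstract).
    """
    # Enforce type-pure groups: each group may hold only one vertex type.
    group_type = {}
    for v, g in labels.items():
        if group_type.setdefault(g, vtype[v]) != vtype[v]:
            raise ValueError(f"group {g!r} mixes vertex types")

    m = Counter()
    kappa = Counter()
    for a, b in edges:
        if vtype[a] == vtype[b]:
            raise ValueError(f"edge ({a!r}, {b!r}) joins two vertices of the same type")
        if vtype[a] > vtype[b]:
            a, b = b, a  # put the type-0 endpoint first so each group pair has one key
        r, s = labels[a], labels[b]
        m[(r, s)] += 1
        kappa[r] += 1
        kappa[s] += 1

    return sum(count * math.log(count / (kappa[r] * kappa[s]))
               for (r, s), count in m.items())


# Hypothetical toy graph: "a" vertices are type 0, "b" vertices are type 1.
vtype = {f"{t}{i}": (0 if t == "a" else 1) for t in "ab" for i in range(6)}
edges = [(f"a{i}", f"b{j}") for i in range(3) for j in range(3)]
edges += [(f"a{i}", f"b{j}") for i in range(3, 6) for j in range(3, 6)]

planted = {v: (v[0], int(v[1]) // 3) for v in vtype}  # two groups per type
shuffled = dict(planted)
shuffled["a2"], shuffled["a3"] = ("a", 1), ("a", 0)   # swap two type-0 vertices

print(bipartite_dcsbm_score(edges, planted, vtype)
      > bipartite_dcsbm_score(edges, shuffled, vtype))
```

No one-mode projection is formed: the score is computed directly on the bipartite edges, and a group that mixes types is rejected rather than scored.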

arXiv:1305.5782 [pdf, ps, other]

Adapting the Stochastic Block Model to Edge-Weighted Networks

Authors: Christopher Aicher, Abigail Z. Jacobs, Aaron Clauset

Abstract: We generalize the stochastic block model to the important case in which edges are annotated with weights drawn from an exponential family distribution. This generalization introduces several technical difficulties for model estimation, which we solve using a Bayesian approach. We introduce a variational algorithm that efficiently approximates the model's posterior distribution for dense graphs. In specific numerical experiments on edge-weighted networks, this weighted stochastic block model outperforms the common approach of first applying a single threshold to all weights and then applying the classic stochastic block model, which can obscure latent block structure in networks. This model will enable the recovery of latent structure in a broader range of network data than was previously possible.

Submitted 24 May, 2013; originally announced May 2013.

arXiv:1303.6372 [pdf, ps, other]

Detecting Friendship Within Dynamic Online Interaction Networks

Authors: Sears Merritt, Abigail Z. Jacobs, Winter Mason, Aaron Clauset

Abstract: In many complex social systems, the timing and frequency of interactions between individuals are observable but friendship ties are hidden. Recovering these hidden ties, particularly for casual users who are relatively less active, would enable a wide variety of friendship-aware applications in domains where labeled data are often unavailable, including online advertising and national security. Here, we investigate the accuracy of multiple statistical features, based either purely on temporal interaction patterns or on the cooperative nature of the interactions, for automatically extracting latent social ties. Using self-reported friendship and non-friendship labels derived from an anonymous online survey, we learn highly accurate predictors for recovering hidden friendships within a massive online data set encompassing 18 billion interactions among 17 million individuals of the popular online game Halo: Reach. We find that the accuracy of many features improves as more data accumulates, and cooperative features are generally reliable. However, periodicities in interaction time series are sufficient to correctly classify 95% of ties, even for casual users. These results clarify the nature of friendship in online social environments and suggest new opportunities and new privacy concerns for friendship-aware applications that do not require the disclosure of private friendship information.

Submitted 25 March, 2013; originally announced March 2013.

Comments: To Appear at the 7th International AAAI Conference on Weblogs and Social Media (ICWSM '13), 11 pages, 1 table, 6 figures

Journal ref: Proc. of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM), 380-389 (2013)

arXiv:1103.0949 [pdf, other]

Adapting to Non-stationarity with Growing Expert Ensembles

Authors: Cosma Rohilla Shalizi, Abigail Z. Jacobs, Kristina Lisa Klinkner, Aaron Clauset

Abstract: When dealing with time series with complex non-stationarities, low retrospective regret on individual realizations is a more appropriate goal than low prospective risk in expectation. Online learning algorithms provide powerful guarantees of this form, and have often been proposed for use with non-stationary processes because of their ability to switch between different forecasters or "experts". However, existing methods assume that the set of experts whose forecasts are to be combined are all given at the start, which is not plausible when dealing with a genuinely historical or evolutionary system. We show how to modify the "fixed shares" algorithm for tracking the best expert to cope with a steadily growing set of experts, obtained by fitting new models to new data as it becomes available, and obtain regret bounds for the growing ensemble.

Submitted 28 June, 2011; v1 submitted 4 March, 2011; originally announced March 2011.

Comments: 9 pages, 1 figure; CMU Statistics Technical Report. v2: Added empirical example, revised discussion of related work
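A sketch of a fixed-share forecast combiner whose expert set grows over time, assuming squared loss, a common uniform-mixing form of the fixed-share step, and a hypothetical entry rule that gives each newcomer weight 1/N of the total (N being the new ensemble size). The class name, parameters `eta` and `alpha`, and the entry rule are illustrative assumptions, not the modification or regret analysis from the report.

```python
import math


class GrowingFixedShare:
    """Fixed-share combination of forecasters over an ensemble that can grow.

    eta:   learning rate of the exponential-weights loss update.
    alpha: switching rate; fraction of weight redistributed uniformly each round.
    Entry rule (hypothetical): a new expert gets weight 1/N and the existing
    weights are scaled by (1 - 1/N), so the weights still sum to 1.
    """

    def __init__(self, eta=1.0, alpha=0.05):
        self.eta = eta
        self.alpha = alpha
        self.weights = []

    def add_expert(self):
        n = len(self.weights) + 1
        self.weights = [w * (1 - 1 / n) for w in self.weights]
        self.weights.append(1 / n)

    def predict(self, forecasts):
        # Weighted average of the experts' forecasts.
        return sum(w * f for w, f in zip(self.weights, forecasts))

    def update(self, forecasts, outcome):
        # Exponential-weights step on squared loss, then renormalize.
        losses = [(f - outcome) ** 2 for f in forecasts]
        w = [wi * math.exp(-self.eta * li) for wi, li in zip(self.weights, losses)]
        total = sum(w)
        w = [wi / total for wi in w]
        # Fixed-share step: move a fraction alpha of the mass uniformly,
        # which keeps every expert's weight bounded away from zero.
        n = len(w)
        self.weights = [(1 - self.alpha) * wi + self.alpha / n for wi in w]


# Hypothetical regime change: outcomes are 0 for 20 rounds, then 1. At the
# change, a new expert (standing in for a model fitted to new data) joins.
model = GrowingFixedShare(eta=1.0, alpha=0.05)
model.add_expert()
for t in range(40):
    if t == 20:
        model.add_expert()
    forecasts = [0.0] if t < 20 else [0.0, 1.0]
    outcome = 0.0 if t < 20 else 1.0
    prediction = model.predict(forecasts)
    model.update(forecasts, outcome)

print([round(w, 3) for w in model.weights])
```

The uniform share `alpha / n` is what lets weight flow quickly to a recently added expert; without it, an expert entering late could need many rounds to overcome the mass already concentrated on older experts.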

Showing 1–18 of 18 results for author: Jacobs, A Z