Search | arXiv e-print repository

Temporal fingerprints: Identity matching across fully encrypted domain

Authors: Shahar Somin, Keeley Erhardt, Alex 'Sandy' Pentland

Abstract: Technological advancements have significantly transformed communication patterns, introducing a diverse array of online platforms, thereby prompting individuals to use multiple profiles for different domains and objectives. Enhancing the understanding of cross-domain identity matching capabilities is essential, not only for practical applications such as commercial strategies and cybersecurity measures, but also for theoretical insights into the privacy implications of data disclosure. In this study, we demonstrate that individual temporal data, in the form of inter-event times distribution, constitutes an individual temporal fingerprint, allowing for matching profiles across different domains back to their associated real-world entity. We evaluate our methodology on encrypted digital trading platforms within the Ethereum Blockchain and present impressive results in matching identities across these privacy-preserving domains, while outperforming previously suggested models. Our findings indicate that simply knowing when an individual is active, even if information about who they talk to and what they discuss is lacking, poses risks to users' privacy, highlighting the inherent challenges in preserving privacy in today's digital landscape.

Submitted 5 July, 2024; originally announced July 2024.
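
The abstract above describes matching profiles by their inter-event time distributions. A minimal sketch of that general idea follows; it is not the authors' method. The log2 binning, the Jensen-Shannon comparison, and the function names (`fingerprint`, `js_divergence`, `best_match`) are all assumptions made for illustration.

```python
import math
from collections import Counter

def inter_event_times(timestamps):
    """Gaps between consecutive events, in the same unit as the input."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]

def fingerprint(timestamps, num_bins=8):
    """Normalized histogram of inter-event times on log2-spaced bins.

    Assumed binning: bin k holds gaps in [2**k, 2**(k+1)); gaps below 1
    fall in bin 0 and very long gaps are clipped into the last bin.
    """
    counts = Counter()
    for gap in inter_event_times(timestamps):
        k = int(math.log2(gap)) if gap >= 1 else 0
        counts[min(k, num_bins - 1)] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_bins
    return [counts[k] / total for k in range(num_bins)]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 means identical distributions)."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def best_match(query, candidates):
    """Name of the candidate whose fingerprint is closest to the query's."""
    fq = fingerprint(query)
    return min(candidates,
               key=lambda name: js_divergence(fq, fingerprint(candidates[name])))

# Hypothetical activity logs (timestamps in seconds) from two domains.
domain_a = {"fast_trader": [0, 2, 5, 7, 10], "slow_trader": [0, 60, 130, 190]}
unknown_profile = [100, 102, 104, 107]
matched = best_match(unknown_profile, domain_a)
```

Only event times enter the comparison, which is the point the abstract makes: no counterparties or message content are needed to produce a candidate match.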

arXiv:2404.14643 [pdf, other]

Teaching Network Traffic Matrices in an Interactive Game Environment

Authors: Chasen Milner, Hayden Jananthan, Jeremy Kepner, Vijay Gadepally, Michael Jones, Peter Michaleas, Ritesh Patel, Sandeep Pisharody, Gabriel Wachman, Alex Pentland

Abstract: The Internet has become a critical domain for modern society that requires ongoing efforts for its improvement and protection. Network traffic matrices are a powerful tool for understanding and analyzing networks and are broadly taught in online graph theory educational resources. Network traffic matrix concepts are rarely available in online computer network and cybersecurity educational resources. To fill this gap, an interactive game environment has been developed to teach the foundations of traffic matrices to the computer networking community. The game environment provides a convenient, broadly accessible delivery mechanism that enables making material available rapidly to a wide audience. The core architecture of the game is a facility to add new network traffic matrix training modules via an easily editable JSON file. Using this facility, an initial set of modules was rapidly created covering: basic traffic matrices, traffic patterns, security/defense/deterrence, a notional cyber attack, a distributed denial-of-service (DDoS) attack, and a variety of graph theory concepts. The game environment enables delivery in a wide range of contexts to enable rapid feedback and improvement. The game can be used as a core unit as part of a formal course or as a simple interactive introduction in a presentation.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 9 pages, 10 figures, 52 references; accepted to IEEE GrAPL
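
For readers unfamiliar with the term, a traffic matrix can be sketched in a few lines. The example below is an illustration only and does not reflect the game's JSON module format, which the abstract does not show; the packet list and the helper names (`traffic_matrix`, `fan_out`, `fan_in`) are hypothetical.

```python
from collections import defaultdict

def traffic_matrix(packets):
    """Sparse traffic matrix A, where A[src][dst] counts packets from src to dst."""
    matrix = defaultdict(lambda: defaultdict(int))
    for src, dst in packets:
        matrix[src][dst] += 1
    return matrix

def fan_out(matrix, src):
    """Number of distinct destinations a source sends to (nonzeros in its row)."""
    return len(matrix[src]) if src in matrix else 0

def fan_in(matrix, dst):
    """Number of distinct sources sending to a destination (nonzeros in its column)."""
    return sum(1 for row in matrix.values() if dst in row)

# Hypothetical capture: three sources converge on one victim, a DDoS-like pattern.
packets = [("a", "victim"), ("b", "victim"), ("c", "victim"),
           ("a", "victim"), ("a", "b")]
A = traffic_matrix(packets)
victim_fan_in = fan_in(A, "victim")
```

A column with many nonzero entries, as for `victim` here, is the kind of pattern a DDoS module would plausibly highlight.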

arXiv:2402.17019 [pdf, other]

Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling

Authors: Hang Jiang, Xiajie Zhang, Robert Mahari, Daniel Kessler, Eric Ma, Tal August, Irene Li, Alex 'Sandy' Pentland, Yoon Kim, Deb Roy, Jad Kabbara

Abstract: Making legal knowledge accessible to non-experts is crucial for enhancing general legal literacy and encouraging civic participation in democracy. However, legal documents are often challenging to understand for people without legal backgrounds. In this paper, we present a novel application of large language models (LLMs) in legal education to help non-experts learn intricate legal concepts through storytelling, an effective pedagogical tool in conveying complex and abstract concepts. We also introduce a new dataset LegalStories, which consists of 294 complex legal doctrines, each accompanied by a story and a set of multiple-choice questions generated by LLMs. To construct the dataset, we experiment with various LLMs to generate legal stories explaining these concepts. Furthermore, we use an expert-in-the-loop approach to iteratively design multiple-choice questions. Then, we evaluate the effectiveness of storytelling with LLMs through randomized controlled trials (RCTs) with legal novices on 10 samples from the dataset. We find that LLM-generated stories enhance comprehension of legal concepts and interest in law among non-native speakers compared to only definitions. Moreover, stories consistently help participants relate legal concepts to their lives. Finally, we find that learning with stories shows a higher retention rate for non-native speakers in the follow-up assessment. Our work has strong implications for using LLMs in promoting teaching and learning in the legal field and beyond.

Submitted 2 July, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted to ACL 2024

arXiv:2402.02675 [pdf, other]

Verifiable evaluations of machine learning models using zkSNARKs

Authors: Tobin South, Alexander Camuto, Shrey Jain, Shayla Nguyen, Robert Mahari, Christian Paquin, Jason Morton, Alex 'Sandy' Pentland

Abstract: In a world of increasing closed-source commercial machine learning models, model evaluations from developers must be taken at face value. These benchmark results, whether over task accuracy, bias evaluations, or safety checks, are traditionally impossible to verify by a model end-user without the costly or impossible process of re-performing the benchmark on black-box model outputs. This work presents a method of verifiable model evaluation using model inference through zkSNARKs. The resulting zero-knowledge computational proofs of model outputs over datasets can be packaged into verifiable evaluation attestations showing that models with fixed private weights achieve stated performance or fairness metrics over public inputs. We present a flexible proving system that enables verifiable attestations to be performed on any standard neural network model with varying compute requirements. For the first time, we demonstrate this across a sample of real-world models and highlight key challenges and design solutions. This presents a new transparency paradigm in the verifiable evaluation of private models.

Submitted 22 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

MSC Class: 68T01

arXiv:2312.14158 [pdf, other]

Data Cooperatives for Identity Attestations

Authors: Thomas Hardjono, Alex Pentland

Abstract: Data cooperatives with fiduciary obligations to members provide a useful source of truthful information regarding a given member whose personal data is managed by the cooperative. Since one of the main propositions of the cooperative model is to protect the data privacy of members, we explore the notion of blinded attestations in which the identity of the subject is removed from the attestations issued by the cooperative regarding one of its members. This is performed at the request of the individual member. We propose the use of a legal entity to countersign the blinded attestation, one that has an attorney-client relationship with the cooperative, and which can henceforth become the legal point of contact for inquiries regarding the individual related to the attribute being attested. There are several use-cases for this feature, including the Funds Travel Rule in transactions in digital assets, and the protection of privacy in decentralized social networks.

Submitted 29 October, 2023; originally announced December 2023.

Comments: 15 pages, 5 figures

arXiv:2311.18108 [pdf, other]

Behavior-based dependency networks between places shape urban economic resilience

Authors: Takahiro Yabe, Bernardo Garcia Bulle Bueno, Morgan Frank, Alex Pentland, Esteban Moro

Abstract: Urban economic resilience is intricately linked to how disruptions caused by pandemics, disasters, and technological shifts ripple through businesses and urban amenities. Disruptions, such as closures of non-essential businesses during the COVID-19 pandemic, not only affect those places directly but also influence how people live and move, spreading the impact on other businesses and increasing the overall economic shock. However, it is unclear how much businesses depend on each other in these situations. Leveraging large-scale human mobility data and millions of same-day visits in New York, Boston, Los Angeles, Seattle, and Dallas, we quantify dependencies between points-of-interest (POIs) encompassing businesses, stores, and amenities. Compared to places' physical proximity, dependency networks computed from human mobility exhibit significantly higher rates of long-distance connections and biases towards specific pairs of POI categories. We show that using behavior-based dependency relationships improves the predictability of business resilience during shocks, such as the COVID-19 pandemic, by around 40% compared to distance-based models. Simulating hypothetical urban shocks reveals that neglecting behavior-based dependencies can lead to a substantial underestimation of the spatial cascades of disruptions on businesses and urban amenities. Our findings underscore the importance of measuring the complex relationships woven through behavioral patterns in human mobility to foster urban economic resilience to shocks.

Submitted 3 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.13008 [pdf, other]

zkTax: A pragmatic way to support zero-knowledge tax disclosures

Authors: Alex Berke, Tobin South, Robert Mahari, Kent Larson, Alex Pentland

Abstract: Tax returns contain key financial information of interest to third parties: public officials are asked to share financial data for transparency, companies seek to assess the financial status of business partners, and individuals need to prove their income to landlords or to receive benefits. Tax returns also contain sensitive data such that sharing them in their entirety undermines privacy. We introduce a zero-knowledge tax disclosure system (zkTax) that allows individuals and organizations to make provable claims about select information in their tax returns without revealing additional information, which can be independently verified by third parties. The system consists of three distinct services that can be distributed: a tax authority provides tax documents signed with a public key; a Redact & Prove Service enables users to produce a redacted version of the tax documents with a zero-knowledge proof attesting the provenance of the redacted data; a Verify Service enables anyone to verify the proof. We implement a prototype with a user interface, compatible with U.S. tax forms, and demonstrate how this design could be implemented with minimal changes to existing tax infrastructure. Our system is designed to be extensible to other contexts and jurisdictions. This work provides a practical example of how distributed tools leveraging cryptography can enhance existing government or financial infrastructures, providing immediate transparency alongside privacy without system overhauls.

Submitted 24 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.12955 [pdf, other]

Don't forget private retrieval: distributed private similarity search for large language models

Authors: Guy Zyskind, Tobin South, Alex Pentland

Abstract: While the flexible capabilities of large language models (LLMs) allow them to answer a range of queries based on existing learned knowledge, information retrieval to augment generation is an important tool to allow LLMs to answer questions on information not included in pre-training data. Such private information is increasingly being generated in a wide array of distributed contexts by organizations and individuals. Performing such information retrieval using neural embeddings of queries and documents always leaked information about queries and database content unless both were stored locally. We present Private Retrieval Augmented Generation (PRAG), an approach that uses multi-party computation (MPC) to securely transmit queries to a distributed set of servers containing a privately constructed database to return top-k and approximate top-k documents. This is a first-of-its-kind approach to dense information retrieval that ensures no server observes a client's query or can see the database content. The approach introduces a novel MPC friendly protocol for inverted file approximate search (IVF) that allows for fast document search over distributed and private data in sublinear communication complexity. This work presents new avenues through which data for use in LLMs can be accessed and used without needing to centralize or forgo privacy.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.09356 [pdf, other]

LePaRD: A Large-Scale Dataset of Judges Citing Precedents

Authors: Robert Mahari, Dominik Stammbach, Elliott Ash, Alex 'Sandy' Pentland

Abstract: We present the Legal Passage Retrieval Dataset LePaRD. LePaRD is a massive collection of U.S. federal judicial citations to precedent in context. The dataset aims to facilitate work on legal passage prediction, a challenging practice-oriented legal retrieval and reasoning task. Legal passage prediction seeks to predict relevant passages from precedential court decisions given the context of a legal argument. We extensively evaluate various retrieval approaches on LePaRD, and find that classification appears to work best. However, we note that legal precedent prediction is a difficult task, and there remains significant room for improvement. We hope that by publishing LePaRD, we will encourage others to engage with a legal NLP task that promises to help expand access to justice by reducing the burden associated with legal research. A subset of the LePaRD dataset is freely available and the whole dataset will be released upon publication.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.14346 [pdf, other]

The Law and NLP: Bridging Disciplinary Disconnects

Authors: Robert Mahari, Dominik Stammbach, Elliott Ash, Alex 'Sandy' Pentland

Abstract: Legal practice is intrinsically rooted in the fabric of language, yet legal practitioners and scholars have been slow to adopt tools from natural language processing (NLP). At the same time, the legal system is experiencing an access to justice crisis, which could be partially alleviated with NLP. In this position paper, we argue that the slow uptake of NLP in legal practice is exacerbated by a disconnect between the needs of the legal community and the focus of NLP researchers. In a review of recent trends in the legal NLP literature, we find limited overlap between the legal NLP community and legal academia. Our interpretation is that some of the most popular legal NLP tasks fail to address the needs of legal practitioners. We discuss examples of legal NLP tasks that promise to bridge disciplinary disconnects and highlight interesting areas for legal NLP research that remain underexplored.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.00522 [pdf, other]

Mapping of Internet "Coastlines" via Large Scale Anonymized Network Source Correlations

Authors: Hayden Jananthan, Jeremy Kepner, Michael Jones, William Arcand, David Bestor, William Bergeron, Chansup Byun, Timothy Davis, Vijay Gadepally, Daniel Grant, Michael Houle, Matthew Hubbell, Anna Klein, Lauren Milechin, Guillermo Morales, Andrew Morris, Julie Mullen, Ritesh Patel, Alex Pentland, Sandeep Pisharody, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Tyler Trigg, et al. (3 additional authors not shown)

Abstract: Expanding the scientific tools available to protect computer networks can be aided by a deeper understanding of the underlying statistical distributions of network traffic and their potential geometric interpretations. Analyses of large scale network observations provide a unique window into studying those underlying statistics. Newly developed GraphBLAS hypersparse matrices and D4M associative array technologies enable the efficient anonymized analysis of network traffic on the scale of trillions of events. This work analyzes over 100,000,000,000 anonymized packets from the largest Internet telescope (CAIDA) and over 10,000,000 anonymized sources from the largest commercial honeyfarm (GreyNoise). Neither CAIDA nor GreyNoise actively emit Internet traffic and provide distinct observations of unsolicited Internet traffic (primarily botnets and scanners). Analysis of these observations confirms the previously observed Cauchy-like distributions describing temporal correlations between Internet sources. The Gull lighthouse problem is a well-known geometric characterization of the standard Cauchy distribution and motivates a potential geometric interpretation for Internet observations. This work generalizes the Gull lighthouse problem to accommodate larger classes of coastlines, deriving a closed-form solution for the resulting probability distributions, stating and examining the inverse problem of identifying an appropriate coastline given a continuous probability distribution, identifying a geometric heuristic for solving this problem computationally, and applying that heuristic to examine the temporal geometry of different subsets of network observations. Application of this method to the CAIDA and GreyNoise data reveals a several orders of magnitude difference between known benign and other traffic which can lead to potentially novel ways to protect networks.

Submitted 30 September, 2023; originally announced October 2023.

Comments: 9 pages, 7 figures, IEEE HPEC 2023 (accepted)
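
As background for the abstract above, the classic straight-coastline version of the Gull lighthouse problem can be worked in three lines. This is the textbook case only, not the paper's generalization; the symbols $\alpha$ (lighthouse position along the shore), $\beta$ (distance offshore), and $\theta$ (flash angle) are notation assumed here.

```latex
% Straight shore along the x-axis; lighthouse at (\alpha, \beta) with \beta > 0.
% Each flash leaves at an angle \theta drawn uniformly from (-\pi/2, \pi/2)
% and is recorded where it hits the shore, at position x.
\begin{align}
  x &= \alpha + \beta \tan\theta, \qquad p(\theta) = \frac{1}{\pi}, \\
  \frac{d\theta}{dx} &= \frac{\beta}{\beta^{2} + (x - \alpha)^{2}}, \\
  p(x) &= p(\theta)\,\frac{d\theta}{dx}
        = \frac{1}{\pi}\,\frac{\beta}{\beta^{2} + (x - \alpha)^{2}}.
\end{align}
```

Setting $\alpha = 0$ and $\beta = 1$ gives the standard Cauchy density $1/(\pi(1 + x^{2}))$, the distribution the abstract refers to.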

arXiv:2309.01806 [pdf, other]

doi 10.1109/HPEC58863.2023.10363471

Focusing and Calibration of Large Scale Network Sensors using GraphBLAS Anonymized Hypersparse Matrices

Authors: Jeremy Kepner, Michael Jones, Phil Dykstra, Chansup Byun, Timothy Davis, Hayden Jananthan, William Arcand, David Bestor, William Bergeron, Vijay Gadepally, Micheal Houle, Matthew Hubbell, Anna Klein, Lauren Milechin, Guillermo Morales, Julie Mullen, Ritesh Patel, Alex Pentland, Sandeep Pisharody, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Tyler Trigg, Charles Yee, et al. (1 additional author not shown)

Abstract: Defending community-owned cyberspace requires community-based efforts. Large-scale network observations that uphold the highest regard for privacy are key to protecting our shared cyberspace. Deployment of the necessary network sensors requires careful sensor placement, focusing, and calibration with significant volumes of network observations. This paper demonstrates novel focusing and calibration procedures on a multi-billion packet dataset using high-performance GraphBLAS anonymized hypersparse matrices. The run-time performance on a real-world data set confirms previously observed real-time processing rates for high-bandwidth links while achieving significant data compression. The output of the analysis demonstrates the effectiveness of these procedures at focusing the traffic matrix and revealing the underlying stable heavy-tail statistical distributions that are necessary for anomaly detection. A simple model of the corresponding probability of detection ($p_{\rm d}$) and probability of false alarm ($p_{\rm fa}$) for these distributions highlights the criticality of network sensor focusing and calibration. Once a sensor is properly focused and calibrated it is then in a position to carry out two of the central tenets of good cybersecurity: (1) continuous observation of the network and (2) minimizing unbrokered network connections.

Submitted 4 September, 2023; originally announced September 2023.

Comments: Accepted to IEEE HPEC, 9 pages, 12 figures, 1 table, 63 references, 2 appendices

arXiv:2308.07058 [pdf, other]

Temporal clustering of social interactions trades-off disease spreading and knowledge diffusion

Authors: Giulia Cencetti, Lorenzo Lucchini, Gabriele Santin, Federico Battiston, Esteban Moro, Alex Pentland, Bruno Lepri

Abstract: Non-pharmaceutical measures such as preventive quarantines, remote working, school and workplace closures, lockdowns, etc. have shown effectiveness from an epidemic control perspective; however, they also have significant negative consequences on social life and relationships, work routines, and community engagement. In particular, complex ideas, work and school collaborations, innovative discoveries, and resilient norms formation and maintenance, which often require face-to-face interactions of two or more parties to be developed and synergically coordinated, are particularly affected. In this study, we propose an alternative hybrid solution that balances the slowdown of epidemic diffusion with the preservation of face-to-face interactions. Our approach involves a two-step partitioning of the population. First, we tune the level of node clustering, creating "social bubbles" with increased contacts within each bubble and fewer outside, while maintaining the average number of contacts in each network. Second, we tune the level of temporal clustering by pairing, for a certain time interval, nodes from specific social bubbles. Our results demonstrate that a hybrid approach can achieve better trade-offs between epidemic control and complex knowledge diffusion. The versatility of our model enables tuning and refining clustering levels to optimally achieve the desired trade-off, based on the potentially changing characteristics of a disease or knowledge diffusion process.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2307.03401 [pdf, other]

Metropolitan Scale and Longitudinal Dataset of Anonymized Human Mobility Trajectories

Authors: Takahiro Yabe, Kota Tsubouchi, Toru Shimizu, Yoshihide Sekimoto, Kaoru Sezaki, Esteban Moro, Alex Pentland

Abstract: Modeling and predicting human mobility trajectories in urban areas is an essential task for various applications. The recent availability of large-scale human movement data collected from mobile devices has enabled the development of complex human mobility prediction models. However, human mobility prediction methods are often trained and tested on different datasets, due to the lack of open-source large-scale human mobility datasets amid privacy concerns, posing a challenge towards conducting fair performance comparisons between methods. To this end, we created an open-source, anonymized, metropolitan scale, and longitudinal (90 days) dataset of 100,000 individuals' human mobility trajectories, using mobile phone location data. The location pings are spatially and temporally discretized, and the metropolitan area is undisclosed to protect users' privacy. The 90-day period is composed of 75 days of business-as-usual and 15 days during an emergency. To promote the use of the dataset, we will host a human mobility prediction data challenge ('HuMob Challenge 2023') using the human mobility dataset, which will be held in conjunction with ACM SIGSPATIAL 2023.

Submitted 7 July, 2023; originally announced July 2023.

Comments: Data descriptor for the Human Mobility Prediction Challenge (HuMob Challenge) 2023

arXiv:2306.13723 [pdf, other]

Human-AI Coevolution

Authors: Dino Pedreschi, Luca Pappalardo, Emanuele Ferragina, Ricardo Baeza-Yates, Albert-Laszlo Barabasi, Frank Dignum, Virginia Dignum, Tina Eliassi-Rad, Fosca Giannotti, Janos Kertesz, Alistair Knott, Yannis Ioannidis, Paul Lukowicz, Andrea Passarella, Alex Sandy Pentland, John Shawe-Taylor, Alessandro Vespignani

Abstract: Human-AI coevolution, defined as a process in which humans and AI algorithms continuously influence each other, increasingly characterises our society, but is understudied in artificial intelligence and complexity science literature. Recommender systems and assistants play a prominent role in human-AI coevolution, as they permeate many facets of daily life and influence human choices on online platforms. The interaction between users and AI results in a potentially endless feedback loop, wherein users' choices generate data to train AI models, which, in turn, shape subsequent user preferences. This human-AI feedback loop has peculiar characteristics compared to traditional human-machine interaction and gives rise to complex and often "unintended" social outcomes. This paper introduces Coevolution AI as the cornerstone for a new field of study at the intersection between AI and complexity science focused on the theoretical, empirical, and mathematical investigation of the human-AI feedback loop. In doing so, we: (i) outline the pros and cons of existing methodologies and highlight shortcomings and potential ways for capturing feedback loop mechanisms; (ii) propose a reflection at the intersection between complexity science, AI and society; (iii) provide real-world examples for different human-AI ecosystems; and (iv) illustrate challenges to the creation of such a field of study, conceptualising them at increasing levels of abstraction, i.e., technical, epistemological, legal and socio-political.

Submitted 3 May, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

arXiv:2306.04141 [pdf, other]

doi 10.1126/science.adh4451

Art and the science of generative AI: A deeper dive

Authors: Ziv Epstein, Aaron Hertzmann, Laura Herman, Robert Mahari, Morgan R. Frank, Matthew Groh, Hope Schroeder, Amy Smith, Memo Akten, Jessica Fjeld, Hany Farid, Neil Leach, Alex Pentland, Olga Russakovsky

Abstract: A new class of tools, colloquially called generative AI, can produce high-quality artistic media for visual arts, concept art, music, fiction, literature, video, and animation. The generative capabilities of these tools are likely to fundamentally alter the creative processes by which creators formulate ideas and put them into production. As creativity is reimagined, so too may be many sectors of society. Understanding the impact of generative AI - and making policy decisions around it - requires new interdisciplinary scientific inquiry into culture, economics, law, algorithms, and the interaction of technology and creativity. We argue that generative AI is not the harbinger of art's demise, but rather is a new medium with its own distinct affordances. In this vein, we consider the impacts of this new medium on creators across four themes: aesthetics and culture, legal questions of ownership and credit, the future of creative work, and impacts on the contemporary media ecosystem. Across these themes, we highlight key research questions and directions to inform policy and beneficial uses of the technology.

Submitted 7 June, 2023; originally announced June 2023.

Comments: This white paper is an expanded version of Epstein et al 2023 published in Science Perspectives on July 16, 2023 which you can find at the following DOI: 10.1126/science.adh4451

arXiv:2212.00869 [pdf, other]

Flexible social inference facilitates targeted social learning when rewards are not observable

Authors: Robert D. Hawkins, Andrew M. Berdahl, Alex "Sandy" Pentland, Joshua B. Tenenbaum, Noah D. Goodman, P. M. Krafft

Abstract: Groups coordinate more effectively when individuals are able to learn from others' successes. But acquiring such knowledge is not always easy, especially in real-world environments where success is hidden from public view. We suggest that social inference capacities may help bridge this gap, allowing individuals to update their beliefs about others' underlying knowledge and success from observable trajectories of behavior. We compared our social inference model against simpler heuristics in three studies of human behavior in a collective sensing task. In Experiment 1, we found that average performance improves as a function of group size at a rate greater than predicted by non-inferential models. Experiment 2 introduced artificial agents to evaluate how individuals selectively rely on social information. Experiment 3 generalized these findings to a more complex reward landscape. Taken together, our findings provide insight into the relationship between individual social cognition and the flexibility of collective behavior.

Submitted 5 August, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: Nature Human Behaviour

arXiv:2210.11053 [pdf, other]

The Network Structure of Unequal Diffusion

Authors: Eaman Jahani, Dean Eckles, Alex 'Sandy' Pentland

Abstract: Social networks affect the diffusion of information, and thus have the potential to reduce or amplify inequality in access to opportunity. We show empirically that social networks often exhibit a much larger potential for unequal diffusion across groups along paths of length 2 and 3 than expected by our random graph models. We argue that homophily alone cannot fully explain the extent of unequal diffusion and attribute this mismatch to unequal distribution of cross-group links among the nodes. Based on this insight, we develop a variant of the stochastic block model that incorporates the heterogeneity in cross-group linking. The model provides an unbiased and consistent estimate of assortativity or homophily on paths of length 2 and provides a more accurate estimate along paths of length 3 than existing models. We characterize the null distribution of its log-likelihood ratio test and argue that the goodness of fit test is valid only when the network is dense. Based on our empirical observations and modeling results, we conclude that the impact of any departure from equal distribution of links to source nodes in the diffusion process is not limited to its first order effects as some nodes will have fewer direct links to the sources. More importantly, this unequal distribution will also lead to second order effects as the whole group will have fewer diffusion paths to the sources.

Submitted 20 October, 2022; originally announced October 2022.

Comments: 47 pages

arXiv:2210.04641 [pdf, other]

doi 10.1057/s41599-024-02881-1

One City, Two Tales: Using Mobility Networks to Understand Neighborhood Resilience and Fragility during the COVID-19 Pandemic

Authors: Hasan Alp Boz, Mohsen Bahrami, Selim Balcisoy, Burcin Bozkaya, Nina Mazar, Aaron Nichols, Alex Pentland

Abstract: What predicts a neighborhood's resilience and adaptability to essential public health policies and shelter-in-place regulations that prevent the harmful spread of COVID-19? To answer this question, in this paper we present a novel application of human mobility patterns and human behavior in a network setting. We analyze mobility data in New York City over two years, from January 2019 to December 2020, and create weekly mobility networks between Census Block Groups by aggregating Point of Interest level visit patterns. Our results suggest that both the socioeconomic and geographic attributes of neighborhoods significantly predict neighborhood adaptability to the shelter-in-place policies active at that time. That is, our findings and simulation results reveal that in addition to factors such as race, education, and income, geographical attributes such as access to amenities in a neighborhood that satisfy community needs were equally important factors for predicting neighborhood adaptability and the spread of COVID-19. The results of our study provide insights that can enhance urban planning strategies that contribute to pandemic alleviation efforts, which in turn may help urban areas become more resilient to exogenous shocks such as the COVID-19 pandemic.

Submitted 6 October, 2022; originally announced October 2022.

arXiv:2210.01927 [pdf, other]

doi 10.1007/978-3-031-43129-6_6

Building a healthier feed: Private location trace intersection driven feed recommendations

Authors: Tobin South, Nick Lothian, Alex "Sandy" Pentland

Abstract: The physical environment that individuals navigate strongly determines which communities and people matter most to them. These effects drive both personal access to opportunities and the social capital of communities, and can often be observed in the personal mobility traces of individuals. Traditional social media feeds underutilize these mobility-based features, or do so in a privacy-exploitative manner. Here we propose a consent-first private information sharing paradigm for driving social feeds from users' personal private data, specifically using mobility traces. This approach designs the feed to explicitly optimize for integrating the user into the local community and for social capital building through leveraging mobility trace overlaps as a proxy for existing or potential real-world social connections, creating proportionality between whom a user sees in their feed, and whom the user is likely to see in person. These claims are validated against existing social-mobility data, and a reference implementation of the proposed algorithm is built for demonstration. In total, this work presents a novel technique for designing feeds that represent real offline social connections through private set intersections requiring no third party, or public data exposure.

Submitted 20 September, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Journal ref: Social, Cultural, and Behavioral Modeling. SBP-BRiMS 2023. Lecture Notes in Computer Science, vol 14161. Springer, Cham

arXiv:2209.12095 [pdf, other]

Identifying latent activity behaviors and lifestyles using mobility data to describe urban dynamics

Authors: Yanni Yang, Alex Pentland, Esteban Moro

Abstract: Urbanization and its problems require an in-depth and comprehensive understanding of urban dynamics, especially the complex and diversified lifestyles in modern cities. Digitally acquired data can accurately capture complex human activity, but it lacks the interpretability of demographic data. In this paper, we study a privacy-enhanced dataset of the mobility visitation patterns of 1.2 million people to 1.1 million places in 11 metro areas in the U.S. to detect the latent mobility behaviors and lifestyles in the largest American cities. Despite the considerable complexity of mobility visitations, we found that lifestyles can be automatically decomposed into only 12 latent interpretable activity behaviors on how people combine shopping, eating, working, or using their free time. Rather than describing individuals with a single lifestyle, we find that city dwellers' behavior is a mixture of those behaviors. Those detected latent activity behaviors are equally present across cities and cannot be fully explained by main demographic features. Finally, we find those latent behaviors are associated with dynamics like experienced income segregation, transportation, or healthy behaviors in cities, even after controlling for demographic features. Our results signal the importance of complementing traditional census data with activity behaviors to understand urban dynamics.

Submitted 24 September, 2022; originally announced September 2022.

Comments: 18 pages, 7 figures

arXiv:2209.07041 [pdf, other]

Diversity beyond density: experienced social mixing of urban streets

Authors: Zhuangyuan Fan, Tianyu Su, Maoran Sun, Ariel Noyman, Fan Zhang, Alex Sandy Pentland, Esteban Moro

Abstract: Urban density, in the form of residents' and visitors' concentration, has long been considered to foster diverse exchanges of interpersonal knowledge and skills, which are intrinsic to sustainable human settlements. However, with current urban studies primarily devoted to city and district-level analysis, we cannot unveil the elemental connection between urban density and diversity. Here we use an anonymized and privacy-enhanced mobile data set of 0.5 million opted-in users from three metropolitan areas in the U.S. to show that at the scale of urban streets, density is not the only path to diversity. We represent the diversity of each street with the Experienced Social Mixing (ESM), which describes the chances of people meeting diverse income groups throughout their daily experience. We conduct multiple experiments and show that the concentration of visitors only explains 26% of street-level ESM. However, adjacent amenities, residential diversity, and income level account for 44% of the ESM. Moreover, using longitudinal business data, we show that streets with an increased number of food businesses have seen an increased ESM from 2016 to 2018. Lastly, although streets with more visitors are more likely to have crime, diverse streets tend to have fewer crimes. These findings suggest that cities can leverage many tools beyond density to curate a diverse and safe street experience for people.

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2207.06895 [pdf, other]

doi 10.1038/s41467-023-37913-y

Behavioral changes during the pandemic worsened income diversity of urban encounters

Authors: Takahiro Yabe, Bernardo Garcia Bulle Bueno, Xiaowen Dong, Alex 'Sandy' Pentland, Esteban Moro

Abstract: Diversity of physical encounters and social interactions in urban environments is known to spur economic productivity and innovation in cities, while also fostering social capital and resilience of communities. However, mobility restrictions during the pandemic have forced people to substantially reduce urban physical encounters, raising questions on the social implications of such behavioral changes. In this paper, we study how the income diversity of urban encounters has changed during different periods throughout the pandemic, using a large-scale, privacy-enhanced mobility dataset of more than one million anonymized mobile phone users in four large US cities, collected across three years spanning before and during the pandemic. We find that the diversity of urban encounters substantially decreased (by 15% to 30%) during the pandemic and that this decrease persisted through late 2021, even though aggregated mobility metrics have recovered to pre-pandemic levels. Counterfactual analyses show that while the reduction of outside activities (higher rates of staying at home) was a major factor that contributed to decreased diversity in the early stages of the pandemic, behavioral changes including lower willingness to explore new places and changes in visitation preferences further worsened the long-term diversity of encounters. Our findings suggest that the pandemic could have long-lasting negative effects on urban income diversity, and provide implications for managing the trade-off between the stringency of COVID-19 policies and the diversity of urban encounters as we move beyond the pandemic.

Submitted 14 July, 2022; originally announced July 2022.

Comments: main: 13 pages, 4 figures; supplementary: 46 pages, 25 figures, 14 tables

arXiv:2207.03652 [pdf, other]

Private independence testing across two parties

Authors: Praneeth Vepakomma, Mohammad Mohammadi Amiri, Clément L. Canonne, Ramesh Raskar, Alex Pentland

Abstract: We introduce $π$-test, a privacy-preserving algorithm for testing statistical independence between data distributed across multiple parties. Our algorithm relies on privately estimating the distance correlation between datasets, a quantitative measure of independence introduced in Székely et al. [2007]. We establish both additive and multiplicative error bounds on the utility of our differentially private test, which we believe will find applications in a variety of distributed hypothesis testing settings involving sensitive data.

Submitted 26 September, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

arXiv:2206.12915 [pdf, ps, other]

doi 10.36190/2021.51

Disambiguating Disinformation: Extending Beyond the Veracity of Online Content

Authors: Keeley Erhardt, Alex Pentland

Abstract: Following the 2016 US presidential election and the now overwhelming evidence of Russian interference, there has been an explosion of interest in the phenomenon of "fake news". To date, research on false news has centered around detecting content from low-credibility sources and analyzing how this content spreads across online platforms. Misinformation poses clear risks, yet research agendas that overemphasize veracity miss the opportunity to truly understand the Kremlin-led disinformation campaign that shook so many Americans. In this paper, we present a definition for disinformation - a set or sequence of orchestrated, agenda-driven information actions with the intent to deceive - that is useful in contextualizing Russian interference in 2016 and disinformation campaigns more broadly. We expand on our ongoing work to operationalize this definition and demonstrate how detecting disinformation must extend beyond assessing the credibility of a specific publisher, user, or story.

Submitted 26 June, 2022; originally announced June 2022.

Comments: In Workshop Proceedings of the 15th International AAAI Conference on Web and Social Media (2021)

arXiv:2205.14174 [pdf, other]

Private and Byzantine-Proof Cooperative Decision-Making

Authors: Abhimanyu Dubey, Alex Pentland

Abstract: The cooperative bandit problem is a multi-agent decision problem involving a group of agents that interact simultaneously with a multi-armed bandit, while communicating over a network with delays. The central idea in this problem is to design algorithms that can efficiently leverage communication to obtain improvements over acting in isolation. In this paper, we investigate the stochastic bandit problem under two settings - (a) when the agents wish to make their communication private with respect to the action sequence, and (b) when the agents can be byzantine, i.e., they provide (stochastically) incorrect information. For both these problem settings, we provide upper-confidence bound algorithms that obtain optimal regret while being (a) differentially-private and (b) tolerant to byzantine agents. Our decentralized algorithms require no information about the network of connectivity between agents, making them scalable to large dynamic systems. We test our algorithms on a competitive benchmark of random graphs and demonstrate their superior performance with respect to existing robust algorithms. We hope that our work serves as an important step towards creating distributed decision-making systems that maintain privacy.

Submitted 27 May, 2022; originally announced May 2022.

Comments: Full version of AAMAS 2020 paper uploaded to arXiv

arXiv:2201.07184 [pdf, other]

Are neighbourhood amenities associated with more walking and less driving? Yes, but only for the wealthy

Authors: Samuel Heroy, Isabella Loaiza, Alex Pentland, Neave O'Clery

Abstract: Cities are home to a vast array of amenities, from local barbers to science museums and shopping malls. But these are unequally distributed across urban space. Using Google Places data combined with trip-based mobility data for Bogotá, Colombia, we shed light on the impact of neighbourhood amenities on urban mobility patterns. Deriving a new accessibility metric that explicitly takes into account spatial range, we find that a higher density of local amenities is associated with a higher likelihood of walking as well as shorter bus and car trips. Digging deeper, we use a sample stratification framework to show that socioeconomic status (SES) modulates these effects. Amenities within about a 1km radius are strongly associated with a higher propensity to walk and lower driving time only for the wealthiest group. In contrast, a higher density of amenities is associated with shorter bus trips for low and middle SES residents. As cities globally aim to boost public transport and green travel, these findings enable us to better understand how commercial structure shapes urban mobility in highly income-segregated settings.

Submitted 18 January, 2022; originally announced January 2022.

arXiv:2201.06068 [pdf]

Zero Botnets: An Observe-Pursue-Counter Approach

Authors: Jeremy Kepner, Jonathan Bernays, Stephen Buckley, Kenjiro Cho, Cary Conrad, Leslie Daigle, Keeley Erhardt, Vijay Gadepally, Barry Greene, Michael Jones, Robert Knake, Bruce Maggs, Peter Michaleas, Chad Meiners, Andrew Morris, Alex Pentland, Sandeep Pisharody, Sarah Powazek, Andrew Prout, Philip Reiner, Koichi Suzuki, Kenji Takahashi, Tony Tauber, Leah Walker, Douglas Stetson

Abstract: Adversarial Internet robots (botnets) represent a growing threat to the safe use and stability of the Internet. Botnets can play a role in launching adversary reconnaissance (scanning and phishing), influence operations (upvoting), and financing operations (ransomware, market manipulation, denial of service, spamming, and ad click fraud) while obfuscating tailored tactical operations. Reducing the presence of botnets on the Internet, with the aspirational target of zero, is a powerful vision for galvanizing policy action. Setting a global goal, encouraging international cooperation, creating incentives for improving networks, and supporting entities for botnet takedowns are among several policies that could advance this goal. These policies raise significant questions regarding proper authorities/access that cannot be answered in the abstract. Systems analysis has been widely used in other domains to achieve sufficient detail to enable these questions to be dealt with in concrete terms. Defeating botnets using an observe-pursue-counter architecture is analyzed, the technical feasibility is affirmed, and the authorities/access questions are significantly narrowed. Recommended next steps include: supporting the international botnet takedown community, expanding network observatories, enhancing the underlying network science at scale, conducting detailed systems analysis, and developing appropriate policy frameworks.

Submitted 16 January, 2022; originally announced January 2022.

Comments: 26 pages, 13 figures, 2 tables, 72 references, submitted to PlosOne

Report number: Harvard Belfer Center Report (2021 June)

arXiv:2112.04766 [pdf, other]

Adaptive Methods for Aggregated Domain Generalization

Authors: Xavier Thomas, Dhruv Mahajan, Alex Pentland, Abhimanyu Dubey

Abstract: Domain generalization involves learning a classifier from a heterogeneous collection of training sources such that it generalizes to data drawn from similar unknown target domains, with applications in large-scale learning and personalized inference. In many settings, privacy concerns prohibit obtaining domain labels for the training data samples, so that only an aggregated collection of training points is available. Existing approaches that utilize domain labels to create domain-invariant feature representations are inapplicable in this setting, requiring alternative approaches to learn generalizable classifiers. In this paper, we propose a domain-adaptive approach to this problem, which operates in two steps: (a) we cluster training data within a carefully chosen feature space to create pseudo-domains, and (b) using these pseudo-domains we learn a domain-adaptive classifier that makes predictions using information about both the input and the pseudo-domain it belongs to. Our approach achieves state-of-the-art performance on a variety of domain generalization benchmarks without using domain labels whatsoever. Furthermore, we provide novel theoretical guarantees on domain generalization using cluster information. Our approach is amenable to ensemble-based methods and provides substantial gains even on large-scale benchmark datasets. The code can be found at: https://github.com/xavierohan/AdaClust_DomainBed

Submitted 23 December, 2021; v1 submitted 9 December, 2021; originally announced December 2021.

arXiv:2111.12482 [pdf, other]

One More Step Towards Reality: Cooperative Bandits with Imperfect Communication

Authors: Udari Madhushani, Abhimanyu Dubey, Naomi Ehrich Leonard, Alex Pentland

Abstract: The cooperative bandit problem is increasingly becoming relevant due to its applications in large-scale decision-making. However, most research for this problem focuses exclusively on the setting with perfect communication, whereas in most real-world distributed settings, communication is often over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative bandit learning under three typical real-world communication scenarios, namely, (a) message-passing over stochastic time-varying networks, (b) instantaneous reward-sharing over a network with random delays, and (c) message-passing with adversarially corrupted rewards, including byzantine communication. For each of these environments, we propose decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret as well. Furthermore, in the setting with perfect communication, we present an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies. Finally, we present tight network-dependent minimax lower bounds on the group regret. Our proposed algorithms are straightforward to implement and obtain competitive empirical performance.

Submitted 24 November, 2021; originally announced November 2021.

Journal ref: Conference on Neural Information Processing Systems, 2021

arXiv:2109.10523 [pdf, other]

doi 10.1038/s42005-022-00863-w

Investigating and Modeling the Dynamics of Long Ties

Authors: Ding Lyu, Yuan Yuan, Lin Wang, Xiaofan Wang, Alex Pentland

Abstract: Long ties, the social ties that bridge different communities, are widely believed to play crucial roles in spreading novel information in social networks. However, some existing network theories and prediction models indicate that long ties might dissolve quickly or eventually become redundant, thus putting into question the long-term value of long ties. Our empirical analysis of real-world dynamic networks shows that contrary to such reasoning, long ties are more likely to persist than other social ties, and that many of them constantly function as social bridges without being embedded in local networks. Using a novel cost-benefit analysis model combined with machine learning, we show that long ties are highly beneficial, which instinctively motivates people to expend extra effort to maintain them. This partly explains why long ties are more persistent than what has been suggested by many existing theories and models. Overall, our study suggests the need for social interventions that can promote the formation of long ties, such as mixing people with diverse backgrounds.

Submitted 2 April, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

Comments: Forthcoming at Communications Physics (Nature portfolio)

MSC Class: 05C85; 62P25; 91B16 ACM Class: J.4

Journal ref: Commun. Phys. 5 (2022) 87

arXiv:2108.07437 [pdf, other]

Social influence leads to the formation of diverse local trends

Authors: Ziv Epstein, Matthew Groh, Abhimanyu Dubey, Alex "Sandy" Pentland

Abstract: How does the visual design of digital platforms impact user behavior and the resulting environment? A body of work suggests that introducing social signals to content can increase both the inequality and unpredictability of its success, but has only been shown in the context of music listening. To further examine the effect of social influence on media popularity, we extend this research to the context of algorithmically-generated images by re-adapting Salganik et al.'s Music Lab experiment. On a digital platform where participants discover and curate AI-generated hybrid animals, we randomly assign both the knowledge of other participants' behavior and the visual presentation of the information. We successfully replicate the Music Lab's findings in the context of images, whereby social influence leads to an unpredictable winner-take-all market. However, we also find that social influence can lead to the emergence of local cultural trends that diverge from the status quo and are ultimately more diverse. We discuss the implications of these results for platform designers and animal conservation efforts.

Submitted 17 August, 2021; originally announced August 2021.

Comments: 18 pages, to appear in CSCW October 2021

ACM Class: J.4

arXiv:2103.15796 [pdf, other]

Adaptive Methods for Real-World Domain Generalization

Authors: Abhimanyu Dubey, Vignesh Ramanathan, Alex Pentland, Dhruv Mahajan

Abstract: Invariant approaches have been remarkably successful in tackling the problem of domain generalization, where the objective is to perform inference on data distributions different from those used in training. In our work, we investigate whether it is possible to leverage domain information from the unseen test samples themselves. We propose a domain-adaptive approach consisting of two steps: a) we first learn a discriminative domain embedding from unsupervised training examples, and b) use this domain embedding as supplementary information to build a domain-adaptive model that takes both the input and its domain into account while making predictions. For unseen domains, our method simply uses a few unlabelled test examples to construct the domain embedding. This enables adaptive classification on any unseen domain. Our approach achieves state-of-the-art performance on various domain generalization benchmarks. In addition, we introduce the first real-world, large-scale domain generalization benchmark, Geo-YFCC, containing 1.1M samples over 40 training, 7 validation, and 15 test domains, orders of magnitude larger than prior work. We show that the existing approaches either do not scale to this dataset or underperform compared to the simple baseline of training a model on the union of data from all training domains. In contrast, our approach achieves a significant improvement.

Submitted 29 March, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: To appear as an oral presentation in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. v2 corrects double printing of appendix

arXiv:2103.08131 [pdf]

What are the key components of an entrepreneurial ecosystem in a developing economy? A longitudinal empirical study on technology business incubators in China

Authors: Xiangfei Yuan, Haijing Hao, Chenghua Guan, Alex Pentland

Abstract: Since the 1980s, technology business incubators (TBIs), which focus on accelerating businesses through resource sharing, knowledge agglomeration, and technology innovation, have become a booming industry. As such, research on TBIs has gained international attention, most notably in the United States, Europe, Japan, and China. The present study proposes an entrepreneurial ecosystem framework with four key components, i.e., people, technology, capital, and infrastructure, to investigate which factors have an impact on the performance of TBIs. We also empirically examine this framework based on unique, three-year panel survey data from 857 national TBIs across China. We implemented factor analysis and panel regression models on dozens of variables from 857 national TBIs between 2015 and 2017 in all major cities in China and found that a number of factors associated with the people, technology, capital, and infrastructure components have various statistically significant impacts on the performance of TBIs in either the national model or the regional models.

Submitted 15 March, 2021; originally announced March 2021.

arXiv:2103.04972 [pdf, ps, other]

Provably Efficient Cooperative Multi-Agent Reinforcement Learning with Function Approximation

Authors: Abhimanyu Dubey, Alex Pentland

Abstract: Reinforcement learning in cooperative multi-agent settings has recently advanced significantly in its scope, with applications in cooperative estimation for advertising, dynamic treatment regimes, distributed control, and federated learning. In this paper, we discuss the problem of cooperative multi-agent RL with function approximation, where a group of agents communicates with each other to jointly solve an episodic MDP. We demonstrate that via careful message-passing and cooperative value iteration, it is possible to achieve near-optimal no-regret learning even with a fixed constant communication budget. Next, we demonstrate that even in heterogeneous cooperative settings, it is possible to achieve Pareto-optimal no-regret learning with limited communication. Our work generalizes several ideas from the multi-agent contextual and multi-armed bandit literature to MDPs and reinforcement learning.

Submitted 8 March, 2021; originally announced March 2021.

Comments: 53 pages including Appendix

arXiv:2010.14630 [pdf, other]

doi 10.1098/rsif.2020.1035

COVID-19 policy analysis: labour structure dictates lockdown mobility behaviour

Authors: Samuel Heroy, Isabella Loaiza, Alexander Pentland, Neave O'Clery

Abstract: Countries and cities around the world have resorted to unprecedented mobility restrictions to combat Covid-19 transmission. Here we exploit a natural experiment whereby Colombian cities implemented varied lockdown policies based on ID number and gender to analyse the impact of these policies on urban mobility. Using mobile phone data, we find that the restrictiveness of cities' mobility quotas (the share of residents allowed out daily according to policy advice) does not correlate with mobility reduction. Instead, we find that larger, wealthier cities with more formalized and complex industrial structure experienced greater reductions in mobility. Within cities, wealthier residents are more likely to reduce mobility, and commuters are especially more likely to stay home when their work is located in wealthy or commercially/industrially formalized neighbourhoods. Hence, our results indicate that cities' employment characteristics and work-from-home capabilities are the primary determinants of mobility reduction. This finding underscores the need for mitigations aimed at lower income/informal workers, and sheds light on critical dependencies between socioeconomic classes in Latin American cities.

Submitted 5 April, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

arXiv:2010.11425 [pdf, other]

Differentially-Private Federated Linear Bandits

Authors: Abhimanyu Dubey, Alex Pentland

Abstract: The rapid proliferation of decentralized learning systems mandates the need for differentially-private cooperative learning. In this paper, we study this in the context of the contextual linear bandit: we consider a collection of agents cooperating to solve a common contextual bandit, while ensuring that their communication remains private. For this problem, we devise FedUCB, a multiagent private algorithm for both centralized and decentralized (peer-to-peer) federated learning. We provide a rigorous technical analysis of its utility in terms of regret, improving several results in cooperative bandit learning, and provide rigorous privacy guarantees as well. Our algorithms provide competitive performance both in terms of pseudoregret bounds and empirical benchmark performance in various multi-agent settings.

Submitted 21 October, 2020; originally announced October 2020.

Comments: 22 pages. Camera-ready for NeurIPS 2020

arXiv:2009.07413 [pdf, other]

Towards a Contract Service Provider Model for Virtual Assets and VASPs

Authors: Thomas Hardjono, Alexander Lipton, Alex Pentland

Abstract: We introduce the contract service provider (CSP) model as an analog of the successful Internet ISP model. Our exploration is motivated by the need to seek alternative blockchain service-fee models that depart from the token-for-operations (gas fee) model for smart contracts found on many popular blockchain platforms today. A given CSP community consisting of multiple CSP business entities (VASPs) forms a contract domain which implements well-defined contract primitives, policies and a contract-ledger. The nodes of the members of the CSP community form the blockchain network. We discuss a number of design principles borrowed from the design principles of the Internet Architecture, and we discuss the interoperability of cross-domain (cross-chain) transfers of virtual assets in the context of contract domains.

Submitted 6 December, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

Comments: 33 pages, 8 figures

arXiv:2008.06244 [pdf, other]

Cooperative Multi-Agent Bandits with Heavy Tails

Authors: Abhimanyu Dubey, Alex Pentland

Abstract: We study the heavy-tailed stochastic bandit problem in the cooperative multi-agent setting, where a group of agents interact with a common bandit problem, while communicating on a network with delays. Existing algorithms for the stochastic bandit in this setting utilize confidence intervals arising from an averaging-based communication protocol known as running consensus, that does not lend itself to robust estimation for heavy-tailed settings. We propose MP-UCB, a decentralized multi-agent algorithm for the cooperative stochastic bandit that incorporates robust estimation with a message-passing protocol. We prove optimal regret bounds for MP-UCB for several problem settings, and also demonstrate its superiority to existing methods. Furthermore, we establish the first lower bounds for the cooperative bandit problem, in addition to providing efficient algorithms for robust bandit estimation of location.

Submitted 14 August, 2020; originally announced August 2020.

Comments: 26 pages including appendix, camera-ready for ICML 2020

arXiv:2008.06220 [pdf, other]

Kernel Methods for Cooperative Multi-Agent Contextual Bandits

Authors: Abhimanyu Dubey, Alex Pentland

Abstract: Cooperative multi-agent decision making involves a group of agents cooperatively solving learning problems while communicating over a network with delays. In this paper, we consider the kernelised contextual bandit problem, where the reward obtained by an agent is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS), and a group of agents must cooperate to collectively solve their unique decision problems. For this problem, we propose Coop-KernelUCB, an algorithm that provides near-optimal bounds on the per-agent regret, and is both computationally and communicatively efficient. For special cases of the cooperative problem, we also provide variants of Coop-KernelUCB that provide optimal per-agent regret. In addition, our algorithm generalizes several existing results in the multi-agent bandit setting. Finally, on a series of both synthetic and real-world multi-agent network benchmarks, we demonstrate that our algorithm significantly outperforms existing benchmarks.

Submitted 14 August, 2020; originally announced August 2020.

Comments: 19 pages including supplement, camera-ready at ICML 2020

arXiv:2007.09505 [pdf, other]

Social Learning and the Accuracy-Risk Trade-off in the Wisdom of the Crowd

Authors: Dhaval Adjodah, Yan Leng, Shi Kai Chong, P. M. Krafft, Esteban Moro, Alex Pentland

Abstract: How do we design and deploy crowdsourced prediction platforms for real-world applications where risk is an important dimension of prediction performance? To answer this question, we conducted a large online Wisdom of the Crowd study where participants predicted the prices of real financial assets (e.g. S&P 500). We observe a Pareto frontier between accuracy of prediction and risk, and find that this trade-off is mediated by social learning, i.e., as social learning is increasingly leveraged, it leads to lower accuracy but also lower risk. We also observe that social learning leads to superior accuracy during one of our rounds that occurred during the high market uncertainty of the Brexit vote. Our results have implications for the design of crowdsourced prediction platforms: for example, they suggest that the performance of the crowd should be more comprehensively characterized by using both accuracy and risk (as is standard in financial and statistical forecasting), in contrast to prior work where risk of prediction has been overlooked.

Submitted 18 July, 2020; originally announced July 2020.

arXiv:2006.09437 [pdf, other]

A Study of Compositional Generalization in Neural Models

Authors: Tim Klinger, Dhaval Adjodah, Vincent Marois, Josh Joseph, Matthew Riemer, Alex 'Sandy' Pentland, Murray Campbell

Abstract: Compositional and relational learning is a hallmark of human intelligence, but one which presents challenges for neural models. One difficulty in the development of such models is the lack of benchmarks with clear compositional and relational task structure on which to systematically evaluate them. In this paper, we introduce an environment called ConceptWorld, which enables the generation of images from compositional and relational concepts, defined using a logical domain specific language. We use it to generate images for a variety of compositional structures: 2x2 squares, pentominoes, sequences, scenes involving these objects, and other more complex concepts. We perform experiments to test the ability of standard neural architectures to generalize on relations with compositional arguments as the compositional depth of those arguments increases and under substitution. We compare standard neural networks such as MLP, CNN and ResNet, as well as state-of-the-art relational networks including WReN and PrediNet in a multi-class image classification setting. For simple problems, all models generalize well to close concepts but struggle with longer compositional chains. For more complex tests involving substitutivity, all models struggle, even with short chains. In highlighting these difficulties and providing an environment for further experimentation, we hope to encourage the development of models which are able to generalize effectively in compositional, relational domains.

Submitted 8 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: 28 pages

arXiv:2006.01028 [pdf, other]

Interpretable Stochastic Block Influence Model: measuring social influence among homophilous communities

Authors: Yan Leng, Tara Sowrirajan, Alex Pentland

Abstract: Decision-making on networks can be explained by both homophily and social influence. While homophily drives the formation of communities with similar characteristics, social influence occurs both within and between communities. Social influence can be reasoned through role theory, which indicates that the influences among individuals depend on their roles and the behavior of interest. To operationalize these social science theories, we empirically identify the homophilous communities and use the community structures to capture the "roles", which affect the particular decision-making processes. We propose a generative model named Stochastic Block Influence Model and jointly analyze both the network formation and the behavioral influence within and between different empirically-identified communities. To evaluate the performance and demonstrate the interpretability of our method, we study the adoption decisions of microfinance in an Indian village. We show that although individuals tend to form links within communities, there are strong positive and negative social influences between communities, supporting the weak tie theory. Moreover, we find that communities with shared characteristics are associated with positive influence. In contrast, the communities with a lack of overlap are associated with negative influence. Our framework facilitates the quantification of the influences underlying decision communities and is thus a useful tool for driving information diffusion, viral marketing, and technology adoptions.

Submitted 1 June, 2020; originally announced June 2020.

arXiv:2005.14689 [pdf, other]

Wallet Attestations for Virtual Asset Service Providers and Crypto-Assets Insurance

Authors: Thomas Hardjono, Alexander Lipton, Alex Pentland

Abstract: The emerging virtual asset service providers (VASP) industry currently faces a number of challenges related to the Travel Rule, notably pertaining to customer personal information, account number and cryptographic key information. VASPs will be handling virtual assets of different forms, where each may be bound to different private-public key pairs on the blockchain. As such, VASPs also face the additional problem of managing their own keys and the customer keys that may reside in a customer wallet. The use of attestation technologies as applied to wallet systems may provide VASPs with suitable evidence relevant to the Travel Rule regarding cryptographic key information and their operational state. Additionally, wallet attestations may provide crypto-asset insurers with strong evidence regarding the key management aspects of a wallet device, thereby providing the insurance industry with measurable levels of assurance that can become the basis for insurers to perform risk assessment on crypto-assets bound to keys in wallets, both enterprise-grade wallets and consumer-grade wallets.

Submitted 29 May, 2020; originally announced May 2020.

Comments: 35 pages; 9 figures

arXiv:2005.12218 [pdf, other]

User behavior and token adoption on ERC20

Authors: Alfredo J. Morales, Shahar Somin, Yaniv Altshuler, Alex 'Sandy' Pentland

Abstract: Cryptocurrencies and Blockchain-based technologies are disrupting all markets. While the potential of such technologies remains to be seen, there is a current need to understand emergent patterns of user behavior and token adoption in order to design future products. In this paper we analyze the social dynamics taking place during one arbitrary day on the ERC20 platform. We characterize the network of token transactions among agents. We show heterogeneous profiles of user behavior, portfolio diversity, and token adoption. While most users are specialized in transacting with a few tokens, those that have diverse portfolios are bridging across large parts of the network and may jeopardize the system stability. We believe this work to be a foundation for unveiling the usage dynamics of crypto-currencies networks.

Submitted 25 May, 2020; originally announced May 2020.

Comments: 12 pages, 4 figures

arXiv:2005.10414 [pdf, other]

Analysis of misinformation during the COVID-19 outbreak in China: cultural, social and political entanglements

Authors: Yan Leng, Yujia Zhai, Shaojing Sun, Yifei Wu, Jordan Selzer, Sharon Strover, Julia Fensel, Alex Pentland, Ying Ding

Abstract: COVID-19 resulted in an infodemic, which could erode public trust, impede virus containment, and outlive the pandemic itself. The evolving and fragmented media landscape is a key driver of the spread of misinformation. Using misinformation identified by Tencent's fact-checking platform and posts on Weibo, our results showed that the evolution of misinformation follows an issue-attention cycle, pertaining to topics such as city lockdown, cures and preventions, and school reopening. Sources of authority weigh in on these topics, but their influence is complicated by people's pre-existing beliefs and cultural practices. Finally, social media has a complicated relationship with established or legacy media systems. Sometimes they reinforce each other, but in general, social media may have a topic cycle of its own making. Our findings shed light on the distinct characteristics of misinformation during COVID-19 and offer insights into combating misinformation in China and across the world at large.

Submitted 20 May, 2020; originally announced May 2020.

arXiv:2004.08201 [pdf, other]

ERC20 Transactions over Ethereum Blockchain: Network Analysis and Predictions

Authors: Shahar Somin, Goren Gordon, Alex Pentland, Erez Shmueli, Yaniv Altshuler

Abstract: Following the birth of Bitcoin and the introduction of the Ethereum ERC20 protocol a decade ago, recent years have witnessed a growing number of cryptographic tokens that are being introduced by researchers, private sector companies and NGOs. The ubiquity of such Blockchain based cryptocurrencies gives birth to a new kind of rising economy, whose dynamics are very difficult to model using conventional semantic properties. Our work presents the analysis of the dynamical properties of the ERC20 protocol compliant crypto-coins' trading data using a network theory prism. We examine the dynamics of ERC20 based networks over time by analyzing a meta-parameter of the network, the power of its degree distribution. Our analysis demonstrates that this parameter can be modeled as an under-damped harmonic oscillator over time, enabling predictions of network parameters a year ahead.

Submitted 17 April, 2020; originally announced April 2020.

arXiv:2004.05222 [pdf]

Give more data, awareness and control to individual citizens, and they will help COVID-19 containment

Authors: Mirco Nanni, Gennady Andrienko, Albert-László Barabási, Chiara Boldrini, Francesco Bonchi, Ciro Cattuto, Francesca Chiaromonte, Giovanni Comandé, Marco Conti, Mark Coté, Frank Dignum, Virginia Dignum, Josep Domingo-Ferrer, Paolo Ferragina, Fosca Giannotti, Riccardo Guidotti, Dirk Helbing, Kimmo Kaski, Janos Kertesz, Sune Lehmann, Bruno Lepri, Paul Lukowicz, Stan Matwin, David Megías Jiménez, Anna Monreale, et al. (14 additional authors not shown)

Abstract: The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in phase 2 of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively, voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn, this enables both contact tracing and the early detection of outbreak hotspots on a more finely-granulated geographic scale. Our recommendation is two-fold. First, to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates - if and when they want, for specific aims - with health authorities, for instance. Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.

Submitted 16 April, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Comments: Revised text. Additional authors

Journal ref: Transactions on Data Privacy 13(1): 61-66 (2020), http://www.tdp.cat/issues16/abs.a389a20.php

arXiv:2003.14412 [pdf, other]

Assessing Disease Exposure Risk with Location Data: A Proposal for Cryptographic Preservation of Privacy

Authors: Alex Berke, Michiel Bakker, Praneeth Vepakomma, Kent Larson, Alex 'Sandy' Pentland

Abstract: Governments and researchers around the world are implementing digital contact tracing solutions to stem the spread of infectious disease, namely COVID-19. Many of these solutions threaten individual rights and privacy. Our goal is to break past the false dichotomy of effective versus privacy-preserving contact tracing. We offer an alternative approach to assess and communicate users' risk of exposure to an infectious disease while preserving individual privacy. Our proposal uses recent GPS location histories, which are transformed and encrypted, and a private set intersection protocol to interface with a semi-trusted authority. There have been other recent proposals for privacy-preserving contact tracing, based on Bluetooth and decentralization, that could further eliminate the need for trust in authority. However, solutions with Bluetooth are currently limited to certain devices and contexts while decentralization adds complexity. The goal of this work is two-fold: we aim to propose a location-based system that is more privacy-preserving than what is currently being adopted by governments around the world, and that is also practical to implement with the immediacy needed to stem a viral outbreak.

Submitted 8 April, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

arXiv:2003.12347 [pdf]

Mobile phone data and COVID-19: Missing an opportunity?

Authors: Nuria Oliver, Emmanuel Letouzé, Harald Sterly, Sébastien Delataille, Marco De Nadai, Bruno Lepri, Renaud Lambiotte, Richard Benjamins, Ciro Cattuto, Vittoria Colizza, Nicolas de Cordes, Samuel P. Fraiberger, Till Koebe, Sune Lehmann, Juan Murillo, Alex Pentland, Phuong N Pham, Frédéric Pivetta, Albert Ali Salah, Jari Saramäki, Samuel V. Scarpino, Michele Tizzoni, Stefaan Verhulst, Patrick Vinck

Abstract: This paper describes how mobile phone data can guide government and public health authorities in determining the best course of action to control the COVID-19 pandemic and in assessing the effectiveness of control measures such as physical distancing. It identifies key gaps and reasons why this kind of data is only scarcely used, although their value in similar epidemics has proven in a number of use cases. It presents ways to overcome these gaps and key recommendations for urgent action, most notably the establishment of mixed expert groups on national and regional level, and the inclusion and support of governments and public authorities early on. It is authored by a group of experienced data scientists, epidemiologists, demographers and representatives of mobile network operators who jointly put their work at the service of the global effort to combat the COVID-19 pandemic.

Submitted 27 March, 2020; originally announced March 2020.

Showing 1–50 of 123 results for author: Pentland, A