-
Generalized Relevance Learning Grassmann Quantization
Authors:
M. Mohammadi,
M. Babai,
M. H. F. Wilkinson
Abstract:
Due to advancements in digital cameras, it is easy to gather multiple images (or videos) from an object under different conditions. Therefore, image-set classification has attracted more attention, and different solutions were proposed to model them. A popular way to model image sets is subspaces, which form a manifold called the Grassmann manifold. In this contribution, we extend the application of Generalized Relevance Learning Vector Quantization to deal with the Grassmann manifold. The proposed model returns a set of prototype subspaces and a relevance vector. While prototypes model typical behaviours within classes, the relevance factors specify the most discriminative principal vectors (or images) for the classification task. They both provide insights into the model's decisions by highlighting influential images and pixels for predictions. Moreover, due to learning prototypes, the model complexity of the new method during inference is independent of dataset size, unlike previous works. We applied it to several recognition tasks including handwritten digit recognition, face recognition, activity recognition, and object recognition. Experiments demonstrate that it outperforms previous works with lower complexity and can successfully model the variation, such as handwritten style or lighting conditions. Moreover, the presence of relevances makes the model robust to the selection of subspaces' dimensionality.
Submitted 14 March, 2024;
originally announced March 2024.
-
Requirements Rationalization and Synthesis enabled by Model Synchronization
Authors:
Siyuan Ji,
Charles E. Dickerson,
Michael Wilkinson
Abstract:
In the international standard for system and software engineering ISO/IEC/IEEE 15288: 2015, the output of the stakeholder needs and the business or mission analysis technical processes are transformed into a technical view of the system by the system requirements definition process. In model-based systems engineering, functional needs can be modeled by use case diagrams. Intended outcomes of system requirements definition include resolution of disagreement about requirements, explicit agreement between stakeholders, and traceability. However, stakeholder needs are often elicited in a siloed manner and may be inconsistent. The lack of mathematically based systematic approaches for requirements definition poses a challenge to model-based transformation of needs into a technical view of the system that achieves agreement between stakeholders. This paper specifies and demonstrates mathematical frameworks for rationalizing and synthesizing functional needs that have been captured through an elicitation process. Benefits of this approach include but are not limited to supporting rigorous identification and resolution of disagreements and facilitating systematic analysis of change impact to achieve stakeholder agreement all with minimal intervention by the system engineers.
Submitted 12 February, 2023;
originally announced February 2023.
-
Architecting Safer Autonomous Aviation Systems
Authors:
Jane Fenn,
Mark Nicholson,
Ganesh Pai,
Michael Wilkinson
Abstract:
The aviation literature gives relatively little guidance to practitioners about the specifics of architecting systems for safety, particularly the impact of architecture on allocating safety requirements, or the relative ease of system assurance resulting from system or subsystem level architectural choices. As an exemplar, this paper considers common architectural patterns used within traditional aviation systems and explores their safety and safety assurance implications when applied in the context of integrating artificial intelligence (AI) and machine learning (ML) based functionality. Considering safety as an architectural property, we discuss both the allocation of safety requirements and the architectural trade-offs involved early in the design lifecycle. This approach could be extended to other assured properties, similar to safety, such as security. We conclude with a discussion of the safety considerations that emerge in the context of candidate architectural patterns that have been proposed in the recent literature for enabling autonomy capabilities by integrating AI and ML. A recommendation is made for the generation of a property-driven architectural pattern catalogue.
Submitted 9 January, 2023;
originally announced January 2023.
-
A comparative study of source-finding techniques in HI emission line cubes using SoFiA, MTObjects, and supervised deep learning
Authors:
J. A. Barkai,
M. A. W. Verheijen,
E. T. Martínez,
M. H. F. Wilkinson
Abstract:
The 21 cm spectral line emission of atomic neutral hydrogen (HI) is one of the primary wavelengths observed in radio astronomy. However, the signal is intrinsically faint and the HI content of galaxies depends on the cosmic environment, requiring large survey volumes and survey depth to investigate the HI Universe. As the amount of data coming from these surveys continues to increase with technological improvements, so does the need for automatic techniques for identifying and characterising HI sources while considering the tradeoff between completeness and purity. This study aimed to find the optimal pipeline for finding and masking the most sources with the best mask quality and the fewest artefacts in 3D neutral hydrogen cubes. Various existing methods were explored in an attempt to create a pipeline to optimally identify and mask the sources in 3D neutral hydrogen 21 cm spectral line data cubes. Two traditional source-finding methods were tested, SoFiA and MTObjects, as well as a new supervised deep learning approach, in which a 3D convolutional neural network architecture, known as V-Net, was used. These three source-finding methods were further improved by adding a classical machine learning classifier as a post-processing step to remove false positive detections. The pipelines were tested on HI data cubes from the Westerbork Synthesis Radio Telescope with additional inserted mock galaxies. SoFiA combined with a random forest classifier provided the best results, with the V-Net-random forest combination a close second. We suspect this is due to the fact that there are many more mock sources in the training set than real sources. There is, therefore, room to improve the quality of the V-Net network with better-labelled data such that it can potentially outperform SoFiA.
Submitted 23 November, 2022;
originally announced November 2022.
-
Structure Preserving Transformations for Practical Model-based Systems Engineering
Authors:
Siyuan Ji,
Michael Wilkinson,
Charles E. Dickerson
Abstract:
In this third decade of systems engineering in the twenty-first century, it is important to develop and demonstrate practical methods to exploit machine-readable models in the engineering of systems. Substantial investment has been made in languages and modelling tools for developing models. A key problem is that system architects and engineers work in a multidisciplinary environment in which models are not the product of any one individual. This paper provides preliminary results of a formal approach to specify models and structure preserving transformations between them that support model synchronization. This is an important area of research and practice in software engineering. However, it is limited to synchronization at the code level of systems. This paper leverages previous research of the authors to define a core fractal for interpretation of concepts into model specifications and transformation between models. This fractal is used to extend the concept of synchronization of models to the system level and is demonstrated through a practical engineering example for an advanced driver assistance system.
Submitted 16 September, 2022;
originally announced September 2022.
-
RadNet: Incident Prediction in Spatio-Temporal Road Graph Networks Using Traffic Forecasting
Authors:
Shreshth Tuli,
Matthew R. Wilkinson,
Chris Kettell
Abstract:
Efficient and accurate incident prediction in spatio-temporal systems is critical to minimize service downtime and optimize performance. This work aims to utilize historic data to predict and diagnose incidents using spatio-temporal forecasting. We consider the specific use case of road traffic systems where incidents take the form of anomalous events, such as accidents or broken-down vehicles. To tackle this, we develop a neural model, called RadNet, which forecasts system parameters such as average vehicle speeds for a future timestep. As such systems largely follow daily or weekly periodicity, we compare RadNet's predictions against historical averages to label incidents. Unlike prior work, RadNet infers spatial and temporal trends in both permutations, finally combining the dense representations before forecasting. This facilitates informed inference and more accurate incident detection. Experiments with two publicly available and a new road traffic dataset demonstrate that the proposed model gives up to 8% higher prediction F1 scores compared to the state-of-the-art methods.
Submitted 11 June, 2022;
originally announced June 2022.
-
Metaparametric Neural Networks for Survival Analysis
Authors:
Fabio Luis de Mello,
J Mark Wilkinson,
Visakan Kadirkamanathan
Abstract:
Survival analysis is a critical tool for the modelling of time-to-event data, such as life expectancy after a cancer diagnosis or optimal maintenance scheduling for complex machinery. However, current neural network models provide an imperfect solution for survival analysis as they either restrict the shape of the target probability distribution or restrict the estimation to pre-determined times. As a consequence, current survival neural networks lack the ability to estimate a generic function without prior knowledge of its structure. In this article, we present the metaparametric neural network framework that encompasses existing survival analysis methods and enables their extension to solve the aforementioned issues. This framework allows survival neural networks to satisfy the same independence of generic function estimation from the underlying data structure that characterizes their regression and classification counterparts. Further, we demonstrate the application of the metaparametric framework using both simulated and large real-world datasets and show that it outperforms the current state-of-the-art methods in (i) capturing nonlinearities, and (ii) identifying temporal patterns, leading to more accurate overall estimations whilst placing no restrictions on the underlying function structure.
Submitted 13 October, 2021;
originally announced October 2021.
-
A Fresh Look at FAIR for Research Software
Authors:
Daniel S. Katz,
Morane Gruenpeter,
Tom Honeyman,
Lorraine Hwang,
Mark D. Wilkinson,
Vanessa Sochat,
Hartwig Anzt,
Carole Goble,
for FAIR4RS Subgroup 1
Abstract:
This document captures the discussion and deliberation of the FAIR for Research Software (FAIR4RS) subgroup that took a fresh look at the applicability of the FAIR Guiding Principles for scientific data management and stewardship for research software. We discuss the vision of research software as ideally reproducible, open, usable, recognized, sustained and robust, and then review both the characteristic and practiced differences of research software and data. This vision and understanding of initial conditions serves as a backdrop for an attempt at translating and interpreting the guiding principles to more fully align with research software. We have found that many of the principles remained relatively intact as written, as long as considerable interpretation was provided. This was particularly the case for the "Findable" and "Accessible" foundational principles. We found that "Interoperability" and "Reusability" are particularly prone to a broad and sometimes opposing set of interpretations as written. We propose two new principles modeled on existing ones, and provide modified guiding text for these principles to help clarify our final interpretation. A series of gaps in translation were captured during this process, and these remain to be addressed. We finish with a consideration of where these translated principles fall short of the vision laid out in the opening.
Submitted 9 February, 2021; v1 submitted 26 January, 2021;
originally announced January 2021.
-
Architecture Definition in Complex System Design Using Model Theory
Authors:
Charles E. Dickerson,
Michael K. Wilkinson,
Eugenie Hunsicker,
Siyuan Ji,
Mole Li,
Yves Bernard,
Graham Bleakley,
Peter Denno
Abstract:
Architecture Definition, which is central to system design, is one of the two most used technical processes in the practice of model-based systems engineering. In this paper a fundamental approach to architecture definition is presented and demonstrated. The success of its application to engineering problems depends on a precise but practical definition of the term architecture. In the standard for Architecture Description, ISO/IEC/IEEE 42010:2011, a definition was adopted that has been subsumed into later standards. In 2018 the working group JTC1/SC7/WG42 on System Architecture began a review of the adopted definition, holding sessions late in the year. This paper extends and complements a position paper submitted during the meetings; in which Tarski model theory and ISO/IEC 24707:2018 (logic-based languages) were used to better understand relationships between system models and concepts related to architecture. Independent from the working group, it now contributes intuitive fundamental definitions of the terms architecture and system that are used to specify a mathematically based technical process for architecture definition. The engineering utility and benefits to complex system design are demonstrated in a diesel engine emissions reduction case study.
Submitted 23 March, 2020; v1 submitted 15 September, 2019;
originally announced September 2019.
-
Transferability of Operational Status Classification Models Among Different Wind Turbine Types
Authors:
Z. Trstanova,
A. Martinsson,
C. Matthews,
S. Jimenez,
B. Leimkuhler,
T. Van Delft,
M. Wilkinson
Abstract:
A detailed understanding of wind turbine performance status classification can improve operations and maintenance in the wind energy industry. Due to different engineering properties of wind turbines, the standard supervised learning models used for classification do not generalize across data sets obtained from different wind sites. We propose two methods to deal with the transferability of the trained models: first, data normalization in the form of power curve alignment, and second, a robust method based on convolutional neural networks and feature-space extension. We demonstrate the success of our methods on real-world data sets with industrial applications.
Submitted 21 March, 2019;
originally announced March 2019.
-
The FAIR Funder pilot programme to make it easy for funders to require and for grantees to produce FAIR Data
Authors:
P. Wittenburg,
H. Pergl Sustkova,
A. Montesanti,
S. M. Bloemers,
S. H. de Waard,
M. A. Musen,
J. B. Graybeal,
K. M. Hettne,
A. Jacobsen,
R. Pergl,
R. W. W. Hooft,
C. Staiger,
C. W. G. van Gelder,
S. L. Knijnenburg,
A. C. van Arkel,
B. Meerman,
M. D. Wilkinson,
S-A Sansone,
P. Rocca-Serra,
P. McQuilton,
A. N. Gonzalez-Beltran,
G. J. C. Aben,
P. Henning,
S. Alencar,
C. Ribeiro
, et al. (35 additional authors not shown)
Abstract:
There is a growing acknowledgement in the scientific community of the importance of making experimental data machine findable, accessible, interoperable, and reusable (FAIR). Recognizing that high quality metadata are essential to make datasets FAIR, members of the GO FAIR Initiative and the Research Data Alliance (RDA) have initiated a series of workshops to encourage the creation of Metadata for Machines (M4M), enabling any self-identified stakeholder to define and promote the reuse of standardized, comprehensive machine-actionable metadata. The funders of scientific research recognize that they have an important role to play in ensuring that experimental results are FAIR, and that high quality metadata and careful planning for FAIR data stewardship are central to these goals. We describe the outcome of a recent M4M workshop that has led to a pilot programme involving two national science funders, the Health Research Board of Ireland (HRB) and the Netherlands Organisation for Health Research and Development (ZonMW). These funding organizations will explore new technologies to define at the time that a request for proposals is issued the minimal set of machine-actionable metadata that they would like investigators to use to annotate their datasets, to enable investigators to create such metadata to help make their data FAIR, and to develop data-stewardship plans that ensure that experimental data will be managed appropriately abiding by the FAIR principles. The FAIR Funders design envisions a data-management workflow having seven essential stages, where solution providers are openly invited to participate. The initial pilot programme will launch using existing computer-based tools of those who attended the M4M Workshop.
Submitted 6 March, 2019; v1 submitted 26 February, 2019;
originally announced February 2019.
-
Concepts of Architecture, Structure and System
Authors:
Michael K. Wilkinson,
Charles E. Dickerson,
Siyuan Ji
Abstract:
The current ISO standards pertaining to the Concepts of System and Architecture express succinct definitions of these two key terms that lend themselves to practical application and can be understood through elementary mathematical foundations. The current work of the ISO/IEC Working Group 42 is seeking to refine and elaborate the existing standards. This position paper revisits the fundamental concepts underlying both of these key terms and offers an approach to: (i) refine and exemplify the term 'fundamental concepts' in the current ISO definition of Architecture, (ii) exploit existing standards for the term 'concept', and (iii) introduce a new concept, Architectural Structure, that can serve to unify the current terminology at a fundamental level. Precise elementary examples are used to conceptualise the approach offered.
Submitted 5 December, 2018; v1 submitted 29 October, 2018;
originally announced October 2018.
-
OntoLoki: an automatic, instance-based method for the evaluation of biological ontologies on the Semantic Web
Authors:
Benjamin M. Good,
Gavin Ha,
Chi K. Ho,
Mark D. Wilkinson
Abstract:
The delineation of logical definitions for each class in an ontology and the consistent application of these definitions to the assignment of instances to classes are important criteria for ontology evaluation. If ontologies are specified with property-based restrictions on class membership, then such consistency can be checked automatically. If no such logical restrictions are applied, as is the case with many biological ontologies, there are currently no automated methods for measuring the semantic consistency of instance assignment on an ontology-wide scale, nor for inferring the patterns of properties that might define a particular class. We constructed a program that takes as its input an OWL/RDF knowledge base containing an ontology, instances associated with each of the classes in the ontology, and properties of those instances. For each class, it outputs: 1) a rule for determining class membership based on the properties of the instances and 2) a quantitative score for the class that reflects the ability of the identified rule to correctly predict class membership for the instances in the knowledge base. We evaluated this program using both artificial knowledge bases of known quality and real, widely used ontologies. The results indicate that the suggested method can be used to conduct objective, automatic, data-driven evaluations of biological ontologies without formal class definitions in regards to the property-based consistency of instance-assignment. This inductive method complements existing, purely deductive approaches to automatic consistency checking, offering not just the potential to help in the ontology engineering process but also in the knowledge discovery process.
Submitted 20 February, 2015;
originally announced February 2015.
-
Automatic annotation of bioinformatics workflows with biomedical ontologies
Authors:
Beatriz García-Jiménez,
Mark D. Wilkinson
Abstract:
Legacy scientific workflows, and the services within them, often present scarce and unstructured (i.e. textual) descriptions. This makes it difficult to find, share and reuse them, thus dramatically reducing their value to the community. This paper presents an approach to annotating workflows and their subcomponents with ontology terms, in an attempt to describe these artifacts in a structured way. Despite a dearth of even textual descriptions, we automatically annotated 530 myExperiment bioinformatics-related workflows, including more than 2600 workflow-associated services, with relevant ontological terms. Quantitative evaluation of the Information Content of these terms suggests that, in cases where annotation was possible at all, the annotation quality was comparable to manually curated bioinformatics resources.
Submitted 1 July, 2014;
originally announced July 2014.
-
SHARE: A Web Service Based Framework for Distributed Querying and Reasoning on the Semantic Web
Authors:
Ben P Vandervalk,
E Luke McCarthy,
Mark D Wilkinson
Abstract:
Here we describe the SHARE system, a web service based framework for distributed querying and reasoning on the semantic web. The main innovations of SHARE are: (1) the extension of a SPARQL query engine to perform on-demand data retrieval from web services, and (2) the extension of an OWL reasoner to test property restrictions by means of web service invocations. In addition to enabling queries across distributed datasets, the system allows for a target dataset that is significantly larger than is possible under current, centralized approaches. Although the architecture is equally applicable to all types of data, the SHARE system targets bioinformatics, due to the large number of interoperable web services that are already available in this area. SHARE is built entirely on semantic web standards, and is the successor of the BioMOBY project.
Submitted 20 May, 2013;
originally announced May 2013.
-
SPARQL Assist Language-Neutral Query Composer
Authors:
Luke McCarthy,
Ben Vandervalk,
Mark Wilkinson
Abstract:
SPARQL query composition is difficult for the lay-person or even the experienced bioinformatician in cases where the data model is unfamiliar. Established best-practices and internationalization concerns dictate that semantic web ontologies should use terms with opaque identifiers, further complicating the task. We present SPARQL Assist: a web application that addresses these issues by providing context-sensitive type-ahead completion to existing web forms. Ontological terms are suggested using their labels and descriptions, leveraging existing XML support for internationalization and language-neutrality.
Submitted 7 December, 2010;
originally announced December 2010.
-
Feedback loops of attention in peer production
Authors:
Fang Wu,
Dennis M. Wilkinson,
Bernardo A. Huberman
Abstract:
A significant percentage of online content is now published and consumed via the mechanism of crowdsourcing. While any user can contribute to these forums, a disproportionately large percentage of the content is submitted by very active and devoted users, whose continuing participation is key to the sites' success. As we show, people's propensity to keep participating increases the more they contribute, suggesting motivating factors which increase over time. This paper demonstrates that submitters who stop receiving attention tend to stop contributing, while prolific contributors attract an ever increasing number of followers and their attention in a feedback loop. We demonstrate that this mechanism leads to the observed power law in the number of contributions per user and support our assertions by an analysis of hundreds of millions of contributions to top content sharing websites Digg.com and Youtube.com.
Submitted 11 May, 2009;
originally announced May 2009.
-
Assessing the Value of Cooperation in Wikipedia
Authors:
Dennis M. Wilkinson,
Bernardo A. Huberman
Abstract:
Since its inception six years ago, the online encyclopedia Wikipedia has accumulated 6.40 million articles and 250 million edits, contributed in a predominantly undirected and haphazard fashion by 5.77 million unvetted volunteers. Despite the apparent lack of order, the 50 million edits by 4.8 million contributors to the 1.5 million articles in the English-language Wikipedia follow certain strong overall regularities. We show that the accretion of edits to an article is described by a simple stochastic mechanism, resulting in a heavy tail of highly visible articles with a large number of edits. We also demonstrate a crucial correlation between article quality and number of edits, which validates Wikipedia as a successful collaborative effort.
Submitted 23 February, 2007;
originally announced February 2007.
-
Rhythms of social interaction: messaging within a massive online network
Authors:
Scott Golder,
Dennis M. Wilkinson,
Bernardo A. Huberman
Abstract:
We have analyzed the fully-anonymized headers of 362 million messages exchanged by 4.2 million users of Facebook, an online social network of college students, during a 26 month interval. The data reveal a number of strong daily and weekly regularities which provide insights into the time use of college students and their social lives, including seasonal variations. We also examined how factors such as school affiliation and informal online friend lists affect the observed behavior and temporal patterns. Finally, we show that Facebook users appear to be clustered by school with respect to their temporal messaging patterns.
Submitted 27 November, 2006;
originally announced November 2006.