-
Subsurface hydrogen storage controlled by small-scale rock heterogeneities
Authors:
Zaid Jangda,
Hannah Menke,
Andreas Busch,
Sebastian Geiger,
Tom Bultreys,
Kamaljit Singh
Abstract:
Subsurface porous rocks have the potential to store large volumes of hydrogen (H$_2$) required for transitioning towards a H$_2$-based energy future. Understanding the flow and trapping behavior of H$_2$ in subsurface storage systems, which is influenced by pore-scale heterogeneities inherent to subsurface rocks, is crucial to reliably evaluate the storage efficiency of a geological formation. In this work, we performed 3D X-ray imaging and flow experiments to investigate the impact of pore-scale heterogeneity on H$_2$ distribution after its cyclic injection (drainage) and withdrawal (imbibition) from a layered rock sample characterized by varying pore and throat sizes. Our findings reveal that even subtle variations in rock structure and properties significantly influence H$_2$ displacement and storage efficiency. During drainage, H$_2$ follows a path consisting of large pores and throats, bypassing the majority of the low-permeability rock layer consisting of smaller pores and throats. This bypassing substantially reduces the H$_2$ storage capacity. Moreover, because of the varying pore and throat sizes in the layered sample, and depending on the experimental flow strategy, we observe a higher H$_2$ saturation after imbibition than after drainage, which is counterintuitive and opposite to what is observed in homogeneous rocks. These findings emphasize that small-scale rock heterogeneity, which is often unaccounted for in reservoir-scale models, can play a vital role in the displacement and trapping of H$_2$ in subsurface porous media.
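A minimal sketch of how the storage-efficiency effect could be quantified from segmented images, assuming the 3D X-ray volumes have already been segmented into grain, brine, and H$_2$ voxels and split by layer (the label values, layer names, and toy data are hypothetical, not the authors' workflow). H$_2$ saturation is the fraction of pore voxels occupied by H$_2$; computing it per layer makes bypassing of the low-permeability layer show up as a low saturation there.

```python
from collections import Counter

# Hypothetical phase labels for a segmented micro-CT volume (assumption: the
# segmentation step has already been done elsewhere).
GRAIN, BRINE, H2 = 0, 1, 2


def h2_saturation(labels):
    """Fraction of pore voxels (brine + H2) that are occupied by H2.

    `labels` is any iterable of per-voxel phase labels, e.g. a flattened 3D image.
    """
    counts = Counter(labels)
    pore_voxels = counts[BRINE] + counts[H2]
    if pore_voxels == 0:
        raise ValueError("volume contains no pore voxels")
    return counts[H2] / pore_voxels


def layer_saturations(labels_by_layer):
    """H2 saturation computed separately for each named layer of the sample."""
    return {name: h2_saturation(voxels) for name, voxels in labels_by_layer.items()}


# Toy example: H2 fills most of the coarse layer but bypasses the fine layer.
toy = {
    "coarse": [GRAIN, H2, H2, H2, BRINE],
    "fine": [GRAIN, GRAIN, BRINE, BRINE, BRINE, H2],
}
print(layer_saturations(toy))  # {'coarse': 0.75, 'fine': 0.25}
```

Reporting saturation per layer rather than for the whole sample is the design choice that exposes heterogeneity; a single bulk value would average the bypassed layer away.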
Submitted 13 October, 2023; v1 submitted 8 October, 2023;
originally announced October 2023.
-
Non-volatile Phase-only Transmissive Spatial Light Modulators
Authors:
Zhuoran Fang,
Rui Chen,
Johannes E. Fröch,
Quentin A. A. Tanguy,
Asir Intisar Khan,
Xiangping Wu,
Virat Tara,
Arnab Manna,
David Sharp,
Christopher Munley,
Forrest Miller,
Yang Zhao,
Sarah J. Geiger,
Karl F. Böhringer,
Matthew Reynolds,
Eric Pop,
Arka Majumdar
Abstract:
Free-space modulation of light is crucial for many applications, from light detection and ranging to virtual or augmented reality. Traditional means of modulating free-space light involve spatial light modulators based on liquid crystals and microelectromechanical systems, which are bulky, have large pixel areas (~10 µm × 10 µm), and require high driving voltages. Recent progress in meta-optics has shown promise in circumventing some of these limitations. By integrating active materials with sub-wavelength pixels in a meta-optic, the power consumption can be dramatically reduced while achieving faster speeds. However, these reconfiguration methods are volatile and hence require the constant application of control signals, leading to phase jitter and crosstalk. Additionally, to control a large number of pixels, it is essential to implement a memory within each pixel to keep the number of control signals tractable. Here, we develop a device with nonvolatile, electrically programmable, phase-only modulation of free-space infrared radiation in transmission using the low-loss phase-change material (PCM) Sb2Se3. By coupling an ultra-thin PCM layer to a high quality (Q)-factor (Q~406) diatomic metasurface, we demonstrate a phase-only modulation of ~0.25π (~0.2π) in simulation (experiment), ten times larger than that of a bare PCM layer of the same thickness. The device shows excellent endurance over 1,000 switching cycles. We then advance the device geometry to enable independent control of 17 meta-molecules, achieving ten deterministic resonance levels with a 2π phase shift. By independently controlling the phase delay of pixels, we further show tunable far-field beam shaping. Our work paves the way to realizing non-volatile transmissive phase-only spatial light modulators.
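To illustrate how per-pixel phase control can steer a beam, here is a minimal sketch of a linear phase ramp (a blazed-grating profile) quantized to a fixed number of phase levels. It assumes, for illustration only, that the ten resonance levels map to evenly spaced phase steps spanning 2π across the 17 independently controlled meta-molecules; the wavelength and pixel pitch in the usage line are hypothetical values, not device parameters from the abstract. The steering angle follows the first-order grating equation at normal incidence, sin θ = λ / (N_p · p), where N_p is the number of pixels per 2π period and p is the pixel pitch.

```python
import math


def quantized_phase_ramp(n_pixels, pixels_per_period, n_levels):
    """Phase (rad) of each pixel for a linear 2*pi ramp snapped to n_levels steps."""
    step = 2 * math.pi / n_levels
    phases = []
    for i in range(n_pixels):
        ideal = (2 * math.pi * i / pixels_per_period) % (2 * math.pi)
        level = round(ideal / step) % n_levels  # nearest available level, wrapped
        phases.append(level * step)
    return phases


def steering_angle_deg(wavelength, pitch, pixels_per_period):
    """First-order deflection angle for normal incidence: sin(theta) = lambda / period."""
    s = wavelength / (pitch * pixels_per_period)
    if s > 1:
        raise ValueError("period shorter than the wavelength: no propagating first order")
    return math.degrees(math.asin(s))


# 17 pixels, 5 pixels per 2*pi period, 10 phase levels (illustrative values).
ramp = quantized_phase_ramp(17, 5, 10)
levels = [round(p / (2 * math.pi / 10)) for p in ramp]
print(levels[:6])  # [0, 2, 4, 6, 8, 0]

# Hypothetical 1.55 um light and 0.8 um pixel pitch (same length units).
print(round(steering_angle_deg(1.55, 0.8, 5), 1))
```

Changing `pixels_per_period` changes the supercell period and hence the steering angle, which is why independent, non-volatile control of each pixel's phase is what enables tunable far-field shaping.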
Submitted 22 July, 2023;
originally announced July 2023.
-
An Open-Source Multi-functional Testing Platform for Optical Phase Change Materials
Authors:
Cosmin-Constantin Popescu,
Khoi Phuong Dao,
Luigi Ranno,
Brian Mills,
Louis Martin,
Yifei Zhang,
David Bono,
Brian Neltner,
Tian Gu,
Juejun Hu,
Kiumars Aryana,
William M. Humphreys,
Hyun Jung Kim,
Steven Vitale,
Paul Miller,
Christopher Roberts,
Sarah Geiger,
Dennis Callahan,
Michael Moebius,
Myungkoo Kang,
Kathleen Richardson,
Carlos A. Ríos Ocampo
Abstract:
Owing to their unique tunable optical properties, chalcogenide phase change materials are increasingly being investigated for optics and photonics applications. However, in situ characterization of their phase transition characteristics remains inaccessible to many researchers. In this article, we introduce a multi-functional silicon microheater platform capable of in situ measurement of the structural, kinetic, optical, and thermal properties of these materials. The platform can be fabricated using industry-standard silicon foundry manufacturing processes. We have fully open-sourced this platform, including the complete hardware design and associated software code.
Submitted 12 July, 2023;
originally announced July 2023.
-
The FluidFlower International Benchmark Study: Process, Modeling Results, and Comparison to Experimental Data
Authors:
Bernd Flemisch,
Jan M. Nordbotten,
Martin Fernø,
Ruben Juanes,
Holger Class,
Mojdeh Delshad,
Florian Doster,
Jonathan Ennis-King,
Jacques Franc,
Sebastian Geiger,
Dennis Gläser,
Christopher Green,
James Gunning,
Hadi Hajibeygi,
Samuel J. Jackson,
Mohamad Jammoul,
Satish Karra,
Jiawei Li,
Stephan K. Matthäi,
Terry Miller,
Qi Shao,
Catherine Spurin,
Philip Stauffer,
Hamdi Tchelepi,
Xiaoming Tian,
et al. (8 additional authors not shown)
Abstract:
Successful deployment of geological carbon storage (GCS) requires extensive use of reservoir simulators for screening, ranking, and optimizing storage sites. However, the time scales of GCS are such that sufficient long-term data are not yet available to validate the simulators. As a consequence, there is currently no solid basis for assessing the quality with which the dynamics of large-scale GCS operations can be forecast.
To address this knowledge gap, we have conducted a major GCS validation benchmark study. To achieve reasonable time scales, a laboratory-size geological storage formation was constructed (the "FluidFlower"), forming the basis for both the experimental and computational work. A validation experiment consisting of repeated GCS operations was conducted in the FluidFlower, providing what we define as the true physical dynamics for this system. Nine different research groups from around the world provided forecasts, both individually and collaboratively, based on a detailed physical and petrophysical characterization of the FluidFlower sands.
The major contribution of this paper is a report and discussion of the results of the validation benchmark study, complemented by a description of the benchmarking process and the participating computational models. The forecasts from the participating groups are compared to each other and to the experimental data by means of various indicative qualitative and quantitative measures. In doing so, we provide a detailed assessment of the capabilities of reservoir simulators and their users to capture both the injection and post-injection dynamics of the GCS operations.
Submitted 9 February, 2023;
originally announced February 2023.
-
Rewritable Photonic Integrated Circuits Using Dielectric-assisted Phase-change Material Waveguides
Authors:
Forrest Miller,
Rui Chen,
Johannes E. Froech,
Hannah Rarick,
Sarah Geiger,
Arka Majumdar
Abstract:
Photonic integrated circuits (PICs) have the potential to drastically expand the capabilities of optical communications, sensing, and quantum information science and engineering. However, PICs are commonly fabricated using selective material etching, a subtractive process. Thus, the chip's functionality cannot be substantially altered once it is fabricated. Here, we propose to exploit wide-bandgap non-volatile phase-change materials (PCMs) to create a rewritable PIC platform. A PCM-based PIC can be written using a nanosecond pulsed laser without removing any material, akin to rewritable compact discs. The whole circuit can then be erased by heating, and a completely new circuit can be rewritten. We designed a dielectric-assisted PCM waveguide consisting of a thick dielectric layer on top of a thin layer of the wide-bandgap PCMs Sb2S3 and Sb2Se3. The low-loss PCMs and our engineered waveguiding structure lead to negligible optical loss. Furthermore, we analyzed and specified the spatio-temporal laser pulse shape required to write the PCMs. Our proposed platform will enable low-cost manufacturing and have a far-reaching impact on the rapid prototyping of PICs, validation of new designs, and photonic education.
Submitted 21 April, 2023; v1 submitted 24 January, 2023;
originally announced January 2023.
-
Non-volatile electrically programmable integrated photonics with a 5-bit operation
Authors:
Rui Chen,
Zhuoran Fang,
Christopher Perez,
Forrest Miller,
Khushboo Kumari,
Abhi Saxena,
Jiajiu Zheng,
Sarah J. Geiger,
Kenneth E. Goodson,
Arka Majumdar
Abstract:
Scalable programmable photonic integrated circuits (PICs) can potentially transform the current state of classical and quantum optical information processing. However, traditional means of programming, including the thermo-optic effect, free-carrier dispersion, and the Pockels effect, result in either large device footprints or high static energy consumption, significantly limiting their scalability. While chalcogenide-based non-volatile phase-change materials (PCMs) could mitigate these problems thanks to their strong index modulation and zero static power consumption, they often suffer from large absorptive loss, low cyclability, and a lack of multilevel operation. Here, we report a silicon photonic platform clad with the wide-bandgap PCM antimony sulfide (Sb2S3) that simultaneously achieves low loss, high cyclability, and 5-bit operation. We switch Sb2S3 via an on-chip silicon PIN diode heater and demonstrate components with low insertion loss (<1.0 dB), high extinction ratio (>10 dB), and high endurance (>1,600 switching events). Remarkably, we find that Sb2S3 can be programmed into fine intermediate states by applying identical and thermally isolated pulses, providing a unique approach to controllable multilevel operation. Through dynamic pulse control, we achieve on-demand and accurate 5-bit (32-level) operation, with a contrast of 0.50 ± 0.16 dB per step. Using this multilevel behavior, we further trim random phase errors in a balanced Mach-Zehnder interferometer. Our work opens an attractive pathway toward non-volatile large-scale programmable PICs with low-loss and on-demand multi-bit operations.
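As a back-of-the-envelope check of the figures quoted above (our arithmetic, not a result stated in the abstract): 5 bits correspond to 32 levels and therefore 31 steps between the lowest and highest state, so if every step had exactly the nominal 0.50 dB contrast, the cumulative contrast would be about 15.5 dB. The reported ±0.16 dB spread means individual steps deviate from this nominal value.

```latex
\[
  N = 2^{5} = 32 \text{ levels}, \qquad N - 1 = 31 \text{ steps}, \qquad
  31 \times 0.50\,\mathrm{dB} = 15.5\,\mathrm{dB}
\]
```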
Submitted 1 January, 2023;
originally announced January 2023.
-
Channeling: a new class of dissolution in complex porous media
Authors:
Hannah P. Menke,
Julien Maes,
Sebastian Geiger
Abstract:
The current conceptual model of mineral dissolution in porous media comprises three dissolution patterns, or regimes (wormhole, compact, and uniform), that develop depending on the relative dominance of flow, diffusion, and reaction rate. Here, we examine the evolution of pore structure during acid injection using numerical simulations on two porous media structures of increasing complexity. We examine the boundaries between regimes and characterise the existence of a fourth regime, called channeling, in which already existing fast flow pathways are preferentially widened by dissolution. Channeling occurs in cases where the distribution of pore throat sizes results in orders-of-magnitude differences in flow rate between flow pathways. This focusing of dissolution along only the dominant flow paths induces an immediate, large change in permeability with a comparatively small change in porosity, resulting in a porosity-permeability relationship unlike any previously reported. This work demonstrates that our current conceptual model of dissolution regimes must be modified to include channeling for accurate predictions of dissolution in applications such as geologic carbon storage and geothermal energy production.
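The relative dominance of flow, diffusion, and reaction is usually expressed with two dimensionless numbers. One common convention (assumed here; the abstract does not state the definitions used) compares the advection, diffusion, and reaction time scales over a characteristic pore length $L$, with mean pore velocity $u$, molecular diffusion coefficient $D$, and reaction time scale $\tau_r$:

```latex
\[
  \mathrm{Pe} = \frac{\tau_{\mathrm{diff}}}{\tau_{\mathrm{adv}}}
             = \frac{L^{2}/D}{L/u} = \frac{uL}{D},
  \qquad
  \mathrm{Da} = \frac{\tau_{\mathrm{adv}}}{\tau_{r}} = \frac{L}{u\,\tau_{r}}
\]
```

Evaluated per flow path $i$ with its own velocity $u_i$, these give $\mathrm{Pe}_i \propto u_i$ and $\mathrm{Da}_i \propto 1/u_i$, so a hundredfold velocity contrast between two paths shifts both numbers by a factor of 100 in opposite directions. As an intuition only, not the paper's analysis, this suggests why a single sample-averaged Pe and Da may not describe a medium whose throat sizes produce orders-of-magnitude differences in flow rate.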
Submitted 17 March, 2023; v1 submitted 7 November, 2022;
originally announced November 2022.
-
Pore-Scale Visualization of Hydrogen Storage in a Sandstone at Subsurface Pressure and Temperature Conditions: Trapping, Dissolution and Wettability
Authors:
Zaid Jangda,
Hannah Menke,
Andreas Busch,
Sebastian Geiger,
Tom Bultreys,
Helen Lewis,
Kamaljit Singh
Abstract:
The global commitment to achieving net-zero has led to increasing investment in the production and use of green hydrogen (H2). However, the massive quantity needed to meet future demand will require new storage facilities. Underground storage of H2 is a potentially viable solution, but it poses unique challenges due to the distinctive physical and chemical properties of H2, which have yet to be studied quantitatively in the subsurface environment. We have performed in situ X-ray flow experiments to investigate the fundamentals of pore-scale fluid displacement processes during H2 injection into an initially brine-saturated Bentheimer sandstone sample. Two different injection schemes were followed: the displacement of H2 with H2-equilibrated brine and with non-H2-equilibrated brine, both at temperature and pressure conditions representative of deep underground reservoirs. H2 was found to be non-wetting to brine after both displacement cycles, with average contact angles of 53.72° and 52.72°, respectively. We also found a higher recovery of H2 (43.1%) for non-H2-equilibrated brine compared to that of H2-equilibrated brine (31.6%), indicating potential dissolution of H2 in unequilibrated brine at reservoir conditions. Our results suggest that H2 storage may indeed be a suitable strategy for energy storage, but considerable further research is needed to fully comprehend the pore-scale interactions at reservoir conditions.
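The abstract reports H2 recovery as a percentage without stating how it is computed. One common definition, assumed here, expresses recovery as the fraction of the H2 saturation present after drainage that is displaced by the subsequent brine injection; the saturation values in the usage line are hypothetical.

```python
def h2_recovery(s_after_drainage, s_after_imbibition):
    """Fraction of the H2 in place after drainage that is displaced by brine.

    Both arguments are H2 saturations (fractions of pore volume). A negative
    result would mean the H2 saturation increased during imbibition.
    """
    if not 0 < s_after_drainage <= 1:
        raise ValueError("drainage saturation must be in (0, 1]")
    if not 0 <= s_after_imbibition <= 1:
        raise ValueError("imbibition saturation must be in [0, 1]")
    return (s_after_drainage - s_after_imbibition) / s_after_drainage


# Hypothetical saturations: 60% H2 after drainage, 40% remaining after brine injection.
print(f"{h2_recovery(0.60, 0.40):.1%}")  # 33.3%
```

Under this definition, any H2 that dissolves into unequilibrated brine also lowers the post-imbibition saturation and therefore counts toward recovery, which is consistent with the interpretation offered above.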
Submitted 8 July, 2022;
originally announced July 2022.
-
"Garbage In, Garbage Out" Revisited: What Do Machine Learning Application Papers Report About Human-Labeled Training Data?
Authors:
R. Stuart Geiger,
Dominique Cope,
Jamie Ip,
Marsha Lotosh,
Aayush Shah,
Jenny Weng,
Rebekah Tang
Abstract:
Supervised machine learning, in which models are automatically derived from labeled training data, is only as good as the quality of that data. This study builds on prior work that investigated to what extent 'best practices' around labeling training data were followed in applied ML publications within a single domain (social media platforms). In this paper, we expand on that work by studying publications that apply supervised ML in a far broader spectrum of disciplines, focusing on human-labeled data. We report to what extent a random sample of ML application papers across disciplines gives specific details about whether best practices were followed, while acknowledging that a greater range of application fields necessarily produces greater diversity of labeling and annotation methods. Because much of machine learning research and education focuses only on what is done once a "ground truth" or "gold standard" of training data is available, it is especially relevant to discuss issues around the equally important aspect of whether such data is reliable in the first place. This determination becomes increasingly complex when applied to a variety of specialized fields, as labeling can range from a task requiring little-to-no background knowledge to one that must be performed by someone with career expertise.
Submitted 5 July, 2021;
originally announced July 2021.
-
Upscaling the porosity-permeability relationship of a microporous carbonate to the Darcy scale with machine learning
Authors:
Hannah P. Menke,
Julien Maes,
Sebastian Geiger
Abstract:
The permeability of a pore structure is typically described by stochastic representations of its geometrical attributes. Database-driven numerical solvers for large model domains can only accurately predict large-scale flow behaviour when they incorporate upscaled descriptions of that structure. Upscaling is particularly challenging for rocks with multimodal porosity structures such as carbonates, where several different types of structures interact. It is the connectivity both within and between these different structures that controls the porosity-permeability relationship at larger length scales. Recent advances in machine learning, combined with numerical modelling and structural analysis, have allowed us to probe the relationship between structure and permeability more deeply. We have used this integrated approach to tackle the challenge of upscaling multimodal and multiscale porous media. We present a novel method for upscaling multimodal porosity-permeability relationships using machine-learning-based multivariate structural regression. A µ-CT image of limestone was divided into sub-volumes, and permeability was computed using the Darcy-Brinkman-Stokes (DBS) model. The porosity-permeability relationship from Menke et al. was used to assign permeability values to the microporosity. Structural attributes of each sub-volume were extracted and then regressed against the solved permeability using an Extra-Trees regression model to derive an upscaled porosity-permeability relationship. Ten upscaled test cases were then modelled at the Darcy scale using the regression and benchmarked against full DBS simulations, a numerically upscaled Darcy model, and a Kozeny-Carman (K-C) fit. We found good agreement between the full DBS simulations and both the numerical and machine-learning upscaled models, while the K-C model was a poor predictor in all cases.
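For reference, the Kozeny-Carman (K-C) baseline mentioned above can be sketched as a one-parameter fit. This assumes the classic form k = c·φ³/(1 − φ)² with a single prefactor c fitted by least squares in log space; the abstract does not give the exact K-C variant used, and the sub-volume data below are synthetic. The machine-learning approach instead regresses permeability on many structural attributes, which a single porosity term like this cannot capture.

```python
import math


def kc_porosity_term(phi):
    """Porosity dependence of the classic Kozeny-Carman relation: phi^3 / (1 - phi)^2."""
    if not 0 < phi < 1:
        raise ValueError("porosity must be in (0, 1)")
    return phi ** 3 / (1.0 - phi) ** 2


def fit_kc_prefactor(porosities, permeabilities):
    """Least-squares fit of log k = log c + log(phi^3 / (1 - phi)^2); returns c."""
    residuals = [
        math.log(k) - math.log(kc_porosity_term(phi))
        for phi, k in zip(porosities, permeabilities)
    ]
    return math.exp(sum(residuals) / len(residuals))


def kc_permeability(c, phi):
    """Permeability predicted by the fitted Kozeny-Carman relation."""
    return c * kc_porosity_term(phi)


# Synthetic sub-volume data (m^2), generated from c = 1e-11 purely for illustration.
phis = [0.12, 0.18, 0.25, 0.31]
perms = [1e-11 * kc_porosity_term(p) for p in phis]
c = fit_kc_prefactor(phis, perms)
print(f"fitted c = {c:.3e} m^2")  # fitted c = 1.000e-11 m^2
```

Fitting in log space weights relative errors equally across sub-volumes, which matters when permeabilities span orders of magnitude, as they do in multimodal carbonates.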
Submitted 23 September, 2020;
originally announced October 2020.
-
Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?
Authors:
R. Stuart Geiger,
Kevin Yu,
Yanlai Yang,
Mindy Dai,
Jie Qiu,
Rebekah Tang,
Jenny Huang
Abstract:
Many machine learning projects for new application areas involve teams of humans who label data for a particular purpose, from hiring crowdworkers to the paper's authors labeling the data themselves. Such a task is quite similar to (or a form of) structured content analysis, which is a longstanding methodology in the social sciences and humanities, with many established best practices. In this paper, we investigate to what extent a sample of machine learning application papers in social computing (specifically papers from arXiv and traditional publications performing an ML classification task on Twitter data) give specific details about whether such best practices were followed. Our team conducted multiple rounds of structured content analysis of each paper, determining whether the paper reports who the labelers were, what their qualifications were, whether they independently labeled the same items, whether inter-rater reliability metrics were disclosed, what level of training and/or instructions were given to labelers, whether compensation for crowdworkers was disclosed, and whether the training data is publicly available. We find a wide divergence in whether such practices were followed and documented. Much of machine learning research and education focuses on what is done once a "gold standard" of training data is available, but we discuss issues around the equally important aspect of whether such data is reliable in the first place.
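As an example of the inter-rater reliability metrics the study checks for, here is a minimal implementation of Cohen's kappa for two labelers who labeled the same items. Cohen's kappa is one common choice among several (Krippendorff's alpha is another); the labels in the usage line are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both raters labeled identically.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each rater labeled at random with their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_expected) / (1 - p_expected)


rater_1 = ["spam", "spam", "ham", "ham"]
rater_2 = ["spam", "ham", "ham", "ham"]
print(cohens_kappa(rater_1, rater_2))  # 0.5
```

Raw percent agreement here is 75%, but half of that is expected by chance given each rater's label frequencies, which is why kappa reports 0.5; disclosing a chance-corrected metric is more informative than agreement alone.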
Submitted 17 December, 2019;
originally announced December 2019.
-
ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia
Authors:
Aaron Halfaker,
R. Stuart Geiger
Abstract:
Algorithmic systems, from rule-based bots to machine learning classifiers, have a long history of supporting the essential work of content moderation and other curation work in peer production projects. From counter-vandalism to task routing, basic machine prediction has allowed open knowledge projects like Wikipedia to scale to the largest encyclopedia in the world while maintaining quality and consistency. However, conversations about how quality control should work and what role algorithms should play have generally been led by the expert engineers who have the skills and resources to develop and modify these complex algorithmic systems. In this paper, we describe ORES: an algorithmic scoring service that supports real-time scoring of wiki edits using multiple independent classifiers trained on different datasets. ORES decouples several activities that have typically all been performed by engineers: choosing or curating training data, building models to serve predictions, auditing predictions, and developing interfaces or automated agents that act on those predictions. This meta-algorithmic system was designed to open up socio-technical conversations about algorithms in Wikipedia to a broader set of participants. In this paper, we discuss the theoretical mechanisms of social change ORES enables and detail case studies in participatory machine learning around ORES from the five years since its deployment.
Submitted 20 August, 2020; v1 submitted 11 September, 2019;
originally announced September 2019.
-
The Rise and Fall of the Note: Changing Paper Lengths in ACM CSCW, 2000-2018
Authors:
R. Stuart Geiger
Abstract:
In this note, I quantitatively examine various trends in the lengths of published papers in ACM CSCW from 2000 to 2018, focusing on several major transitions in editorial and reviewing policy. The focus is on the rise and fall of the 4-page note, which was introduced in 2004 as a separate submission type alongside the 10-page double-column "full paper" format. From 2004 to 2012, 4-page notes of 2,500 to 4,500 words consistently represented about 20-35% of all publications. In 2013, minimum and maximum page lengths were officially removed, with no formal distinction made between full papers and notes. The note soon disappeared completely as a distinct genre, which co-occurred with a steady rise in paper lengths. I discuss these findings both as they relate directly to local concerns in CSCW and in the context of longstanding theoretical discussions around genre theory and how socio-technical structures and affordances impact participation in distributed, computer-mediated organizations and user-generated content platforms. There are many possible explanations for the decline of the note and the emergence of ever-longer papers, which I identify for future work. I conclude by addressing the implications of these findings for the CSCW community, particularly given how genre norms impact what kinds of scholarship and scholars thrive in CSCW, as well as whether new top-down rules or bottom-up guidelines ought to be developed around paper lengths and different kinds of contributions.
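A minimal sketch of the kind of yearly tally behind the 20-35% figure, assuming each paper is reduced to a (year, word count) pair and treating the 2,500 to 4,500 word range quoted above as the definition of a note. That word-count proxy is an assumption for illustration; before 2013 notes were a separate submission type, so a real analysis could use the official category instead. The records below are made up.

```python
from collections import defaultdict

# Word range for 4-page notes as quoted in the abstract (used here as a proxy).
NOTE_MIN_WORDS, NOTE_MAX_WORDS = 2500, 4500


def note_share_by_year(papers):
    """Percentage of papers per year whose length falls in the note range.

    `papers` is an iterable of (year, word_count) pairs.
    """
    totals = defaultdict(int)
    notes = defaultdict(int)
    for year, words in papers:
        totals[year] += 1
        if NOTE_MIN_WORDS <= words <= NOTE_MAX_WORDS:
            notes[year] += 1
    return {year: 100.0 * notes[year] / totals[year] for year in sorted(totals)}


# Made-up records, not CSCW data.
sample = [(2010, 3100), (2010, 9800), (2010, 4200), (2010, 10500), (2016, 12400)]
print(note_share_by_year(sample))  # {2010: 50.0, 2016: 0.0}
```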
Submitted 9 September, 2019; v1 submitted 28 August, 2019;
originally announced August 2019.
-
Sample-efficient Adversarial Imitation Learning from Observation
Authors:
Faraz Torabi,
Sean Geiger,
Garrett Warnell,
Peter Stone
Abstract:
Imitation from observation is the framework of learning tasks by observing demonstrated state-only trajectories. Recently, adversarial approaches have achieved significant performance improvements over other methods for imitating complex behaviors. However, these adversarial imitation algorithms often require many demonstration examples and learning iterations to produce a policy that successfully imitates a demonstrator's behavior. This high sample complexity often prohibits these algorithms from being deployed on physical robots. In this paper, we propose an algorithm that addresses the sample inefficiency problem by utilizing ideas from trajectory-centric reinforcement learning algorithms. We test our algorithm on an imitation task using a physical robot arm and its simulated version in Gazebo, and show improvements in learning rate and efficiency.
Submitted 18 June, 2019;
originally announced June 2019.
-
Black-boxing the user: internet protocol over xylophone players (IPoXP)
Authors:
R. Stuart Geiger,
Yoon Jung Jeong,
Emily Manders
Abstract:
We introduce IP over Xylophone Players (IPoXP), a novel Internet protocol between two computers using xylophone-based Arduino interfaces. In our implementation, human operators are situated within the lowest layer of the network, transmitting data between computers by striking designated keys. We discuss how IPoXP inverts the traditional mode of human-computer interaction, with a computer using the human as an interface to communicate with another computer.
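To make the idea of humans as the physical layer concrete, here is a minimal sketch of one possible encoding, in which each byte of an IP packet is sent as two strikes, one per 4-bit nibble, on a hypothetical 16-key xylophone. The key names and the nibble mapping are assumptions for illustration and are not the encoding used by IPoXP.

```python
# Hypothetical key layout: 16 xylophone keys, one per hex digit (an assumption,
# not the mapping used in the IPoXP paper).
KEYS = [f"key{i:02d}" for i in range(16)]
KEY_INDEX = {name: i for i, name in enumerate(KEYS)}


def encode_packet(packet: bytes) -> list:
    """Turn raw packet bytes into the sequence of keys a human must strike."""
    strikes = []
    for byte in packet:
        strikes.append(KEYS[byte >> 4])    # high nibble first
        strikes.append(KEYS[byte & 0x0F])  # then low nibble
    return strikes


def decode_strikes(strikes: list) -> bytes:
    """Rebuild the packet bytes from the keys detected on the receiving side."""
    if len(strikes) % 2:
        raise ValueError("odd number of strikes: incomplete byte")
    out = bytearray()
    for hi, lo in zip(strikes[0::2], strikes[1::2]):
        out.append((KEY_INDEX[hi] << 4) | KEY_INDEX[lo])
    return bytes(out)


print(encode_packet(b"\xab"))  # ['key10', 'key11']
```

Even a minimal 20-byte IPv4 header becomes 40 strikes under this scheme, which illustrates how slow and error-prone a human physical layer is compared with the computers it connects.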
Submitted 6 June, 2019;
originally announced June 2019.
-
The Lives of Bots
Authors:
R. Stuart Geiger
Abstract:
Automated software agents --- or bots --- have long been an important part of how Wikipedia's volunteer community of editors write, edit, update, monitor, and moderate content. In this paper, I discuss the complex social and technical environment in which Wikipedia's bots operate. This paper focuses on the establishment and role of English Wikipedia's bot policies and the Bot Approvals Group, a volunteer committee that reviews applications for new bots and helps resolve conflicts between Wikipedians about automation. In particular, I examine an early bot controversy over the first bot in Wikipedia to automatically enforce a social norm about how Wikipedian editors ought to interact in discussion spaces. As I show, bots enforce many rules in Wikipedia, but humans produce these bots and negotiate rules around their operation. Because of the openness of Wikipedia's processes around automation, we can vividly observe the often-invisible human work involved in such algorithmic systems --- in stark contrast to most other user-generated content platforms.
Submitted 22 October, 2018;
originally announced October 2018.
-
Operationalizing Conflict and Cooperation between Automated Software Agents in Wikipedia: A Replication and Expansion of 'Even Good Bots Fight'
Authors:
R. Stuart Geiger,
Aaron Halfaker
Abstract:
This paper replicates, extends, and refutes conclusions made in a study published in PLoS ONE ("Even Good Bots Fight"), which claimed to identify substantial levels of conflict between automated software agents (or bots) in Wikipedia using purely quantitative methods. By applying an integrative mixed-methods approach drawing on trace ethnography, we place these alleged cases of bot-bot conflict into context and arrive at a better understanding of these interactions. We found that overwhelmingly, the interactions previously characterized as problematic instances of conflict are typically better characterized as routine, productive, even collaborative work. These results challenge past work and show the importance of qualitative/quantitative collaboration. In our paper, we present quantitative metrics and qualitative heuristics for operationalizing bot-bot conflict. We give thick descriptions of kinds of events that present as bot-bot reverts, helping distinguish conflict from non-conflict. We computationally classify these kinds of events through patterns in edit summaries. By interpreting found/trace data in the socio-technical contexts in which people give that data meaning, we gain more from quantitative measurements, drawing deeper understandings about the governance of algorithmic systems in Wikipedia. We have also released our data collection, processing, and analysis pipeline, to facilitate computational reproducibility of our findings and to help other researchers interested in conducting similar mixed-method scholarship in other platforms and contexts.
Submitted 16 October, 2018;
originally announced October 2018.
-
The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work
Authors:
R. Stuart Geiger,
Nelle Varoquaux,
Charlotte Mazel-Cabasse,
Chris Holdgraf
Abstract:
Computational research and data analytics increasingly rely on complex ecosystems of open source software (OSS) "libraries" -- curated collections of reusable code that programmers import to perform a specific task. Software documentation for these libraries is crucial in helping programmers/analysts know what libraries are available and how to use them. Yet documentation for open source software libraries is widely considered low-quality. This article is a collaboration between CSCW researchers and contributors to data analytics OSS libraries, based on ethnographic fieldwork and qualitative interviews. We examine several issues around the formats, practices, and challenges of documentation in these largely volunteer-based projects. There are many different kinds and formats of documentation that exist around such libraries, which play a variety of educational, promotional, and organizational roles. The work behind documentation is similarly multifaceted, including writing, reviewing, maintaining, and organizing documentation. Different aspects of documentation work require contributors to have different sets of skills and overcome various social and technical barriers. Finally, most of our interviewees do not report high levels of intrinsic enjoyment for doing documentation work (compared to writing code). Their motivation is affected by personal and project-specific factors, such as the perceived level of credit for doing documentation work versus more "technical" tasks like adding new features or fixing bugs. In studying documentation work for data analytics OSS libraries, we gain a new window into the changing practices of data-intensive research, as well as help practitioners better understand how to support this often invisible and infrastructural work in their projects.
Submitted 31 May, 2018;
originally announced May 2018.
-
Beyond opening up the black box: Investigating the role of algorithmic systems in Wikipedian organizational culture
Authors:
R. Stuart Geiger
Abstract:
Scholars and practitioners across domains are increasingly concerned with algorithmic transparency and opacity, interrogating the values and assumptions embedded in automated, black-boxed systems, particularly in user-generated content platforms. I report from an ethnography of infrastructure in Wikipedia to discuss an often understudied aspect of this topic: the local, contextual, learned expertise involved in participating in a highly automated social-technical environment. Today, the organizational culture of Wikipedia is deeply intertwined with various data-driven algorithmic systems, which Wikipedians rely on to help manage and govern the "anyone can edit" encyclopedia at a massive scale. These bots, scripts, tools, plugins, and dashboards make Wikipedia more efficient for those who know how to work with them, but like all organizational culture, newcomers must learn them if they want to fully participate. I illustrate how cultural and organizational expertise is enacted around algorithmic agents by discussing two autoethnographic vignettes, which relate my personal experience as a veteran in Wikipedia. I present thick descriptions of how governance and gatekeeping practices are articulated through and in alignment with these automated infrastructures. Over the past 15 years, Wikipedian veterans and administrators have made specific decisions to support administrative and editorial workflows with automation in particular ways and not others. I use these cases of Wikipedia's bot-supported bureaucracy to discuss several issues in the fields of critical algorithms studies, critical data studies, and fairness, accountability, and transparency in machine learning -- most principally arguing that scholarship and practice must go beyond trying to "open up the black box" of such systems and also examine sociocultural processes like newcomer socialization.
Submitted 1 October, 2017; v1 submitted 26 September, 2017;
originally announced September 2017.
-
Summary Analysis of the 2017 GitHub Open Source Survey
Authors:
R. Stuart Geiger
Abstract:
This report is a high-level summary analysis of the 2017 GitHub Open Source Survey dataset, presenting frequency counts, proportions, and frequency or proportion bar plots for every question asked in the survey.
Submitted 8 June, 2017;
originally announced June 2017.
-
Report on the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4)
Authors:
Daniel S. Katz,
Kyle E. Niemeyer,
Sandra Gesing,
Lorraine Hwang,
Wolfgang Bangerth,
Simon Hettrick,
Ray Idaszak,
Jean Salac,
Neil Chue Hong,
Santiago Núñez Corrales,
Alice Allen,
R. Stuart Geiger,
Jonah Miller,
Emily Chen,
Anshu Dubey,
Patricia Lago
Abstract:
This report records and discusses the Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4). The report includes a description of the keynote presentation of the workshop, the mission and vision statements that were drafted at the workshop and finalized shortly after it, a set of idea papers, position papers, experience papers, demos, and lightning talks, and a panel discussion. The main part of the report covers the set of working groups that formed during the meeting, and for each, discusses the participants, the objective and goal, and how the objective can be reached, along with contact information for readers who may want to join the group. Finally, we present results from a survey of the workshop attendees.
Submitted 18 May, 2017; v1 submitted 7 May, 2017;
originally announced May 2017.
-
Asynchronous Discrete Event Schemes for PDEs
Authors:
Daniel Stone,
Sebastian Geiger,
Gabriel Lord
Abstract:
A new class of asynchronous discrete-event simulation schemes for advection-diffusion-reaction equations is introduced, based on the principle of allowing quanta of mass to pass through faces of a Cartesian finite volume grid. The timescales of these events are linked to the flux on the face, and the schemes are self-adaptive and local in time and space. Experiments are performed on realistic physical systems related to porous media flow applications, including a large 3D advection-diffusion equation and advection-diffusion-reaction systems. The results are compared to highly accurate results where the temporal evolution is computed with exponential integrator schemes using the same finite volume discretisation. This allows a reliable estimation of the solution error. Our results indicate first-order convergence of the error as a control parameter is decreased.
Submitted 17 October, 2016;
originally announced October 2016.