-
Insights from the Field: Exploring Students' Perspectives on Bad Unit Testing Practices
Authors:
Anthony Peruma,
Eman Abdullah AlOmar,
Wajdi Aljedaani,
Christian D. Newman,
Mohamed Wiem Mkaouer
Abstract:
Educating students about software testing practices is integral to the curricula of many computer science-related courses and typically involves students writing unit tests. Similar to production/source code, students might inadvertently deviate from established unit testing best practices, and introduce problematic code, referred to as test smells, into their test suites. Given the extensive catalog of test smells, it becomes challenging for students to identify test smells in their code, especially for those who lack experience with testing practices. In this experience report, we aim to increase students' awareness of bad unit testing practices, and detail the outcomes of having 184 students from three higher educational institutes utilize an IDE plugin to automatically detect test smells in their code. Our findings show that while students report on the plugin's usefulness in learning about and detecting test smells, they also identify specific test smells that they consider harmless. We anticipate that our findings will support academia in refining course curricula on unit testing and enabling educators to support students with code review strategies of test code.
Submitted 15 April, 2024;
originally announced April 2024.
-
Exploring Faculty Identity Sharing: A Pathway to Empathy in Physics Faculty
Authors:
Alia Hamdan,
Ash Bista,
Dina Newman,
Scott Franklin
Abstract:
This study investigates how faculty acquire contextual information about students, examining mechanisms and motivations used when sharing their identity to facilitate empathy. Empathy is "the ability and tendency to share and understand others' internal state" (Zaki and Ochsner, 2012), and is a critical factor in both motivating faculty to enact large scale change and take immediate, smaller actions. This study explores the impact identity sharing has on obtaining contextual information that motivates empathetic action. Nineteen semi structured interviews with physics faculty explored participant identities and interactions across various contexts. Employing emergent thematic coding, we crafted four personas around faculty sharing, teaching values, and student reciprocity. Brooke, the Trust Builder, prioritizes creating an environment of trust by openly discussing their identity, aiming to foster student openness. Nour, the Identity Navigator, shares personal experiences to assist others in navigating their own identities, acknowledging the challenges of college years. Brooke and Nour had more students approaching them with personal issues, indicating a correlation between faculty identity sharing and student openness. Casey, the Cautious Sharer, expresses concerns about potential alienation or backlash, approaching personal sharing with caution. Wray, adopting a Walled Off approach, separates personal and professional life due to past negative experiences or a belief in the importance of that division. Among the faculty interviewed, 15 who were explicitly open about their identities reported that students approached them with personal issues. This study outlines mechanisms influencing when and what faculty share about themselves in different contexts. Our findings underscore the significance of fostering dialogue as the initial step in empathy development.
Submitted 1 April, 2024;
originally announced April 2024.
-
Quantum number conservation in meson interactions
Authors:
Douglas Newman
Abstract:
Quantum number descriptions of spin-zero mesons are obtained from their quark/anti-quark structures, extending the application of the already established seven quantum number conservation in fermion interactions to include mesons. This provides a new tool in the design of experiments and the analysis of results, making it particularly useful for those interactions that involve neutral mesons. One quantum number is parity, conflicting with the evidence for parity non-conservation. The source of this disagreement is identified as the poor definition of parity in the Standard Model. It is also shown that mesons can produce inter-generational interactions, providing a viable alternative to the see-saw mechanism of neutrino interactions.
Submitted 6 June, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Slow electron holes in the Earth's magnetosheath
Authors:
Z. I. Shaikh,
I. Y. Vasko,
I. H. Hutchinson,
S. R. Kamaletdinov,
J. C. Holmes,
D. L. Newman,
F. S. Mozer
Abstract:
We present a statistical analysis of electrostatic solitary waves observed aboard Magnetospheric Multiscale spacecraft in the Earth's magnetosheath. Applying single-spacecraft interferometry to several hundred solitary waves collected in about two minute intervals, we show that almost all of them have the electrostatic potential of positive polarity and propagate quasi-parallel to the local magnetic field with plasma frame velocities of the order of 100 km/s. The solitary waves have typical parallel half-widths from 10 to 100 m that is between 1 and 10 Debye lengths and typical amplitudes of the electrostatic potential from 10 to 200 mV that is between 0.01 and 1\% of local electron temperature. The solitary waves are associated with quasi-Maxwellian ion velocity distribution functions, and their plasma frame velocities are comparable with ion thermal speed and well below electron thermal speed. We argue that the solitary waves of positive polarity are slow electron holes and estimate the time scale of their acceleration, which occurs due to interaction with ions, to be of the order of one second. The observation of slow electron holes indicates that their lifetime was shorter than the acceleration time scale. We argue that multi-spacecraft interferometry applied previously to these solitary waves is not applicable because of their too-short spatial scales. The source of the slow electron holes and the role in electron-ion energy exchange remain to be established.
Submitted 25 February, 2024;
originally announced February 2024.
-
How to Refactor this Code? An Exploratory Study on Developer-ChatGPT Refactoring Conversations
Authors:
Eman Abdullah AlOmar,
Anushkrishna Venkatakrishnan,
Mohamed Wiem Mkaouer,
Christian D. Newman,
Ali Ouni
Abstract:
Large Language Models (LLMs), like ChatGPT, have gained widespread popularity and usage in various software engineering tasks, including refactoring, testing, code review, and program comprehension. Despite recent studies delving into refactoring documentation in commit messages, issues, and code review, little is known about how developers articulate their refactoring needs when interacting with ChatGPT. In this paper, our goal is to explore conversations between developers and ChatGPT related to refactoring to better understand how developers identify areas for improvement in code and how ChatGPT addresses developers' needs. Our approach relies on text mining refactoring-related conversations from 17,913 ChatGPT prompts and responses, and investigating developers' explicit refactoring intention. Our results reveal that (1) developer-ChatGPT conversations commonly involve generic and specific terms/phrases; (2) developers often make generic refactoring requests, while ChatGPT typically includes the refactoring intention; and (3) various learning settings when prompting ChatGPT in the context of refactoring. We envision that our findings contribute to a broader understanding of the collaboration between developers and AI models, in the context of code refactoring, with implications for model improvement, tool development, and best practices in software engineering.
Submitted 8 February, 2024;
originally announced February 2024.
-
Anomalous Behavior Detection in Trajectory Data of Older Drivers
Authors:
Seyedeh Gol Ara Ghoreishi,
Sonia Moshfeghi,
Muhammad Tanveer Jan,
Joshua Conniff,
KwangSoo Yang,
Jinwoo Jang,
Borko Furht,
Ruth Tappen,
David Newman,
Monica Rosselli,
Jiannan Zhai
Abstract:
Given a road network and a set of trajectory data, the anomalous behavior detection (ABD) problem is to identify drivers that show significant directional deviations, hard brakings, and accelerations in their trips. The ABD problem is important in many societal applications, including Mild Cognitive Impairment (MCI) detection and safe route recommendations for older drivers. The ABD problem is computationally challenging due to the large size of temporally-detailed trajectory datasets. In this paper, we propose an Edge-Attributed Matrix that can represent the key properties of temporally-detailed trajectory datasets and identify abnormal driving behaviors. Experiments using real-world datasets demonstrated that our approach identifies abnormal driving behaviors.
Submitted 29 November, 2023;
originally announced November 2023.
-
In-vehicle Sensing and Data Analysis for Older Drivers with Mild Cognitive Impairment
Authors:
Sonia Moshfeghi,
Muhammad Tanveer Jan,
Joshua Conniff,
Seyedeh Gol Ara Ghoreishi,
Jinwoo Jang,
Borko Furht,
Kwangsoo Yang,
Monica Rosselli,
David Newman,
Ruth Tappen,
Dana Smith
Abstract:
Driving is a complex daily activity indicating age- and disease-related cognitive declines. Therefore, deficits in driving performance compared with drivers without mild cognitive impairment (MCI) can reflect changes in cognitive functioning. There is increasing evidence that unobtrusive monitoring of older adults' driving performance in a daily-life setting may allow us to detect subtle early changes in cognition. The objectives of this paper include designing low-cost in-vehicle sensing hardware capable of obtaining high-precision positioning and telematics data, identifying important indicators for early changes in cognition, and detecting early-warning signs of cognitive impairment in a truly normal, day-to-day driving condition with machine learning approaches. Our statistical analysis comparing drivers with MCI to those without reveals that those with MCI exhibit smoother and safer driving patterns. This suggests that drivers with MCI are cognizant of their condition and tend to avoid erratic driving behaviors. Furthermore, our Random Forest models identified the number of night trips, number of trips, and education as the most influential factors in our data evaluation.
Submitted 15 November, 2023;
originally announced November 2023.
-
Problems with the Standard Model of particle physics
Authors:
Douglas Newman
Abstract:
The Standard Model (SM) of particle physics is in such good agreement with experiment that it is currently accepted as providing an accurate model of reality, with the role of chiral symmetry breaking in electro-weak unification regarded as one of its major achievements. This work focusses on weaknesses in the algebraic formulation of the SM, especially the introduction of chirality, which is shown to be inconsistent with neutrinos being fermions, and conflicts with the experimental evidence that they have finite mass. The SM description of parity is also shown to be erroneous. It is argued that these errors in its algebraic formulation have had the effect of making theoretical extensions of the SM impossible, as well as wasting considerable experimental effort in trying to understand erroneous predictions of the theory.
Submitted 3 September, 2023; v1 submitted 6 August, 2023;
originally announced August 2023.
-
GEO-Bench: Toward Foundation Models for Earth Monitoring
Authors:
Alexandre Lacoste,
Nils Lehmann,
Pau Rodriguez,
Evan David Sherwin,
Hannah Kerner,
Björn Lütjens,
Jeremy Andrew Irvin,
David Dao,
Hamed Alemohammad,
Alexandre Drouin,
Mehmet Gunturkun,
Gabriel Huang,
David Vazquez,
Dava Newman,
Yoshua Bengio,
Stefano Ermon,
Xiao Xiang Zhu
Abstract:
Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unsupervised data can lead to substantial increases in generalization to downstream tasks. Such models, recently coined foundation models, have been transformational to the field of natural language processing. Variants have also been proposed for image data, but their applicability to remote sensing tasks is limited. To stimulate the development of foundation models for Earth monitoring, we propose a benchmark comprised of six classification and six segmentation tasks, which were carefully curated and adapted to be both relevant to the field and well-suited for model evaluation. We accompany this benchmark with a robust methodology for evaluating models and reporting aggregated results to enable a reliable assessment of progress. Finally, we report results for 20 baselines to gain information about the performance of existing models. We believe that this benchmark will be a driver of progress across a variety of Earth monitoring tasks.
Submitted 23 December, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
An Exploratory Study on the Usage and Readability of Messages Within Assertion Methods of Test Cases
Authors:
Taryn Takebayashi,
Anthony Peruma,
Mohamed Wiem Mkaouer,
Christian D. Newman
Abstract:
Unit testing is a vital part of the software development process and involves developers writing code to verify or assert production code. Furthermore, to help comprehend the test case and troubleshoot issues, developers have the option to provide a message that explains the reason for the assertion failure. In this exploratory empirical study, we examine the characteristics of assertion messages contained in the test methods in 20 open-source Java systems. Our findings show that while developers rarely utilize the option of supplying a message, those who do, either compose it of only string literals, identifiers, or a combination of both types. Using standard English readability measuring techniques, we observe that a beginner's knowledge of English is required to understand messages containing only identifiers, while a 4th-grade education level is required to understand messages composed of string literals. We also discuss shortcomings with using such readability measuring techniques and common anti-patterns in assert message construction. We envision our results incorporated into code quality tools that appraise the understandability of assertion messages.
Submitted 28 February, 2023;
originally announced March 2023.
-
Rename Chains: An Exploratory Study on the Occurrence and Characteristics of Identifiers Undergoing Multiple Renamings
Authors:
Anthony Peruma,
Christian D. Newman
Abstract:
Identifier names play a significant role in program comprehension activities, with high-quality names improving developer productivity and system quality. To correct poor-quality names, developers rename identifiers to reflect their intended purpose better. However, renames do not always result in high-quality, long-lasting names; in many cases, developers perform multiple rename operations on the same identifier throughout the system's lifetime. In this paper, we report on a large-scale empirical study that examines the occurrence of identifiers undergoing multiple renames (i.e., rename chains). Our findings show the presence of rename chains in almost every project, with methods typically having more rename chains than other identifier types. Furthermore, it is usually the same developer responsible for creating all renames within a chain, with most names maintaining the same grammatical structure. Understanding rename chains can help us provide stronger advice, and targeted research, on how to craft high-quality, long-lasting identifiers.
Submitted 23 February, 2023; v1 submitted 22 February, 2023;
originally announced February 2023.
-
Methods and Tools for Monitoring Driver's Behavior
Authors:
Muhammad Tanveer Jan,
Sonia Moshfeghi,
Joshua William Conniff,
Jinwoo Jang,
Kwangsoo Yang,
Jiannan Zhai,
Monica Rosselli,
David Newman,
Ruth Tappen,
Borko Furht
Abstract:
In-vehicle sensing technology has gained tremendous attention due to its ability to support major technological developments, such as connected vehicles and self-driving cars. In-vehicle sensing data are invaluable and important data sources for traffic management systems. In this paper, we propose an innovative architecture of unobtrusive in-vehicle sensors and present methods and tools that are used to measure the behavior of drivers. The proposed architecture, including its methods and tools, is used in our NIH project to monitor and identify older drivers with early dementia.
Submitted 27 March, 2023; v1 submitted 28 January, 2023;
originally announced January 2023.
-
Discrete symmetries and quantum number conservation
Authors:
Douglas Newman
Abstract:
The algebraic formulation of discrete $P$ and $T$ space-time symmetries in terms of the Dirac algebra is shown to be flawed. This is corrected by relating these symmetries to fermion quantum numbers defined by a $Cl_{3,3}$ sub-algebra of the $Cl_{7,7}$ Clifford Unification algebra. A new {\it Conservation Law} is formulated to the effect that fermion decays and interactions conserve all seven quantum numbers defined by $Cl_{7,7}$. This can be applied in the design of high energy experiments and the qualitative interpretation of their results.
Submitted 6 June, 2024; v1 submitted 18 November, 2022;
originally announced November 2022.
-
Multiscale Neural Operator: Learning Fast and Grid-independent PDE Solvers
Authors:
Björn Lütjens,
Catherine H. Crawford,
Campbell D Watson,
Christopher Hill,
Dava Newman
Abstract:
Numerical simulations in climate, chemistry, or astrophysics are computationally too expensive for uncertainty quantification or parameter-exploration at high-resolution. Reduced-order or surrogate models are multiple orders of magnitude faster, but traditional surrogates are inflexible or inaccurate and pure machine learning (ML)-based surrogates too data-hungry. We propose a hybrid, flexible surrogate model that exploits known physics for simulating large-scale dynamics and limits learning to the hard-to-model term, which is called parametrization or closure and captures the effect of fine- onto large-scale dynamics. Leveraging neural operators, we are the first to learn grid-independent, non-local, and flexible parametrizations. Our \textit{multiscale neural operator} is motivated by a rich literature in multiscale modeling, has quasilinear runtime complexity, is more accurate or flexible than state-of-the-art parametrizations and demonstrated on the chaotic equation multiscale Lorenz96.
Submitted 23 July, 2022;
originally announced July 2022.
-
When happy accidents spark creativity: Bringing collaborative speculation to life with generative AI
Authors:
Ziv Epstein,
Hope Schroeder,
Dava Newman
Abstract:
Generative AI techniques like those that synthesize images from text (text-to-image models) offer new possibilities for creatively imagining new ideas. We investigate the capabilities of these models to help communities engage in conversations about their collective future. In particular, we design and deploy a facilitated experience where participants collaboratively speculate on utopias they want to see, and then produce AI-generated imagery from those speculations. In a series of in-depth user interviews, we invite participants to reflect on the generated images and refine their visions for the future. We synthesize findings with a bespoke community zine on the experience. We observe that participants often generated ideas for implementing their vision and drew new lateral considerations as a result of viewing the generated images. Critically, we find that the unexpected difference between the participant's imagined output and the generated image is what facilitated new insight for the participant. We hope our experimental model for co-creation, computational creativity, and community reflection inspires the use of generative models to help communities and organizations envision better futures.
Submitted 17 June, 2022; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Simulations for Planning Next-Generation Exoplanet Radial Velocity Surveys
Authors:
Patrick D. Newman,
Peter Plavchan,
Jennifer A. Burt,
Johanna Teske,
Eric E. Mamajek,
Stephanie Leifer,
B. Scott Gaudi,
Gary Blackwood,
Rhonda Morgan
Abstract:
Future direct imaging missions such as HabEx and LUVOIR aim to catalog and characterize Earth-mass analogs around nearby stars. The exoplanet yield of these missions will be dependent on the frequency of Earth-like planets, and potentially the a priori knowledge of which stars specifically host suitable planetary systems. Ground or space based radial velocity surveys can potentially perform the pre-selection of targets and assist in the optimization of observation times, as opposed to an uninformed direct imaging survey. In this paper, we present our framework for simulating future radial velocity surveys of nearby stars in support of direct imaging missions. We generate lists of exposure times, observation time-series, and radial velocity time-series given a direct imaging target list. We generate simulated surveys for a proposed set of telescopes and precise radial velocity spectrographs spanning a set of plausible global-network architectures that may be considered for next generation extremely precise radial velocity surveys. We also develop figures of merit for observation frequency and planet detection sensitivity, and compare these across architectures. From these, we draw conclusions, given our stated assumptions and caveats, to optimize the yield of future radial velocity surveys in support of direct imaging missions. We find that all of our considered surveys obtain sufficient numbers of precise observations to meet the minimum theoretical white noise detection sensitivity for Earth-mass habitable zone planets, with margin to explore systematic effects due to stellar activity and correlated noise.
Submitted 29 April, 2022;
originally announced April 2022.
-
An Exploratory Study on Refactoring Documentation in Issues Handling
Authors:
Eman Abdullah AlOmar,
Anthony Peruma,
Mohamed Wiem Mkaouer,
Christian D. Newman,
Ali Ouni
Abstract:
Understanding the practice of refactoring documentation is of paramount importance in academia and industry. Issue tracking systems are used by most software projects enabling developers, quality assurance, managers, and users to submit feature requests and other tasks such as bug fixing and code review. Although recent studies explored how to document refactoring in commit messages, little is known about how developers describe their refactoring needs in issues. In this study, we aim at exploring developer-reported refactoring changes in issues to better understand what developers consider to be problematic in their code and how they handle it. Our approach relies on text mining 45,477 refactoring-related issues and identifying refactoring patterns from a diverse corpus of 77 Java projects by investigating issues associated with 15,833 refactoring operations and developers' explicit refactoring intention. Our results show that (1) developers mostly use move refactoring related terms/phrases to target refactoring-related issues; and (2) developers tend to explicitly mention the improvement of specific quality attributes and focus on duplicate code removal. We envision our findings enabling tool builders to support developers with automated documentation of refactoring changes in issues.
Submitted 18 March, 2022;
originally announced March 2022.
-
Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship Between Technical Debt and Refactoring
Authors:
Anthony Peruma,
Eman Abdullah AlOmar,
Christian D. Newman,
Mohamed Wiem Mkaouer,
Ali Ouni
Abstract:
To meet project timelines or budget constraints, developers intentionally deviate from writing optimal code to feasible code in what is known as incurring Technical Debt (TD). Furthermore, as part of planning their correction, developers document these deficiencies as comments in the code (i.e., self-admitted technical debt or SATD). As a means of improving source code quality, developers often apply a series of refactoring operations to their codebase. In this study, we explore developers repaying this debt through refactoring operations by examining occurrences of SATD removal in the code of 76 open-source Java systems. Our findings show that TD payment usually occurs with refactoring activities and developers refactor their code to remove TD for specific reasons. We envision our findings supporting vendors in providing tools to better support developers in the automatic repayment of technical debt.
Submitted 10 March, 2022;
originally announced March 2022.
-
Understanding Digits in Identifier Names: An Exploratory Study
Authors:
Anthony Peruma,
Christian D. Newman
Abstract:
Before any software maintenance can occur, developers must read the identifier names found in the code to be maintained. Thus, high-quality identifier names are essential for productive program comprehension and maintenance activities. With developers free to construct identifier names to their liking, it can be difficult to automatically reason about the quality and semantics behind an identifier name. Studying the structure of identifier names can help alleviate this problem. Existing research focuses on studying words within identifiers, but there are other symbols that appear in identifier names -- such as digits. This paper explores the presence and purpose of digits in identifier names through an empirical study of 800 open-source Java systems. We study how digits contribute to the semantics of identifier names and how identifier names that contain digits evolve over time through renaming. We envision our findings improving the efficiency of name appraisal and recommendation tools and techniques.
Submitted 15 March, 2022; v1 submitted 28 February, 2022;
originally announced March 2022.
-
Fourth Generation fermions providing new candidates for Dark Matter and Dark Energy
Authors:
Douglas Newman
Abstract:
Clifford Unification describes all known elementary fermions and their interactions, providing a useful tool in the analysis of experimental data. It also predicts the existence of a stable fourth generation (G4) of fermions, with electric charges different to their first, second and third generation counterparts. This makes neutral G4 baryon and neutral G4 lepton composites possible, respectively providing candidates for Dark Matter and Dark Energy, which are examined in the light of experimental evidence.
Submitted 13 June, 2023; v1 submitted 9 January, 2022;
originally announced January 2022.
-
How Do I Refactor This? An Empirical Study on Refactoring Trends and Topics in Stack Overflow
Authors:
Anthony Peruma,
Steven Simmons,
Eman Abdullah AlOmar,
Christian D. Newman,
Mohamed Wiem Mkaouer,
Ali Ouni
Abstract:
An essential part of software maintenance and evolution, refactoring is performed by developers, regardless of technology or domain, to improve the internal quality of the system, and reduce its technical debt. However, choosing the appropriate refactoring strategy is not always straightforward, resulting in developers seeking assistance. Although research in refactoring is well-established, with several studies alternating between the detection of refactoring opportunities and the recommendation of appropriate code changes, little is known about their adoption in practice. Analyzing the perception of developers is critical to understand better what developers consider to be problematic in their code and how they handle it. Additionally, there is a need for bridging the gap between refactoring, as research, and its adoption in practice, by extracting common refactoring intents that are more suitable for what developers face in reality. In this study, we analyze refactoring discussions on Stack Overflow through a series of quantitative and qualitative experiments. Our results show that Stack Overflow is utilized by a diverse set of developers for refactoring assistance for a variety of technologies. Our observations show five areas that developers typically require help with refactoring -- Code Optimization, Tools and IDEs, Architecture and Design Patterns, Unit Testing, and Database. We envision our findings better bridge the support between traditional (or academic) aspects of refactoring and their real-world applicability, including better tool support.
Submitted 23 October, 2021;
originally announced October 2021.
-
Digital Twin Earth -- Coasts: Developing a fast and physics-informed surrogate model for coastal floods via neural operators
Authors:
Peishi Jiang,
Nis Meinert,
Helga Jordão,
Constantin Weisser,
Simon Holgate,
Alexander Lavin,
Björn Lütjens,
Dava Newman,
Haruko Wainwright,
Catherine Walker,
Patrick Barnard
Abstract:
Developing fast and accurate surrogates for physics-based coastal and ocean models is an urgent need due to the coastal flood risk under accelerating sea level rise, and the computational expense of deterministic numerical models. For this purpose, we develop the first digital twin of Earth coastlines with new physics-informed machine learning techniques extending the state-of-art Neural Operator. As a proof-of-concept study, we built Fourier Neural Operator (FNO) surrogates on the simulations of an industry-standard flood and ocean model (NEMO). The resulting FNO surrogate accurately predicts the sea surface height in most regions while achieving upwards of 45x acceleration of NEMO. We delivered an open-source CoastalTwin platform in an end-to-end and modular way, to enable easy extensions to other simulations and ML-based surrogate methods. Our results and deliverable provide a promising approach to massively accelerate coastal dynamics simulators, which can enable scientists to efficiently execute many simulations for decision-making, uncertainty quantification, and other research activities.
Submitted 13 October, 2021;
originally announced October 2021.
-
Behind the Scenes: On the Relationship Between Developer Experience and Refactoring
Authors:
Eman Abdullah AlOmar,
Anthony Peruma,
Mohamed Wiem Mkaouer,
Christian D. Newman,
Ali Ouni
Abstract:
Refactoring is widely recognized as one of the efficient techniques to manage technical debt and maintain a healthy software project through enforcing best design practices or coping with design defects. Previous refactoring surveys have shown that code refactoring activities are mainly executed by developers who have sufficient knowledge of the system's design and disposing of leadership roles in their development teams. However, these surveys were mainly limited to specific projects and companies. In this paper, we explore the generalizability of the previous results by analyzing 800 open-source projects. We mine their refactoring activities, and we identify their corresponding contributors. Then, we associate an experience score to each contributor in order to test various hypotheses related to whether developers with higher scores tend to 1) perform a higher number of refactoring operations 2) exhibit different motivations behind their refactoring, and 3) better document their refactoring activity. We found that (1) although refactoring is not restricted to a subset of developers, those with higher contribution scores tend to perform more refactorings than others; (2) while there is no correlation between experience and motivation behind refactoring, top contributed developers are found to perform a wider variety of refactoring operations, regardless of their complexity; and (3) top contributed developers tend to document their refactoring activity less. Our qualitative analysis of three randomly sampled projects shows that the developers who are responsible for the majority of refactoring activities are typically in advanced positions in their development teams, demonstrating their extensive knowledge of the design of the systems they contribute to.
Submitted 22 September, 2021;
originally announced September 2021.
-
WiSoSuper: Benchmarking Super-Resolution Methods on Wind and Solar Data
Authors:
Rupa Kurinchi-Vendhan,
Björn Lütjens,
Ritwik Gupta,
Lucien Werner,
Dava Newman
Abstract:
The transition to green energy grids depends on detailed wind and solar forecasts to optimize the siting and scheduling of renewable energy generation. Operational forecasts from numerical weather prediction models, however, only have a spatial resolution of 10 to 20-km, which leads to sub-optimal usage and development of renewable energy farms. Weather scientists have been developing super-resolution methods to increase the resolution, but often rely on simple interpolation techniques or computationally expensive differential equation-based models. Recently, machine learning-based models, specifically the physics-informed resolution-enhancing generative adversarial network (PhIREGAN), have outperformed traditional downscaling methods. We provide a thorough and extensible benchmark of leading deep learning-based super-resolution techniques, including the enhanced super-resolution generative adversarial network (ESRGAN) and an enhanced deep super-resolution (EDSR) network, on wind and solar data. We accompany the benchmark with a novel public, processed, and machine learning-ready dataset for benchmarking super-resolution methods on wind and solar data.
Submitted 23 September, 2021; v1 submitted 17 September, 2021;
originally announced September 2021.
-
An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags
Authors:
Christian D. Newman,
Michael J. Decker,
Reem S. AlSuhaibani,
Anthony Peruma,
Satyajit Mohapatra,
Tejal Vishnoi,
Marcos Zampieri,
Mohamed W. Mkaouer,
Timothy J. Sheldon,
Emily Hill
Abstract:
This paper presents an ensemble part-of-speech tagging approach for source code identifiers. Ensemble tagging is a technique that uses machine-learning and the output from multiple part-of-speech taggers to annotate natural language text at a higher quality than the part-of-speech taggers are able to obtain independently. Our ensemble uses three state-of-the-art part-of-speech taggers: SWUM, POSSE, and Stanford. We study the quality of the ensemble's annotations on five different types of identifier names: function, class, attribute, parameter, and declaration statement at the level of both individual words and full identifier names. We also study and discuss the weaknesses of our tagger to promote the future amelioration of these problems through further research. Our results show that the ensemble achieves 75% accuracy at the identifier level and 84-86% accuracy at the word level. This is an increase of 17 percentage points at the identifier level from the closest independent part-of-speech tagger.
Submitted 1 September, 2021;
originally announced September 2021.
-
Unified theory of elementary fermions and their interactions based on Clifford algebras
Authors:
Douglas Newman
Abstract:
Seven commuting elements of the Clifford algebra $Cl_{7,7}$ define seven binary eigenvalues that distinguish the $2^7=128$ states of 32 fermions, and determine their parity, electric charge and interactions. Three commuting elements of the sub-algebra $Cl_{3,3}$ define three binary quantum numbers that distinguish the eight states of lepton doublets. The Dirac equation is reformulated in terms of a Lorentz invariant operator which expresses the properties of these states in terms of Dirac 4-component spinors. Re-formulation of the Standard Model shows chiral symmetry breaking to be redundant. A $Cl_{3,3}$ sub-algebra of $Cl_{5,5}$ defines two additional binary quantum numbers that distinguish quarks and leptons, and describes the SU(3) gluons that produce the hadron substrate, explaining quark confinement. Finally, a $Cl_{3,3}$ sub-algebra of $Cl_{7,7}$ defines a further two binary quantum numbers that distinguish four fermion generations. The predicted fourth generation is shown to have no neutrino and a distinct substrate, suggesting that ordinary matter is confined and providing candidates for unconfined dark matter. Interactions between fermions in the first three generations are predicted, including those that produce flavour symmetry. Relationships are explored between the $Cl_{1,3}$ algebra and general relativity, and between $Cl_{5,5}$ and SO(32) string theory.
Submitted 13 June, 2023; v1 submitted 10 August, 2021;
originally announced August 2021.
-
IDEAL: An Open-Source Identifier Name Appraisal Tool
Authors:
Anthony Peruma,
Venera Arnaoudova,
Christian D. Newman
Abstract:
Developers must comprehend the code they will maintain, meaning that the code must be legible and reasonably self-descriptive. Unfortunately, there is still a lack of research and tooling that supports developers in understanding their naming practices; whether the names they choose make sense, whether they are consistent, and whether they convey the information required of them. In this paper, we present IDEAL, a tool that will provide feedback to developers about their identifier naming practices. Among its planned features, it will support linguistic anti-pattern detection, which is what will be discussed in this paper. IDEAL is designed to, and will, be extended to cover further anti-patterns, naming structures, and practices in the near future. IDEAL is open-source and publicly available, with a demo video available at: https://youtu.be/fVoOYGe50zg
Submitted 17 July, 2021;
originally announced July 2021.
-
PCE-PINNs: Physics-Informed Neural Networks for Uncertainty Propagation in Ocean Modeling
Authors:
Björn Lütjens,
Catherine H. Crawford,
Mark Veillette,
Dava Newman
Abstract:
Climate models project an uncertainty range of possible warming scenarios from 1.5 to 5 degree Celsius global temperature increase until 2100, according to the CMIP6 model ensemble. Climate risk management and infrastructure adaptation requires the accurate quantification of the uncertainties at the local level. Ensembles of high-resolution climate models could accurately quantify the uncertainties, but most physics-based climate models are computationally too expensive to run as ensemble. Recent works in physics-informed neural networks (PINNs) have combined deep learning and the physical sciences to learn up to 15k faster copies of climate submodels. However, the application of PINNs in climate modeling has so far been mostly limited to deterministic models. We leverage a novel method that combines polynomial chaos expansion (PCE), a classic technique for uncertainty propagation, with PINNs. The PCE-PINNs learn a fast surrogate model that is demonstrated for uncertainty propagation of known parameter uncertainties. We showcase the effectiveness in ocean modeling by using the local advection-diffusion equation.
Submitted 5 May, 2021;
originally announced May 2021.
-
Test Smell Detection Tools: A Systematic Mapping Study
Authors:
Wajdi Aljedaani,
Anthony Peruma,
Ahmed Aljohani,
Mazen Alotaibi,
Mohamed Wiem Mkaouer,
Ali Ouni,
Christian D. Newman,
Abdullatif Ghallab,
Stephanie Ludi
Abstract:
Test smells are defined as sub-optimal design choices developers make when implementing test cases. Hence, similar to code smells, the research community has produced numerous test smell detection tools to investigate the impact of test smells on the quality and maintenance of test suites. However, little is known about the characteristics, type of smells, target language, and availability of these published tools. In this paper, we provide a detailed catalog of all known, peer-reviewed, test smell detection tools.
We start with performing a comprehensive search of peer-reviewed scientific publications to construct a catalog of 22 tools. Then, we perform a comparative analysis to identify the smell types detected by each tool and other salient features that include programming language, testing framework support, detection strategy, and adoption, among others. From our findings, we discover tools that detect test smells in Java, Scala, Smalltalk, and C++ test suites, with Java support favored by most tools. These tools are available as command-line and IDE plugins, among others. Our analysis also shows that most tools overlap in detecting specific smell types, such as General Fixture. Further, we encounter four types of techniques these tools utilize to detect smells. We envision our study as a one-stop source for researchers and practitioners in determining the tool appropriate for their needs. Our findings also empower the community with information to guide future tool development.
Submitted 3 May, 2021; v1 submitted 29 April, 2021;
originally announced April 2021.
-
Physically-Consistent Generative Adversarial Networks for Coastal Flood Visualization
Authors:
Björn Lütjens,
Brandon Leshchinskiy,
Christian Requena-Mesa,
Farrukh Chishtie,
Natalia Díaz-Rodríguez,
Océane Boulais,
Aruna Sankaranarayanan,
Margaux Masson-Forsythe,
Aaron Piña,
Yarin Gal,
Chedy Raïssi,
Alexander Lavin,
Dava Newman
Abstract:
As climate change increases the intensity of natural disasters, society needs better tools for adaptation. Floods, for example, are the most frequent natural disaster, and better tools for flood risk communication could increase the support for flood-resilient infrastructure development. Our work aims to enable more visual communication of large-scale climate impacts via visualizing the output of coastal flood models as satellite imagery. We propose the first deep learning pipeline to ensure physical-consistency in synthetic visual satellite imagery. We advanced a state-of-the-art GAN called pix2pixHD, such that it produces imagery that is physically-consistent with the output of an expert-validated storm surge model (NOAA SLOSH). By evaluating the imagery relative to physics-based flood maps, we find that our proposed framework outperforms baseline models in both physical-consistency and photorealism. We envision our work to be the first step towards a global visualization of how the climate challenge will shape our landscape. Continuing on this path, we show that the proposed pipeline generalizes to visualize reforestation. We also publish a dataset of over 25k labelled image-triplets to study image-to-image translation in Earth observation.
Submitted 21 February, 2023; v1 submitted 10 April, 2021;
originally announced April 2021.
-
On the Distribution of "Simple Stupid Bugs" in Unit Test Files: An Exploratory Study
Authors:
Anthony Peruma,
Christian D. Newman
Abstract:
A key aspect of ensuring the quality of a software system is the practice of unit testing. Through unit tests, developers verify the correctness of production source code, thereby verifying the system's intended behavior under test. However, unit test code is subject to issues, ranging from bugs in the code to poor test case design (i.e., test smells). In this study, we compare and contrast the occurrences of a type of single-statement-bug-fix known as "simple stupid bugs" (SStuBs) in test and non-test (i.e., production) files in popular open-source Java Maven projects. Our results show that SStuBs occur more frequently in non-test files than in test files, with most fix-related code associated with assertion statements in test files. Further, most test files exhibiting SStuBs also exhibit test smells. We envision our findings enabling tool vendors to better support developers in improving the maintenance of test suites.
Submitted 16 March, 2021;
originally announced March 2021.
-
Using Grammar Patterns to Interpret Test Method Name Evolution
Authors:
Anthony Peruma,
Emily Hu,
Jiajun Chen,
Eman Abdullah Alomar,
Mohamed Wiem Mkaouer,
Christian D. Newman
Abstract:
It is good practice to name test methods such that they are comprehensible to developers; they must be written in such a way that their purpose and functionality are clear to those who will maintain them. Unfortunately, there is little automated support for writing or maintaining the names of test methods. This can lead to inconsistent and low-quality test names and increase the maintenance cost of supporting these methods. Due to this risk, it is essential to help developers in maintaining their test method names over time. In this paper, we use grammar patterns, and how they relate to test method behavior, to understand test naming practices. This data will be used to support an automated tool for maintaining test names.
Submitted 16 March, 2021;
originally announced March 2021.
-
On the Naming of Methods: A Survey of Professional Developers
Authors:
Reem S. AlSuhaibani,
Christian D. Newman,
Michael J. Decker,
Michael L. Collard,
Jonathan I. Maletic
Abstract:
This paper describes the results of a large (+1100 responses) survey of professional software developers concerning standards for naming source code methods. The various standards for source code method names are derived from and supported in the software engineering literature. The goal of the survey is to determine if there is a general consensus among developers that the standards are accepted and used in practice. Additionally, the paper examines factors such as years of experience and programming language knowledge in the context of survey responses. The survey results show that participants very much agree about the importance of various standards and how they apply to names. Additionally, the survey shows that years of experience and the programming language the participants use has almost no effect on their responses.
Submitted 26 February, 2021;
originally announced February 2021.
-
Multi-Beam Energy Moments of Compound Measured Ion Velocity Distributions
Authors:
Martin V. Goldman,
David L. Newman,
Jonathan P. Eastwood,
Giovanni Lapenta,
James L. Burch,
Barbara Giles
Abstract:
Compound ion distributions, fi(v), have been measured by NASA's Magnetospheric Multi-Scale Mission (MMS) and have been found in reconnection simulations. A complex distribution, fi(v), consisting, for example, of essentially disjoint pieces will be called a multi-beam distribution and modeled as a sum of "beams," fi(v) = f1(v) + ... + fN(v). Velocity moments of fi(v) are taken beam by beam and summed. Such multi-beam moments of fi(v) have advantages over the customary standard velocity moments of fi(v), for which there is only one mean flow velocity. For example, the standard thermal energy moment of a pair of equal and opposite cold particle beams is non-zero even though each beam has zero thermal energy. We therefore call this thermal energy pseudo-thermal. By contrast, a multi-beam moment of two or more beams has no pseudo-thermal energy. We develop three different ways of decomposing into a sum and finding multi-beam moments for both a multi-beam fi(v) measured by MMS in the dayside magnetosphere during reconnection and a multi-beam fi(v) found in a PIC simulation of magnetotail reconnection. The three methods are: A visual method in which the velocity centroid of each beam is estimated and its density determined self-consistently; A k-means method in which particles in a particle-representation of fi(v) are sorted into a minimum energy configuration of N (= k) clusters; A nonlinear least squares method based on a fit to a sum of N kappa functions.
Multi-beam energy moments are calculated and compared with standard moments for the thermal energy density, pressure tensor, thermal energy flux (heat plus enthalpy fluxes), bulk kinetic energy density, RAM pressure and bulk kinetic energy flux. Applying this new formalism to real data demonstrates in detail how multi-beam techniques may provide significant insight into the properties of observed space plasmas.
Submitted 11 July, 2021; v1 submitted 25 January, 2021;
originally announced January 2021.
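The pseudo-thermal energy described in the abstract above can be illustrated with a minimal sketch. It assumes one-dimensional velocities, unit particle mass, and equally weighted particles; the function names are hypothetical and are not taken from the paper, and the beam membership is given by hand rather than found by any of the three decomposition methods.

```python
from statistics import fmean


def standard_thermal_energy(velocities, mass=1.0):
    """Thermal energy per particle about the single mean flow velocity."""
    u = fmean(velocities)
    return 0.5 * mass * fmean([(v - u) ** 2 for v in velocities])


def multibeam_thermal_energy(beams, mass=1.0):
    """Thermal energy per particle, taken beam by beam and summed.

    Each beam's spread is measured about that beam's own mean velocity.
    """
    n_total = sum(len(beam) for beam in beams)
    total = 0.0
    for beam in beams:
        u_beam = fmean(beam)
        total += 0.5 * mass * sum((v - u_beam) ** 2 for v in beam)
    return total / n_total


# Two equal and opposite cold beams: every particle moves at exactly +v0 or -v0.
v0 = 2.0
beams = [[v0] * 100, [-v0] * 100]
all_particles = [v for beam in beams for v in beam]

standard = standard_thermal_energy(all_particles)  # 0.5 * v0**2 = 2.0
multibeam = multibeam_thermal_energy(beams)        # 0.0
print(standard, multibeam)
```

With a single mean flow velocity of zero, the standard moment reports a non-zero "thermal" energy for two perfectly cold beams, while the beam-by-beam sum reports none.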
-
Technology Readiness Levels for Machine Learning Systems
Authors:
Alexander Lavin,
Ciarán M. Gilligan-Lee,
Alessya Visnjic,
Siddha Ganju,
Dava Newman,
Atılım Güneş Baydin,
Sujoy Ganguly,
Danny Lange,
Amit Sharma,
Stephan Zheng,
Eric P. Xing,
Adam Gibson,
James Parr,
Chris Mattmann,
Yarin Gal
Abstract:
The development and deployment of machine learning (ML) systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. The lack of diligence can lead to technical debt, scope creep and misaligned objectives, model misuse and failures, and expensive consequences. Engineering systems, on the other hand, follow well-defined processes and testing standards to streamline development for high-quality, reliable results. The extreme is spacecraft systems, where mission critical measures and robustness are ingrained in the development process. Drawing on experience in both spacecraft engineering and ML (from research through product across domain areas), we have developed a proven systems engineering approach for machine learning development and deployment. Our "Machine Learning Technology Readiness Levels" (MLTRL) framework defines a principled process to ensure robust, reliable, and responsible systems while being streamlined for ML workflows, including key distinctions from traditional software engineering. Even more, MLTRL defines a lingua franca for people across teams and organizations to work collaboratively on artificial intelligence and machine learning technologies. Here we describe the framework and elucidate it with several real world use-cases of developing ML methods from basic research through productization and deployment, in areas such as medical diagnostics, consumer computer vision, satellite imagery, and particle physics.
Submitted 29 November, 2021; v1 submitted 11 January, 2021;
originally announced January 2021.
-
Physics-informed GANs for Coastal Flood Visualization
Authors:
Björn Lütjens,
Brandon Leshchinskiy,
Christian Requena-Mesa,
Farrukh Chishtie,
Natalia Díaz-Rodriguez,
Océane Boulais,
Aaron Piña,
Dava Newman,
Alexander Lavin,
Yarin Gal,
Chedy Raïssi
Abstract:
As climate change increases the intensity of natural disasters, society needs better tools for adaptation. Floods, for example, are the most frequent natural disaster, but during hurricanes the area is largely covered by clouds and emergency managers must rely on nonintuitive flood visualizations for mission planning. To assist these emergency managers, we have created a deep learning pipeline that generates visual satellite images of current and future coastal flooding. We advanced a state-of-the-art GAN called pix2pixHD, such that it produces imagery that is physically-consistent with the output of an expert-validated storm surge model (NOAA SLOSH). By evaluating the imagery relative to physics-based flood maps, we find that our proposed framework outperforms baseline models in both physical-consistency and photorealism. While this work focused on the visualization of coastal floods, we envision the creation of a global visualization of how climate change will shape our earth.
Submitted 12 February, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
On the Generation, Structure, and Semantics of Grammar Patterns in Source Code Identifiers
Authors:
Christian D. Newman,
Reem S. AlSuhaibani,
Michael J. Decker,
Anthony Peruma,
Dishant Kaushik,
Mohamed Wiem Mkaouer,
Emily Hill
Abstract:
Identifiers make up a majority of the text in code. They are one of the most basic mediums through which developers describe the code they create and understand the code that others create. Therefore, understanding the patterns latent in identifier naming practices and how accurately we are able to automatically model these patterns is vital if researchers are to support developers and automated analysis approaches in comprehending and creating identifiers correctly and optimally. This paper investigates identifiers by studying sequences of part-of-speech annotations, referred to as grammar patterns. This work advances our understanding of these patterns and our ability to model them by 1) establishing common naming patterns in different types of identifiers, such as class and attribute names; 2) analyzing how different patterns influence comprehension; and 3) studying the accuracy of state-of-the-art techniques for part-of-speech annotations, which are vital in automatically modeling identifier naming patterns, in order to establish their limits and paths toward improvement. To do this, we manually annotate a dataset of 1,335 identifiers from 20 open-source systems and use this dataset to study naming patterns, semantics, and tagger accuracy.
Submitted 15 July, 2020;
originally announced July 2020.
-
Multi-beam Energy Moments of Multibeam Particle Velocity Distributions
Authors:
M. V. Goldman,
D. L. Newman,
J. P. Eastwood,
G. Lapenta
Abstract:
High resolution electron and ion velocity distributions, f(v), which consist of N effectively disjoint beams, have been measured by NASA's Magnetospheric Multi-Scale Mission (MMS) observatories and in reconnection simulations. Commonly used standard velocity moments generally assume a single mean-flow-velocity for the entire distribution, which can lead to counterintuitive results for a multibeam f(v). An example is the (false) standard thermal energy moment of a pair of equal and opposite cold particle beams, which is nonzero even though each beam has zero thermal energy. By contrast, a multibeam moment of two or more beams has no false thermal energy. A multibeam moment is obtained by taking a standard moment of each beam and then summing over beams. In this paper we will generalize these notions, explore their consequences and apply them to an f(v) which is a sum of tri-Maxwellians. Both standard and multibeam energy moments have coherent and incoherent forms. Examples of incoherent moments are the thermal energy density, the pressure and the thermal energy flux (enthalpy flux plus heat flux). Corresponding coherent moments are the bulk kinetic energy density, the RAM pressure and the bulk kinetic energy flux. The false part of an incoherent moment is defined as the difference between the standard incoherent moment and the corresponding multibeam moment. The sum of a pair of corresponding coherent and incoherent moments will be called the undecomposed moment. Undecomposed moments are independent of whether the sum is standard or multibeam and therefore have advantages when studying moments of measured f(v).
Submitted 18 May, 2020;
originally announced May 2020.
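The distinction between standard and multibeam moments described in the abstract above can be illustrated with a small numerical sketch. This is not the authors' code: it assumes a one-dimensional velocity space, and the `Beam` class and both function names are hypothetical, introduced only for illustration. Each beam is summarized by its density, drift velocity, and velocity variance about that drift (zero for a cold beam).

```python
from dataclasses import dataclass


@dataclass
class Beam:
    """Hypothetical 1D beam summary (an assumption for this sketch)."""
    density: float   # number density n_b of the beam
    drift: float     # mean (drift) velocity u_b of the beam
    variance: float  # velocity variance about u_b; 0.0 for a cold beam


def standard_moments(beams, mass=1.0):
    """Thermal and bulk energy densities using one mean flow for all beams."""
    n = sum(b.density for b in beams)
    mean_u = sum(b.density * b.drift for b in beams) / n
    # Spread is measured about the single global mean velocity.
    thermal = 0.5 * mass * sum(
        b.density * (b.variance + (b.drift - mean_u) ** 2) for b in beams
    )
    bulk = 0.5 * mass * n * mean_u ** 2
    return thermal, bulk


def multibeam_moments(beams, mass=1.0):
    """Take the standard moment of each beam separately, then sum over beams."""
    thermal = 0.5 * mass * sum(b.density * b.variance for b in beams)
    bulk = 0.5 * mass * sum(b.density * b.drift ** 2 for b in beams)
    return thermal, bulk


# Two equal and opposite cold beams, as in the abstract's example.
cold_pair = [Beam(0.5, 2.0, 0.0), Beam(0.5, -2.0, 0.0)]
std_thermal, std_bulk = standard_moments(cold_pair)
mb_thermal, mb_bulk = multibeam_moments(cold_pair)
false_thermal = std_thermal - mb_thermal
```

For the cold pair, the standard thermal energy density is nonzero while the multibeam value is zero, so the whole standard thermal moment is "false" in the abstract's sense. The sum of thermal and bulk parts (the undecomposed moment) comes out the same in both decompositions.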
-
Ontology-based Interpretable Machine Learning for Textual Data
Authors:
Phung Lai,
NhatHai Phan,
Han Hu,
Anuja Badeti,
David Newman,
Dejing Dou
Abstract:
In this paper, we introduce a novel interpreting framework that learns an interpretable model based on an ontology-based sampling technique to explain agnostic prediction models. Different from existing approaches, our algorithm considers contextual correlation among words, described in domain knowledge ontologies, to generate semantic explanations. To narrow down the search space for explanations, which is a major problem of long and complicated text data, we design a learnable anchor algorithm, to better extract explanations locally. A set of regulations is further introduced, regarding combining learned interpretable representations with anchors to generate comprehensible semantic explanations. An extensive experiment conducted on two real-world datasets shows that our approach generates more precise and insightful explanations compared with baseline approaches.
Submitted 31 March, 2020;
originally announced April 2020.
-
Exciting, Useful, Worrying, Futuristic: Public Perception of Artificial Intelligence in 8 Countries
Authors:
Patrick Gage Kelley,
Yongwei Yang,
Courtney Heldreth,
Christopher Moessner,
Aaron Sedley,
Andreas Kramm,
David T. Newman,
Allison Woodruff
Abstract:
As the influence and use of artificial intelligence (AI) have grown and its transformative potential has become more apparent, many questions have been raised regarding the economic, political, social, and ethical implications of its use. Public opinion plays an important role in these discussions, influencing product adoption, commercial development, research funding, and regulation. In this paper we present results of an in-depth survey of public opinion of artificial intelligence conducted with 10,005 respondents spanning eight countries and six continents. We report widespread perception that AI will have significant impact on society, accompanied by strong support for the responsible development and use of AI, and also characterize the public's sentiment towards AI with four key themes (exciting, useful, worrying, and futuristic) whose prevalence distinguishes response to AI in different countries.
Submitted 18 May, 2021; v1 submitted 27 December, 2019;
originally announced January 2020.
-
Coherent transfer of spin angular momentum by evanescent spin waves within antiferromagnetic NiO
Authors:
Maciej Dabrowski,
Takafumi Nakano,
David M. Burn,
Andreas Frisk,
David G. Newman,
Christoph Klewe,
Qian Li,
Mengmeng Yang,
Padraic Shafer,
Elke Arenholz,
Thorsten Hesjedal,
Gerrit van der Laan,
Zi Q. Qiu,
Robert J. Hicken
Abstract:
Insulating antiferromagnets are efficient and robust conductors of spin current. To realise the full potential of these materials within spintronics, the outstanding challenges are to demonstrate scalability down to nanometric lengthscales and the transmission of coherent spin currents. Here, we report the coherent transfer of spin angular momentum by excitation of evanescent spin waves of GHz frequency within antiferromagnetic NiO at room temperature. Using element-specific and phase-resolved x-ray ferromagnetic resonance, we probe the injection and transmission of ac spin current, and demonstrate that insertion of a few nanometre thick epitaxial NiO(001) layer between a ferromagnet and non-magnet can even enhance the flow of spin current. Our results pave the way towards coherent control of the phase and amplitude of spin currents at the nanoscale, and enable the realization of spin-logic devices and spin current amplifiers that operate at GHz and THz frequencies.
Submitted 11 December, 2019;
originally announced December 2019.
-
Local regimes of turbulence in 3D magnetic reconnection
Authors:
G. Lapenta,
F. Pucci,
M. V. Goldman,
D. L. Newman
Abstract:
The process of magnetic reconnection when studied in Nature or when modeled in 3D simulations differs in one key way from the standard 2D paradigmatic cartoon: it is accompanied by many fluctuations in the electromagnetic fields and plasma properties. We developed a diagnostic to study the spectrum of fluctuations in the various regions around a reconnection site. We define the regions in terms of the local value of the flux function that determines the distance from the reconnection site, with positive values in the outflow and negative values in the inflow. We find that fluctuations belong to two very different regimes depending on the local plasma beta (defined as the ratio of plasma and magnetic pressure). The first regime develops in the reconnection outflows where beta is high and is characterized by a strong link between plasma and electromagnetic fluctuations leading to momentum and energy exchanges via anomalous viscosity and resistivity. But there is a second, low beta regime: it develops in the inflow and in the region around the separatrix surfaces, including the reconnection electron diffusion region itself. It is remarkable that this low beta plasma, where the magnetic pressure dominates, remains laminar even though the electromagnetic fields are turbulent.
Submitted 26 October, 2019;
originally announced October 2019.
-
Characterizing magnetic reconnection regions using Gaussian mixture models on particle velocity distributions
Authors:
Romain Dupuis,
Martin V. Goldman,
David L. Newman,
Jorge Amaya,
Giovanni Lapenta
Abstract:
We present a method based on unsupervised machine learning to identify regions of interest using particle velocity distributions as a signature pattern. An automatic density estimation technique is applied to particle distributions provided by PIC simulations to study magnetic reconnection. The key components of the method involve: i) a Gaussian mixture model determining the presence of a given number of subpopulations within an overall population, and ii) a model selection technique with Bayesian Information Criterion to estimate the appropriate number of subpopulations. Thus, this method identifies automatically the presence of complex distributions, such as beams or other non-Maxwellian features, and can be used as a detection algorithm able to identify reconnection regions. The approach is demonstrated for specific double Harris sheet simulations but it can in principle be applied to any other type of simulation and observational data on the particle distribution function.
Submitted 22 October, 2019;
originally announced October 2019.
-
Quantum limit to nonequilibrium heat-engine performance imposed by strong system-reservoir coupling
Authors:
David Newman,
Florian Mintert,
Ahsan Nazir
Abstract:
We show that finite system-reservoir coupling imposes a distinct quantum limit on the performance of a non-equilibrium quantum heat engine. Even in the absence of quantum friction along the isentropic strokes, finite system-reservoir coupling induces correlations that result in the generation of coherence between the energy eigenstates of the working system. This coherence acts to hamper the engine's power output, as well as the efficiency with which it can convert heat into useful work, and cannot be captured by a standard Born-Markov analysis of the system-reservoir interactions.
Submitted 20 May, 2020; v1 submitted 21 June, 2019;
originally announced June 2019.
-
A violin sonata for reconnection
Authors:
G. Lapenta,
F. Pucci,
M. V. Goldman,
D. L. Newman
Abstract:
The process of magnetic reconnection when studied in Nature or when modeled in 3D simulations differs in one key way from the standard 2D paradigmatic cartoon: it is accompanied by many fluctuations in the electromagnetic fields and plasma properties. We developed a new diagnostic, the topographical fluctuations analysis (TFA), to study the spectrum of fluctuations in the various regions around a reconnection site. We find that fluctuations belong to two very different regimes. The first regime is better known: it develops in the reconnection outflows and is characterized by a strong link between plasma and electromagnetic fluctuations leading to momentum and energy exchanges via anomalous viscosity and resistivity. But there is a second, new regime: it develops in the inflow and in the region around the separatrix surfaces, including the reconnection diffusion region itself. In this new regime the plasma remains laminar but the electromagnetic fields fluctuate strongly. We present an analogy with the smooth continuous motion of the bow of a violin producing the vibrations of the strings to emit music.
Submitted 3 April, 2019;
originally announced April 2019.
-
Properties of turbulence in the reconnection exhaust: numerical simulations compared with observations
Authors:
F. Pucci,
S. Servidio,
L. Sorriso-Valvo,
V. Olshevsky,
W. H. Matthaeus,
F. Malara,
M. V. Goldman,
D. L. Newman,
G. Lapenta
Abstract:
The properties of the turbulence which develops in the outflows of magnetic reconnection have been investigated using self-consistent plasma simulations, in three dimensions. As commonly observed in space plasmas, magnetic reconnection is characterized by the presence of turbulence. Here we provide a direct comparison of our simulations with reported observations of reconnection events in the magnetotail investigating the properties of the electromagnetic field and the energy conversion mechanisms. In particular, simulations show the development of a turbulent cascade consistent with spacecraft observations, statistics of the dissipation mechanisms in the turbulent outflows similar to those observed in reconnection jets in the magnetotail, and that the properties of turbulence vary as a function of the distance from the reconnecting X-line.
Submitted 31 October, 2018;
originally announced November 2018.
-
Generation of turbulence in colliding reconnection jets
Authors:
F. Pucci,
W. H. Matthaeus,
A. Chasapis,
S. Servidio,
L. Sorriso-Valvo,
V. Olshevsky,
D. L. Newman,
M. V. Goldman,
G. Lapenta
Abstract:
The collision of magnetic reconnection jets is studied by means of a three dimensional numerical simulation at kinetic scale, in the presence of a strong guide field. We show that turbulence develops due to the jets collision producing several current sheets in reconnection outflows, aligned with the guide field direction. The turbulence is mainly two-dimensional, with stronger gradients in the plane perpendicular to the guide field and a low wave-like activity in the parallel direction. First, we provide a numerical method to isolate the central turbulent region. Second, we analyze spatial second-order structure function and prove that turbulence is confined in this region. Finally, we compute local magnetic and electric frequency spectra, finding a trend in the sub-ion range that differs from typical cases for which the Taylor hypothesis is valid, as well as wave activity in the range between ion and electron cyclotron frequencies. Our results are relevant to understand observations of reconnection jets collisions in space plasmas.
Submitted 31 October, 2018;
originally announced October 2018.
-
On the Brain Networks of Complex Problem Solving
Authors:
Abdullah Alchihabi,
Omer Ekmekci,
Baran B. Kivilcim,
Sharlene D. Newman,
Fatos T. Yarman Vural
Abstract:
Complex problem solving is a high level cognitive process which has been thoroughly studied over the last decade. The Tower of London (TOL) is a task that has been widely used to study problem-solving. In this study, we aim to explore the underlying cognitive network dynamics among anatomical regions of complex problem solving and its sub-phases, namely planning and execution. A new brain network construction model establishing dynamic functional brain networks using fMRI is proposed. The first step of the model is a preprocessing pipeline that manages to decrease the spatial redundancy while increasing the temporal resolution of the fMRI recordings. Then, dynamic brain networks are estimated using artificial neural networks. The network properties of the estimated brain networks are studied in order to identify regions of interest, such as hubs and subgroups of densely connected brain regions. The major similarities and dissimilarities of the network structure of planning and execution phases are highlighted. Our findings show the hubs and clusters of densely interconnected regions during both subtasks. It is observed that there are more hubs during the planning phase compared to the execution phase, and the clusters are more strongly connected during planning compared to execution.
Submitted 10 October, 2018;
originally announced October 2018.
-
Non-linear Waves and Instabilities Leading to Secondary Reconnection in Reconnection Outflows
Authors:
Giovanni Lapenta,
Francesco Pucci,
Vyacheslav Olshevsky,
Sergio Servidio,
Luca Sorriso-Valvo,
David L. Newman,
Martin Goldman
Abstract:
Reconnection outflows are regions of intense recent scrutiny, from in situ observations and from simulations. These regions are host to a variety of instabilities and intense energy exchanges, often even superior to the main reconnection site. We report here a number of results drawn from investigation of simulations. First, the outflows are observed to become unstable to drift instabilities. Second, these instabilities lead to the formation of secondary reconnection sites. Third, the secondary processes are responsible for large energy exchanges and particle energization. Finally, the particle distribution functions are modified to become non-Maxwellian and include multiple interpenetrating populations.
Submitted 26 August, 2018;
originally announced August 2018.
-
Localized Oscillatory Dissipation in Magnetopause Reconnection
Authors:
J. L. Burch,
R. E. Ergun,
P. A. Cassak,
J. M. Webster,
R. B. Torbert,
B. L. Giles,
J. C. Dorelli,
A. C. Rager,
K. -J. Hwang,
T. D. Phan,
K. J. Genestreti,
R. C. Allen,
L. -J. Chen,
S. Wang,
D. Gershman,
O. Le Contel,
C. T. Russell,
R. J. Strangeway,
F. D. Wilder,
D. B. Graham,
M. Hesse,
J. F. Drake,
M. Swisdak,
L. M. Price,
M. A. Shay
, et al. (4 additional authors not shown)
Abstract:
Data from the NASA Magnetospheric Multiscale (MMS) mission are used to investigate asymmetric magnetic reconnection at the dayside boundary between the Earth's magnetosphere and the solar wind (the magnetopause). High-resolution measurements of plasmas, electric and magnetic fields, and waves are used to identify highly localized (~15 electron Debye lengths) standing wave structures with large electric-field amplitudes (up to 100 mV/m). These wave structures are associated with spatially oscillatory dissipation, which appears as alternatingly positive and negative values of J dot E (dissipation). For small guide magnetic fields the wave structures occur in the electron stagnation region at the magnetosphere edge of the EDR. For larger guide fields the structures also occur near the reconnection x-line. This difference is explained in terms of channels for the out-of-plane current (agyrotropic electrons at the stagnation point and guide-field-aligned electrons at the x-line).
Submitted 13 December, 2017;
originally announced December 2017.