Data Science Education in Undergraduate Physics: Lessons Learned from a Community of Practice

Authors: Karan Shah, Julie Butler, Alexis Knaub, Anıl Zenginoğlu, William Ratcliff, Mohammad Soltanieh-ha

Abstract: It is becoming increasingly important that physics educators equip their students with the skills to work with data effectively. However, many educators may lack the necessary training and expertise in data science to teach these skills. To address this gap, we created the Data Science Education Community of Practice (DSECOP), bringing together graduate students and physics educators from different institutions and backgrounds to share best practices and lessons learned from integrating data science into undergraduate physics education. In this article we present insights and experiences from this community of practice, highlighting key strategies and challenges in incorporating data science into the introductory physics curriculum. Our goal is to provide guidance and inspiration to educators who seek to integrate data science into their teaching, helping to prepare the next generation of physicists for a data-driven world.

Submitted 16 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: 21 pages, 4 figures, 2 tables. The associated GitHub repository can be found at https://github.com/GDS-Education-Community-of-Practice/DSECOP

arXiv:2311.00236 [pdf, other]

Objectives and Key Results in Software Teams: Challenges, Opportunities and Impact on Development

Authors: Jenna Butler, Thomas Zimmermann, Christian Bird

Abstract: Building software, like building almost anything, requires people to understand a common goal and work together towards it. In large software companies, a VP or Director will have an idea or goal and it is often the job of middle management to distill that lofty, general idea into manageable, finite units of work. How do organizations do this hard work of setting and measuring progress towards goals? To understand this question, we undertook a mixed methods approach to studying goal setting, management dissemination of goals, goal tracking and ultimately software delivery at a large multi-national software company. Semi-structured interviews with 47 participants were analyzed and used to develop a survey which was deployed to a multi-national team of over 4,000 engineers. Analysis of the 512 responses using thematic analysis, linear regressions and hypothesis testing found that tracking, measuring and setting goals is hard work, regardless of tools used. Middle management seems to be a critical component of the translation of lofty goals to actionable work items. In addition, attitudes and beliefs of engineers are critical to the success of any goal setting framework. Based on this research, we make recommendations on how to improve the goal setting and OKR process in software organizations: invest in the data pipeline, increase transparency, improve communication, promote learning communities, and roll out OKRs in a structured way.

Submitted 31 October, 2023; originally announced November 2023.

Comments: 11 pages, 2 figures

arXiv:2306.14401 [pdf, ps, other]

On the distribution of sensitivities of symmetric Boolean functions

Authors: Jon T. Butler, Tsutomu Sasao, Shinobu Nagayama

Abstract: A Boolean function $f({\vec x})$ is sensitive to bit $x_i$ if there is at least one input vector $\vec x$ and one bit $x_i$ in $\vec x$, such that changing $x_i$ changes $f$. A function has sensitivity $s$ if among all input vectors, the largest number of bits to which $f$ is sensitive is $s$. We count the $n$-variable symmetric Boolean functions that have maximum sensitivity. We show that most such functions have the largest possible sensitivity, $n$. This suggests sensitivity is limited as a complexity measure for symmetric Boolean functions.

Submitted 25 June, 2023; originally announced June 2023.

Comments: 5 pages, 0 figures. The submitted paper is a journal version of "Enumeration of Symmetric Boolean Functions By Sensitivity" by J. Butler, T. Sasao, and S. Nagayama, presented at the Reed-Muller Workshop, Matsue, Japan on May 24, 2023. The paper was presented but not distributed. The authors retained copyright.
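
The definition of sensitivity in the abstract above can be illustrated with a short sketch. This is not the authors' enumeration method; it is a brute-force illustration under one assumption: a symmetric function is represented by a hypothetical tuple `values`, where `values[w]` is the output on every input of Hamming weight `w`. The function names are likewise illustrative.

```python
from itertools import product


def sensitivity(values):
    """Sensitivity of the symmetric Boolean function whose output on
    inputs of Hamming weight w is values[w] (len(values) == n + 1)."""
    n = len(values) - 1
    best = 0
    for w in range(n + 1):
        s = 0
        if w > 0 and values[w] != values[w - 1]:
            s += w          # flipping any of the w ones changes f
        if w < n and values[w] != values[w + 1]:
            s += n - w      # flipping any of the n - w zeros changes f
        best = max(best, s)
    return best


def sensitivity_brute_force(values):
    """Same quantity, computed directly from the definition by trying
    every input vector and every single-bit flip."""
    n = len(values) - 1
    best = 0
    for x in product((0, 1), repeat=n):
        weight = sum(x)
        fx = values[weight]
        # Flipping a 1 lowers the weight by one; flipping a 0 raises it.
        flips = sum(values[weight + (1 - 2 * x[i])] != fx for i in range(n))
        best = max(best, flips)
    return best


def count_max_sensitivity(n):
    """Number of n-variable symmetric functions with sensitivity n."""
    return sum(
        1 for v in product((0, 1), repeat=n + 1) if sensitivity(v) == n
    )


if __name__ == "__main__":
    for n in range(1, 7):
        total = 2 ** (n + 1)
        print(n, count_max_sensitivity(n), "of", total)
```

Because a symmetric function depends only on the weight of its input, the first function needs only `n + 1` steps, while the brute-force version checks all `2**n` inputs and is useful mainly as a cross-check for small `n`.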

arXiv:2301.03591 [pdf, other]

PatentsView-Evaluation: Evaluation Datasets and Tools to Advance Research on Inventor Name Disambiguation

Authors: Olivier Binette, Sarvo Madhavan, Jack Butler, Beth Anne Card, Emily Melluso, Christina Jones

Abstract: We present PatentsView-Evaluation, a Python package that enables researchers to evaluate the performance of inventor name disambiguation systems such as PatentsView.org. The package includes benchmark datasets and evaluation tools, and aims to advance research on inventor name disambiguation by providing access to high-quality evaluation data and improving evaluation standards.

Submitted 9 January, 2023; originally announced January 2023.

Comments: 3 pages, 2 figures

arXiv:2211.12983 [pdf, other]

Causal Analysis of the TOPCAT Trial: Spironolactone for Preserved Cardiac Function Heart Failure

Authors: Francesca E. D. Raimondi, Tadhg O'Keeffe, Hana Chockler, Andrew R. Lawrence, Tamara Stemberga, Andre Franca, Maksim Sipos, Javed Butler, Shlomo Ben-Haim

Abstract: We describe the results of applying causal discovery methods on the data from a multi-site clinical trial, on the Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist (TOPCAT). The trial was inconclusive, with no clear benefits consistently shown for the whole cohort. However, there were questions regarding the reliability of the diagnosis and treatment protocol for a geographic subgroup of the cohort. With the inclusion of medical context in the form of domain knowledge, causal discovery is used to demonstrate regional discrepancies and to frame the regional transportability of the results. Furthermore, we show that, globally and especially for some subgroups, the treatment has significant causal effects, thus offering a more refined view of the trial results.

Submitted 23 November, 2022; originally announced November 2022.

Journal ref: NeurIPS 2022 Workshop on Causal Machine Learning for Real-World Impact (CML4Impact 2022)

arXiv:2103.02524 [pdf]

Personal Productivity and Well-being -- Chapter 2 of the 2021 New Future of Work Report

Authors: Jenna Butler, Mary Czerwinski, Shamsi Iqbal, Sonia Jaffe, Kate Nowak, Emily Peloquin, Longqi Yang

Abstract: We now turn to understanding the impact that COVID-19 had on the personal productivity and well-being of information workers as their work practices were impacted by remote work. This chapter overviews people's productivity, satisfaction, and work patterns, and shows that the challenges and benefits of remote work are closely linked. Looking forward, the infrastructure surrounding work will need to evolve to help people adapt to the challenges of remote and hybrid work.

Submitted 3 March, 2021; originally announced March 2021.

Comments: In The New Future of Work: Research from Microsoft on the Impact of the Pandemic on Work Practices, edited by Jaime Teevan, Brent Hecht, and Sonia Jaffe, 1st ed. Microsoft, 2021. https://aka.ms/newfutureofwork

arXiv:2008.11147 [pdf, other]

doi 10.1145/3487567

A Tale of Two Cities: Software Developers Working from Home During the COVID-19 Pandemic

Authors: Denae Ford, Margaret-Anne Storey, Thomas Zimmermann, Christian Bird, Sonia Jaffe, Chandra Maddila, Jenna L. Butler, Brian Houck, Nachiappan Nagappan

Abstract: The COVID-19 pandemic has shaken the world to its core and has provoked an overnight exodus of developers that normally worked in an office setting to working from home. The magnitude of this shift and the factors that have accompanied this new unplanned work setting go beyond what the software engineering community has previously understood to be remote work. To find out how developers and their productivity were affected, we distributed two surveys (with a combined total of 3,634 responses that answered all required questions) -- weeks apart to understand the presence and prevalence of the benefits, challenges, and opportunities to improve this special circumstance of remote work. From our thematic qualitative analysis and statistical quantitative analysis, we find that there is a dichotomy of developer experiences influenced by many different factors (that for some are a benefit, while for others a challenge). For example, a benefit for some was being close to family members, but for others, having family members share their working space and interrupt their focus was a challenge. Our surveys led to powerful narratives from respondents and revealed the scale at which these experiences exist to provide insights as to how the future of (pandemic) remote work can evolve.

Submitted 10 September, 2021; v1 submitted 25 August, 2020; originally announced August 2020.

Comments: 36 pages, 1 figure, 6 tables

Journal ref: ACM Transactions on Software Engineering and Methodology, Volume 31, Issue 2 (April 2022)

arXiv:1911.03867 [pdf, other]

A Modular Deep Learning Pipeline for Galaxy-Scale Strong Gravitational Lens Detection and Modeling

Authors: Sandeep Madireddy, Nesar Ramachandra, Nan Li, James Butler, Prasanna Balaprakash, Salman Habib, Katrin Heitmann, The LSST Dark Energy Science Collaboration

Abstract: Upcoming large astronomical surveys are expected to capture an unprecedented number of strong gravitational lensing systems. Deep learning is emerging as a promising practical tool for the detection and quantification of these galaxy-scale image distortions. The absence of large quantities of representative data from current astronomical surveys motivates the development of a robust forward-modeling approach using synthetic lensing images. Using a mock sample of strong lenses created from state-of-the-art extragalactic catalogs, we train a modular deep learning pipeline for uncertainty-quantified detection and modeling with intermediate image processing components for denoising and deblending the lensing systems. We demonstrate a high degree of interpretability and controlled systematics due to domain-specific task modules trained with different stages of synthetic image generation. For lens detection and modeling, we obtain semantically meaningful latent spaces that separate classes of strong lens images and yield uncertainty estimates that explain the origin of misclassified images and provide probabilistic predictions for the lens parameters. Validation of the inference pipeline has been carried out using images from the Subaru telescope's Hyper Suprime-Cam camera, and LSST DESC simulated DC2 sky survey catalogues.

Submitted 21 October, 2022; v1 submitted 10 November, 2019; originally announced November 2019.

arXiv:1910.12580 [pdf, other]

Assessing Regulatory Risk in Personal Financial Advice Documents: a Pilot Study

Authors: Wanita Sherchan, Simon Harris, Sue Ann Chen, Nebula Alam, Khoi-Nguyen Tran, Adam J. Makarucha, Christopher J. Butler

Abstract: Assessing regulatory compliance of personal financial advice is currently a complex manual process. In Australia, only 5%-15% of advice documents are audited annually and 75% of these are found to be non-compliant (ASI 2018b). This paper describes a pilot with an Australian government regulation agency where Artificial Intelligence (AI) models based on techniques such as natural language processing (NLP), machine learning and deep learning were developed to methodically characterise the regulatory risk status of personal financial advice documents. The solution provides a traffic light rating of advice documents for various risk factors, enabling comprehensive coverage of documents in the review and allowing rapid identification of documents that are at high risk of non-compliance with government regulations. This pilot serves as a case study of public-private partnership in developing AI systems for government and public sector.

Submitted 11 October, 2019; originally announced October 2019.

Comments: Presented at AAAI FSS-19: Artificial Intelligence in Government and Public Sector, Arlington, Virginia, USA

arXiv:1806.01351 [pdf, other]

Document Chunking and Learning Objective Generation for Instruction Design

Authors: Khoi-Nguyen Tran, Jey Han Lau, Danish Contractor, Utkarsh Gupta, Bikram Sengupta, Christopher J. Butler, Mukesh Mohania

Abstract: Instructional Systems Design is the practice of creating instructional experiences that make the acquisition of knowledge and skill more efficient, effective, and appealing. Specifically in designing courses, an hour of training material can require between 30 and 500 hours of effort in sourcing and organizing reference data for use in just the preparation of course material. In this paper, we present the first system of its kind that helps reduce the effort associated with sourcing reference material and course creation. We present algorithms for document chunking and automatic generation of learning objectives from content, creating descriptive content metadata to improve content-discoverability. Unlike existing methods, the learning objectives generated by our system incorporate pedagogically motivated Bloom's verbs. We demonstrate the usefulness of our methods using real world data from the banking industry and through a live deployment at a large pharmaceutical company.

Submitted 5 August, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

Comments: Proceedings of the 11th International Conference on Education Data Mining (EDM 2018)

arXiv:1610.02228 [pdf, other]

Project ACT: Social Media Analytics in Disaster Response

Authors: Wanita Sherchan, Shaila Pervin, Christopher J. Butler, Jennifer C. Lai

Abstract: In large-scale emergencies social media has become a key source of information for public awareness, government authorities and relief agencies. However, the sheer volume of data and the low signal-to-noise ratio limit the effectiveness and the efficiency of using social media as an intelligence resource. We describe Australian Crisis Tracker (ACT), a tool designed for agencies responding to large-scale emergency events, to facilitate the understanding of critical information in Twitter. ACT was piloted by the Australian Red Cross (ARC) during the 2013-2014 Australian bushfires season. Video is available at: https://www.youtube.com/watch?v=Y-1rtNFqQbE

Submitted 7 October, 2016; originally announced October 2016.

arXiv:0910.2048 [pdf]

The Role of Spreadsheets in the Allied Irish Bank / Allfirst Currency Trading Fraud

Authors: Raymond J. Butler

Abstract: This brief paper outlines how spreadsheets were used as one of the vehicles for John Rusnak's fraud and the revenue control lessons this case gives us.

Submitted 11 October, 2009; originally announced October 2009.

Comments: 4 Pages. To Appear Proc. European Spreadsheet Risks Interest Group

arXiv:0805.4236 [pdf]

Risk Assessment For Spreadsheet Developments: Choosing Which Models to Audit

Authors: Raymond J. Butler

Abstract: Errors in spreadsheet applications and models are alarmingly common (some authorities, with justification, cite spreadsheets containing errors as the norm rather than the exception). Faced with this body of evidence, the auditor can be faced with a huge task - the temptation may be to launch code inspections for every spreadsheet in an organisation. This can be very expensive and time-consuming. This paper describes risk assessment based on the "SpACE" audit methodology used by H M Customs & Excise's tax inspectors. This allows the auditor to target resources on the spreadsheets posing the highest risk of error, and justify the deployment of those resources to managers and clients. Since the opposite of audit risk is audit assurance, the paper also offers an overview of some elements of good practice in the use of spreadsheets in business.

Submitted 27 May, 2008; originally announced May 2008.

Comments: 11 Pages, 1 Figure

ACM Class: D.1.7; D.2.1; D.2.11; D.3.2; D.3.3; H.4.1; K.6.4; K.8.1

Journal ref: Proc. European Spreadsheet Risks Int. Grp. (EuSpRIG) 2000 65-74 ISBN:1 86166 158 4

arXiv:0801.0609 [pdf]

Applying the CobiT Control Framework to Spreadsheet Developments

Authors: Raymond J. Butler

Abstract: One of the problems reported by researchers and auditors in the field of spreadsheet risks is that of getting and keeping management's attention to the problem. Since 1996, the Information Systems Audit & Control Foundation and the IT Governance Institute have published CobiT, which brings mainstream IT control issues into the corporate governance arena. This paper illustrates how spreadsheet risk and control issues can be mapped onto the CobiT framework and thus brought to managers' attention in a familiar format.

Submitted 3 January, 2008; originally announced January 2008.

Comments: 6 Pages

ACM Class: J.1; H.4.1; K.6.4; D.2.9

Journal ref: Proc. European Spreadsheet Risks Int. Grp. 2001 7-13 ISBN:1 86166 179 7

arXiv:0710.0871 [pdf]

Spreadsheets in Clinical Medicine

Authors: Grenville J. Croll, Raymond J. Butler

Abstract: There is overwhelming evidence that the continued and widespread use of untested spreadsheets in business gives rise to regular, significant and unexpected financial losses. Whilst this is worrying, it is perhaps a relatively minor concern compared with the risks arising from the use of poorly constructed and/or untested spreadsheets in medicine, a practice that is already occurring. This article is intended as a warning that the use of poorly constructed and/or untested spreadsheets in clinical medicine cannot be tolerated. It supports this warning by reporting on potentially serious weaknesses found while testing a limited number of publicly available clinical spreadsheets.

Submitted 3 October, 2007; originally announced October 2007.

Comments: 10 Pages including references

ACM Class: J.3; K.8.1

Journal ref: Proc. European Spreadsheet Risks Int. Grp. 2006 7-16

Showing 1–15 of 15 results for author: Butler, J