-
Data Science Education in Undergraduate Physics: Lessons Learned from a Community of Practice
Authors:
Karan Shah,
Julie Butler,
Alexis Knaub,
Anıl Zenginoğlu,
William Ratcliff,
Mohammad Soltanieh-ha
Abstract:
It is becoming increasingly important that physics educators equip their students with the skills to work with data effectively. However, many educators may lack the necessary training and expertise in data science to teach these skills. To address this gap, we created the Data Science Education Community of Practice (DSECOP), bringing together graduate students and physics educators from different institutions and backgrounds to share best practices and lessons learned from integrating data science into undergraduate physics education. In this article we present insights and experiences from this community of practice, highlighting key strategies and challenges in incorporating data science into the introductory physics curriculum. Our goal is to provide guidance and inspiration to educators who seek to integrate data science into their teaching, helping to prepare the next generation of physicists for a data-driven world.
Submitted 16 June, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Objectives and Key Results in Software Teams: Challenges, Opportunities and Impact on Development
Authors:
Jenna Butler,
Thomas Zimmermann,
Christian Bird
Abstract:
Building software, like building almost anything, requires people to understand a common goal and work together towards it. In large software companies, a VP or Director will have an idea or goal and it is often the job of middle management to distill that lofty, general idea into manageable, finite units of work. How do organizations do this hard work of setting and measuring progress towards goals? To understand this question, we undertook a mixed methods approach to studying goal setting, management dissemination of goals, goal tracking and ultimately software delivery at a large multi-national software company.
Semi-structured interviews with 47 participants were analyzed and used to develop a survey which was deployed to a multi-national team of over 4,000 engineers. The 512 responses were analyzed using thematic analysis, linear regressions and hypothesis testing, and found that tracking, measuring and setting goals is hard work, regardless of tools used. Middle management seems to be a critical component of the translation of lofty goals to actionable work items. In addition, attitudes and beliefs of engineers are critical to the success of any goal setting framework. Based on this research, we make recommendations on how to improve the goal setting and OKR process in software organizations: invest in the data pipeline, increase transparency, improve communication, promote learning communities, and a structured roll out of OKRs.
Submitted 31 October, 2023;
originally announced November 2023.
-
On the distribution of sensitivities of symmetric Boolean functions
Authors:
Jon T. Butler,
Tsutomu Sasao,
Shinobu Nagayama
Abstract:
A Boolean function $f({\vec x})$ is sensitive to bit $x_i$ if there is at least one input vector $\vec x$ and one bit $x_i$ in $\vec x$, such that changing $x_i$ changes $f$. A function has sensitivity $s$ if among all input vectors, the largest number of bits to which $f$ is sensitive is $s$. We count the $n$-variable symmetric Boolean functions that have maximum sensitivity. We show that most such functions have the largest possible sensitivity, $n$. This suggests sensitivity is limited as a complexity measure for symmetric Boolean functions.
Submitted 25 June, 2023;
originally announced June 2023.
-
PatentsView-Evaluation: Evaluation Datasets and Tools to Advance Research on Inventor Name Disambiguation
Authors:
Olivier Binette,
Sarvo Madhavan,
Jack Butler,
Beth Anne Card,
Emily Melluso,
Christina Jones
Abstract:
We present PatentsView-Evaluation, a Python package that enables researchers to evaluate the performance of inventor name disambiguation systems such as PatentsView.org. The package includes benchmark datasets and evaluation tools, and aims to advance research on inventor name disambiguation by providing access to high-quality evaluation data and improving evaluation standards.
Submitted 9 January, 2023;
originally announced January 2023.
-
Causal Analysis of the TOPCAT Trial: Spironolactone for Preserved Cardiac Function Heart Failure
Authors:
Francesca E. D. Raimondi,
Tadhg O'Keeffe,
Hana Chockler,
Andrew R. Lawrence,
Tamara Stemberga,
Andre Franca,
Maksim Sipos,
Javed Butler,
Shlomo Ben-Haim
Abstract:
We describe the results of applying causal discovery methods on the data from a multi-site clinical trial, on the Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist (TOPCAT). The trial was inconclusive, with no clear benefits consistently shown for the whole cohort. However, there were questions regarding the reliability of the diagnosis and treatment protocol for a geographic subgroup of the cohort. With the inclusion of medical context in the form of domain knowledge, causal discovery is used to demonstrate regional discrepancies and to frame the regional transportability of the results. Furthermore, we show that, globally and especially for some subgroups, the treatment has significant causal effects, thus offering a more refined view of the trial results.
Submitted 23 November, 2022;
originally announced November 2022.
-
Personal Productivity and Well-being -- Chapter 2 of the 2021 New Future of Work Report
Authors:
Jenna Butler,
Mary Czerwinski,
Shamsi Iqbal,
Sonia Jaffe,
Kate Nowak,
Emily Peloquin,
Longqi Yang
Abstract:
We now turn to understanding the impact that COVID-19 had on the personal productivity and well-being of information workers as their work practices were impacted by remote work. This chapter overviews people's productivity, satisfaction, and work patterns, and shows that the challenges and benefits of remote work are closely linked. Looking forward, the infrastructure surrounding work will need to evolve to help people adapt to the challenges of remote and hybrid work.
Submitted 3 March, 2021;
originally announced March 2021.
-
A Tale of Two Cities: Software Developers Working from Home During the COVID-19 Pandemic
Authors:
Denae Ford,
Margaret-Anne Storey,
Thomas Zimmermann,
Christian Bird,
Sonia Jaffe,
Chandra Maddila,
Jenna L. Butler,
Brian Houck,
Nachiappan Nagappan
Abstract:
The COVID-19 pandemic has shaken the world to its core and has provoked an overnight exodus of developers that normally worked in an office setting to working from home. The magnitude of this shift and the factors that have accompanied this new unplanned work setting go beyond what the software engineering community has previously understood to be remote work. To find out how developers and their productivity were affected, we distributed two surveys (with a combined total of 3,634 responses that answered all required questions) -- weeks apart to understand the presence and prevalence of the benefits, challenges, and opportunities to improve this special circumstance of remote work. From our thematic qualitative analysis and statistical quantitative analysis, we find that there is a dichotomy of developer experiences influenced by many different factors (that for some are a benefit, while for others a challenge). For example, a benefit for some was being close to family members but for others having family members share their working space and interrupting their focus, was a challenge. Our surveys led to powerful narratives from respondents and revealed the scale at which these experiences exist to provide insights as to how the future of (pandemic) remote work can evolve.
Submitted 10 September, 2021; v1 submitted 25 August, 2020;
originally announced August 2020.
-
A Modular Deep Learning Pipeline for Galaxy-Scale Strong Gravitational Lens Detection and Modeling
Authors:
Sandeep Madireddy,
Nesar Ramachandra,
Nan Li,
James Butler,
Prasanna Balaprakash,
Salman Habib,
Katrin Heitmann,
The LSST Dark Energy Science Collaboration
Abstract:
Upcoming large astronomical surveys are expected to capture an unprecedented number of strong gravitational lensing systems. Deep learning is emerging as a promising practical tool for the detection and quantification of these galaxy-scale image distortions. The absence of large quantities of representative data from current astronomical surveys motivates the development of a robust forward-modeling approach using synthetic lensing images. Using a mock sample of strong lenses created upon state-of-the-art extragalactic catalogs, we train a modular deep learning pipeline for uncertainty-quantified detection and modeling with intermediate image processing components for denoising and deblending the lensing systems. We demonstrate a high degree of interpretability and controlled systematics due to domain-specific task modules trained with different stages of synthetic image generation. For lens detection and modeling, we obtain semantically meaningful latent spaces that separate classes of strong lens images and yield uncertainty estimates that explain the origin of misclassified images and provide probabilistic predictions for the lens parameters. Validation of the inference pipeline has been carried out using images from the Subaru telescope's Hyper Suprime-Cam camera, and LSST DESC simulated DC2 sky survey catalogues.
Submitted 21 October, 2022; v1 submitted 10 November, 2019;
originally announced November 2019.
-
Assessing Regulatory Risk in Personal Financial Advice Documents: a Pilot Study
Authors:
Wanita Sherchan,
Simon Harris,
Sue Ann Chen,
Nebula Alam,
Khoi-Nguyen Tran,
Adam J. Makarucha,
Christopher J. Butler
Abstract:
Assessing regulatory compliance of personal financial advice is currently a complex manual process. In Australia, only 5%-15% of advice documents are audited annually and 75% of these are found to be non-compliant (ASI 2018b). This paper describes a pilot with an Australian government regulation agency where Artificial Intelligence (AI) models based on techniques such as natural language processing (NLP), machine learning and deep learning were developed to methodically characterise the regulatory risk status of personal financial advice documents. The solution provides traffic light rating of advice documents for various risk factors enabling comprehensive coverage of documents in the review and allowing rapid identification of documents that are at high risk of non-compliance with government regulations. This pilot serves as a case study of public-private partnership in developing AI systems for government and public sector.
Submitted 11 October, 2019;
originally announced October 2019.
-
Document Chunking and Learning Objective Generation for Instruction Design
Authors:
Khoi-Nguyen Tran,
Jey Han Lau,
Danish Contractor,
Utkarsh Gupta,
Bikram Sengupta,
Christopher J. Butler,
Mukesh Mohania
Abstract:
Instructional Systems Design is the practice of creating instructional experiences that make the acquisition of knowledge and skill more efficient, effective, and appealing. Specifically in designing courses, an hour of training material can require between 30 and 500 hours of effort in sourcing and organizing reference data for use in just the preparation of course material. In this paper, we present the first system of its kind that helps reduce the effort associated with sourcing reference material and course creation. We present algorithms for document chunking and automatic generation of learning objectives from content, creating descriptive content metadata to improve content-discoverability. Unlike existing methods, the learning objectives generated by our system incorporate pedagogically motivated Bloom's verbs. We demonstrate the usefulness of our methods using real world data from the banking industry and through a live deployment at a large pharmaceutical company.
Submitted 5 August, 2018; v1 submitted 1 June, 2018;
originally announced June 2018.
-
Project ACT: Social Media Analytics in Disaster Response
Authors:
Wanita Sherchan,
Shaila Pervin,
Christopher J. Butler,
Jennifer C. Lai
Abstract:
In large-scale emergencies social media has become a key source of information for public awareness, government authorities and relief agencies. However, the sheer volume of data and the low signal-to-noise ratio limit the effectiveness and the efficiency of using social media as an intelligence resource. We describe Australian Crisis Tracker (ACT), a tool designed for agencies responding to large-scale emergency events, to facilitate the understanding of critical information in Twitter. ACT was piloted by the Australian Red Cross (ARC) during the 2013-2014 Australian bushfires season.
Video is available at: https://www.youtube.com/watch?v=Y-1rtNFqQbE
Submitted 7 October, 2016;
originally announced October 2016.
-
The Role of Spreadsheets in the Allied Irish Bank / Allfirst Currency Trading Fraud
Authors:
Raymond J. Butler
Abstract:
This brief paper outlines how spreadsheets were used as one of the vehicles for John Rusnak's fraud and the revenue control lessons this case gives us.
Submitted 11 October, 2009;
originally announced October 2009.
-
Risk Assessment For Spreadsheet Developments: Choosing Which Models to Audit
Authors:
Raymond J. Butler
Abstract:
Errors in spreadsheet applications and models are alarmingly common (some authorities, with justification, cite spreadsheets containing errors as the norm rather than the exception). Faced with this body of evidence, the auditor can be faced with a huge task - the temptation may be to launch code inspections for every spreadsheet in an organisation. This can be very expensive and time-consuming. This paper describes risk assessment based on the "SpACE" audit methodology used by H M Customs & Excise's tax inspectors. This allows the auditor to target resources on the spreadsheets posing the highest risk of error, and justify the deployment of those resources to managers and clients. Since the opposite of audit risk is audit assurance, the paper also offers an overview of some elements of good practice in the use of spreadsheets in business.
Submitted 27 May, 2008;
originally announced May 2008.
-
Applying the CobiT Control Framework to Spreadsheet Developments
Authors:
Raymond J. Butler
Abstract:
One of the problems reported by researchers and auditors in the field of spreadsheet risks is that of getting and keeping management's attention to the problem. Since 1996, the Information Systems Audit & Control Foundation and the IT Governance Institute have published CobiT, which brings mainstream IT control issues into the corporate governance arena. This paper illustrates how spreadsheet risk and control issues can be mapped onto the CobiT framework and thus brought to managers' attention in a familiar format.
Submitted 3 January, 2008;
originally announced January 2008.
-
Spreadsheets in Clinical Medicine
Authors:
Grenville J. Croll,
Raymond J. Butler
Abstract:
There is overwhelming evidence that the continued and widespread use of untested spreadsheets in business gives rise to regular, significant and unexpected financial losses. Whilst this is worrying, it is perhaps a relatively minor concern compared with the risks arising from the use of poorly constructed and/or untested spreadsheets in medicine, a practice that is already occurring. This article is intended as a warning that the use of poorly constructed and/or untested spreadsheets in clinical medicine cannot be tolerated. It supports this warning by reporting on potentially serious weaknesses found while testing a limited number of publicly available clinical spreadsheets.
Submitted 3 October, 2007;
originally announced October 2007.