-
Data-Driven Turbulence Modeling Approach for Cold-Wall Hypersonic Boundary Layers
Authors:
Muhammad I. Zafar,
Xuhui Zhou,
Christopher J. Roy,
David Stelter,
Heng Xiao
Abstract:
The wall-cooling effect in hypersonic boundary layers can significantly alter near-wall turbulence behavior, which traditional RANS turbulence models do not capture accurately. To address this shortcoming, this paper presents a turbulence modeling approach for hypersonic flows with cold-wall conditions using an iterative ensemble Kalman method. Specifically, a neural-network-based turbulence model provides a closure mapping from mean flow quantities to the Reynolds stress as well as a variable turbulent Prandtl number. Sparse observation data of velocity and temperature are used to train the turbulence model. The approach is analyzed using a direct numerical simulation (DNS) database for boundary layer flows over a flat plate with Mach numbers between 6 and 14 and wall-to-recovery temperature ratios ranging from 0.18 to 0.76. Two training scenarios are considered: (1) a single training case with observation data from one flow case, and (2) a joint training case in which data from two flow cases are used simultaneously. In each scenario, the trained model is also tested for generalizability on the remaining flow cases. The results are further analyzed for insights to inform future work toward enhancing the generalizability of the learned turbulence model.
Submitted 25 June, 2024;
originally announced June 2024.
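The abstract trains neural-network weights with an iterative ensemble Kalman method. As a heavily simplified illustration of the core update only (not the authors' implementation), the sketch below calibrates a single scalar parameter of a hypothetical linear forward model, `forward_model(theta) = 2 * theta`, against one observation, applying the standard perturbed-observation ensemble Kalman update a fixed number of times. In the paper the parameter vector would be the network weights and the forward model a RANS solve; the function names, prior, noise level, ensemble size, and iteration count here are all assumptions.

```python
import random
import statistics


def forward_model(theta):
    # Hypothetical stand-in for an expensive solver: parameter -> predicted observation.
    return 2.0 * theta


def ensemble_kalman_update(ensemble, y_obs, obs_var, rng):
    """One perturbed-observation ensemble Kalman update for a scalar parameter."""
    preds = [forward_model(t) for t in ensemble]
    n = len(ensemble)
    t_mean = statistics.fmean(ensemble)
    g_mean = statistics.fmean(preds)
    # Sample cross-covariance (parameter vs prediction) and prediction variance.
    c_tg = sum((t - t_mean) * (g - g_mean) for t, g in zip(ensemble, preds)) / (n - 1)
    c_gg = sum((g - g_mean) ** 2 for g in preds) / (n - 1)
    gain = c_tg / (c_gg + obs_var)  # Kalman gain built from ensemble statistics only
    obs_sd = obs_var ** 0.5
    # Each member is nudged toward its own noisy copy of the observation.
    return [t + gain * (y_obs + rng.gauss(0.0, obs_sd) - g)
            for t, g in zip(ensemble, preds)]


def iterative_ensemble_kalman(y_obs, obs_var=0.01, n_members=50, n_iter=5, seed=0):
    rng = random.Random(seed)
    ensemble = [rng.gauss(0.0, 1.0) for _ in range(n_members)]  # samples from the prior
    for _ in range(n_iter):
        ensemble = ensemble_kalman_update(ensemble, y_obs, obs_var, rng)
    return ensemble


ens = iterative_ensemble_kalman(y_obs=6.0)
print(statistics.fmean(ens))  # close to 3.0, the parameter value that reproduces y_obs
```

Because the update uses only ensemble statistics, no gradient of the forward model is required, which is what makes this family of methods attractive when the forward model is a full flow solver. Practical iterative schemes usually also adjust the observation-error covariance or add regularization across iterations; that is omitted here.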
-
Homophilic organization of egocentric communities in ICT services
Authors:
Chandreyee Roy,
Hang-Hyun Jo,
János Kertész,
Kimmo Kaski,
János Török
Abstract:
Members of a society can be characterized by a large number of features, such as gender, age, ethnicity, religion, social status, and shared activities. One of the main tie-forming factors between individuals in human societies is homophily, the tendency to be attracted to similar others. Homophily has mainly been studied with a focus on a single feature, and little is known about the roles of similarities of different origins in the formation of communities. To close this gap, we analyze three datasets from Information and Communications Technology (ICT) services, namely two online social networks and a network deduced from mobile phone calls, all of which include metadata about individual features. We identify communities within egocentric networks and, surprisingly, find that the larger the community, the greater the overlap between the features of its members and those of the ego. We interpret this finding in terms of the effort needed to manage communities: greater diversity requires more effort, so maintaining a large, diverse group may exceed the capacity of its members. As the ego reaches out to her alters on an ICT service, we observe that the first alter in each community tends to have a higher feature overlap with the ego than the rest. Moreover, the feature overlap of the ego with all her alters behaves non-monotonically as a function of the ego's degree. We propose a simple mechanism of how people add links in their egocentric networks of alters that reproduces all the empirical observations and explains the non-monotonic dependence of the egocentric feature overlap on the ego's degree.
Submitted 5 May, 2024;
originally announced May 2024.
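The abstract's central quantity is the feature overlap between an ego and the alters in one of her communities. The abstract does not give the exact definition, so the sketch below assumes a simple one: the fraction of shared feature keys on which ego and alter have equal values, averaged over the members of a community. The feature names and values are hypothetical toy data and are not meant to reproduce the paper's findings.

```python
def feature_overlap(ego, alter):
    """Fraction of shared feature keys on which ego and alter have the same value."""
    shared = ego.keys() & alter.keys()
    if not shared:
        return 0.0
    return sum(ego[k] == alter[k] for k in shared) / len(shared)


def community_overlap(ego, community):
    """Mean ego-alter feature overlap over the members of one community."""
    if not community:
        return 0.0
    return sum(feature_overlap(ego, alter) for alter in community) / len(community)


# Hypothetical metadata for one ego and two of her communities.
ego = {"gender": "F", "age_group": "30s", "city": "Helsinki"}
community_a = [{"gender": "M", "age_group": "20s", "city": "Espoo"}]
community_b = [
    {"gender": "F", "age_group": "30s", "city": "Helsinki"},
    {"gender": "F", "age_group": "30s", "city": "Espoo"},
    {"gender": "M", "age_group": "30s", "city": "Helsinki"},
]
print(community_overlap(ego, community_a))  # 0.0
print(community_overlap(ego, community_b))
```

Relating this mean overlap to community size across many egos is the kind of comparison the abstract describes; other choices, such as a Jaccard index over categorical features, would be equally plausible assumptions.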
-
Can We Identify Stack Overflow Questions Requiring Code Snippets? Investigating the Cause & Effect of Missing Code Snippets
Authors:
Saikat Mondal,
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
On the Stack Overflow (SO) Q&A site, users often request solutions to their code-related problems (e.g., errors, unexpected behavior). Unfortunately, they often omit required code snippets when submitting their questions, which could prevent their questions from getting prompt and appropriate answers. In this paper, we conduct an empirical study investigating the cause and effect of missing code snippets in SO questions that require them. Our contributions are threefold. First, we analyze how the presence or absence of required code snippets affects the correlation between question types (code missing, code included after requests, and code included at submission) and the corresponding answer metadata (e.g., presence of an accepted answer). According to our analysis, the chance of getting an accepted answer is three times higher for questions that include required code snippets at submission than for those that miss the code. We also investigate whether confounding factors (e.g., user reputation), besides the presence or absence of required code snippets, affect whether questions receive answers. We found that such factors do not undermine the correlation between the presence or absence of required code snippets and answer metadata. Second, we surveyed 64 practitioners to understand why users miss necessary code snippets. About 60% of them agree that users are unaware of whether their questions require any code snippets. Third, we extract four text-based features (e.g., keywords) and build six ML models to identify the questions that need code snippets. Our models can predict the target questions with 86.5% precision, 90.8% recall, 85.3% F1-score, and 85.2% overall accuracy. Our work has the potential to save significant time in programming question answering and to improve the quality of the valuable knowledge base by reducing the number of unanswered and unresolved questions.
Submitted 6 February, 2024;
originally announced February 2024.
-
Enhancing User Interaction in ChatGPT: Characterizing and Consolidating Multiple Prompts for Issue Resolution
Authors:
Saikat Mondal,
Suborno Deb Bappon,
Chanchal K. Roy
Abstract:
Prompt design plays a crucial role in shaping the efficacy of ChatGPT, influencing the model's ability to produce contextually accurate responses. Thus, optimal prompt construction is essential for maximizing the utility and performance of ChatGPT. However, sub-optimal prompt design may necessitate iterative refinement, as imprecise or ambiguous instructions can lead to undesired responses from ChatGPT. Existing studies explore several prompt patterns and strategies to improve the relevance of responses generated by ChatGPT. However, the constraints that necessitate the submission of multiple prompts remain unexplored. In this study, our contributions are twofold. First, we attempt to uncover gaps in prompt design that demand multiple iterations. In particular, we manually analyze 686 prompts that were submitted to resolve issues related to the Java and Python programming languages and identify eleven prompt design gaps (e.g., missing specifications). Exploring such gaps can enhance the efficacy of single prompts in ChatGPT. Second, we attempt to reproduce the ChatGPT response by consolidating multiple prompts into a single one. We can completely consolidate prompts with four gaps (e.g., missing context) and partially consolidate prompts with three gaps (e.g., additional functionality). Such an effort provides users with concrete evidence for designing better prompts that mitigate these gaps. Our findings and evidence can (a) save users time, (b) reduce costs, and (c) increase user satisfaction.
Submitted 6 February, 2024;
originally announced February 2024.
-
Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study
Authors:
Joy Krishan Das,
Saikat Mondal,
Chanchal K. Roy
Abstract:
Issue tracking systems serve as the primary tool for incorporating external users and customizing a software project to meet the users' requirements. However, the limited number of contributors and the challenge of identifying the best approach for each issue often impede effective resolution. Recently, an increasing number of developers have been turning to AI tools like ChatGPT to enhance problem-solving efficiency. While previous studies have demonstrated the potential of ChatGPT in areas such as automatic program repair, debugging, and code generation, there is a lack of studies on how developers explicitly use ChatGPT to resolve issues in their tracking systems. Hence, this study examines the interaction between ChatGPT and developers to analyze the developers' prevalent activities in resolving issues. In addition, we assess code reliability by checking whether the code produced by ChatGPT was integrated into the project's codebase, using the clone detection tool NiCad. Our investigation reveals that developers mainly use ChatGPT for brainstorming solutions but often opt to write their own code instead of using ChatGPT-generated code, possibly due to concerns over the generation of "hallucinated code", as highlighted in the literature.
Submitted 6 February, 2024;
originally announced February 2024.
-
Dynamical stability and phase space analysis of an Emergent Universe with non-interacting and interacting fluids
Authors:
Bikash Chandra Roy,
Anirban Chanda,
Bikash Chandra Paul
Abstract:
We investigate the evolution of a flat Emergent Universe (EU) obtained with a non-linear equation of state (nEoS) in Einstein's general theory of relativity. The nEoS is equivalent to three different types of barotropic cosmic fluids, which are determined by the nEoS parameter. The EU initially expands with no interaction among the cosmic fluids. Assuming that an interaction among the fluid components sets in at a time $t \geq t_i$, we study the evolution of the EU that leads to the present observed universe. We adopt a dynamical systems analysis to obtain the critical points of the autonomous system for studying the evolution of an EU with or without interaction among the fluid components. We also study the stability of the critical points and draw the phase portraits. The density parameters and the corresponding cosmological parameters are obtained for both the non-interacting and interacting phases of the evolution dynamics.
Submitted 5 January, 2024; v1 submitted 1 January, 2024;
originally announced January 2024.
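The stability of a critical point in a dynamical systems analysis is read off the eigenvalues of the Jacobian of the autonomous system evaluated at that point. For a two-dimensional system this reduces to the trace and determinant of the Jacobian. The sketch below shows that classification step on a hypothetical toy vector field (a damped oscillator), not on the EU model's equations, which the abstract does not give; the finite-difference step and tolerance are assumptions.

```python
def jacobian(f, point, h=1e-6):
    """Central-difference Jacobian of a 2D vector field f(x, y) -> (dx/dt, dy/dt)."""
    x, y = point
    fxp, fxm = f(x + h, y), f(x - h, y)
    fyp, fym = f(x, y + h), f(x, y - h)
    return [
        [(fxp[0] - fxm[0]) / (2 * h), (fyp[0] - fym[0]) / (2 * h)],
        [(fxp[1] - fxm[1]) / (2 * h), (fyp[1] - fym[1]) / (2 * h)],
    ]


def classify(J, tol=1e-9):
    """Classify a 2D critical point from the trace and determinant of its Jacobian."""
    tr = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if abs(det) < tol:
        return "non-hyperbolic"
    if det < 0:
        return "saddle (unstable)"  # real eigenvalues of opposite sign
    if abs(tr) < tol:
        return "center (linearly neutral)"
    kind = "node" if tr * tr - 4 * det >= 0 else "spiral"  # real vs complex eigenvalues
    return ("stable " if tr < 0 else "unstable ") + kind


def toy_field(x, y):
    # Hypothetical damped oscillator with a single critical point at the origin.
    return (y, -x - 0.5 * y)


print(classify(jacobian(toy_field, (0.0, 0.0))))  # stable spiral
```

For the cosmological model one would first solve for the critical points of the autonomous system in its own variables (for example, density parameters) and then apply the same eigenvalue test at each point; phase portraits visualize the resulting trajectories.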
-
Neural operator-based super-fidelity: A warm-start approach for accelerating steady-state simulations
Authors:
Xu-Hui Zhou,
Jiequn Han,
Muhammad I. Zafar,
Christopher J. Roy,
Heng Xiao
Abstract:
In recent years, using neural networks to speed up the solution of partial differential equations (PDEs) has gained significant traction in both academic and industrial settings. However, the use of neural networks as standalone surrogate models raises concerns about the reliability of solutions due to their dependence on data volume, quality, and training algorithms, especially in precision-critical scientific tasks. This study introduces a novel "super-fidelity" method, which uses neural networks to warm-start the solution of steady-state PDEs, ensuring both speed and accuracy. Drawing on super-resolution concepts in computer vision, our approach maps low-fidelity model solutions to high-fidelity targets using a vector-cloud neural network with equivariance (VCNN-e), maintaining all necessary invariance and equivariance properties for scalar and vector solutions. This method adapts well to different spatial resolutions. We tested this approach in two scientific computing scenarios: one with weak nonlinearity, using low Reynolds number flows around elliptical cylinders, and another with strong nonlinearity, using high Reynolds number flows over airfoils. In both cases, our neural operator-based initialization accelerated convergence at least two-fold compared to traditional methods, without sacrificing accuracy. Its robustness is confirmed across various iterative algorithms with different linear equation solvers. The approach also demonstrated time savings in multiple simulations, even when model development time is included. Additionally, we propose an efficient training data generation strategy. Overall, our method offers an efficient way to accelerate steady-state PDE solutions using neural operators without loss of accuracy, which is especially relevant in precision-focused scientific applications.
Submitted 18 December, 2023;
originally announced December 2023.
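The super-fidelity idea rests on a general property of iterative solvers: starting closer to the solution takes fewer iterations to reach the same tolerance, while the converged answer does not depend on the starting point. The sketch below illustrates only that property, not VCNN-e. It solves the 1D Poisson problem -u'' = pi^2 sin(pi x) with zero boundary values using Gauss-Seidel sweeps, once from a zero guess and once from a hypothetical "surrogate prediction" (simply 0.9 times the known solution shape, an assumption standing in for a neural-operator output). Grid size and tolerance are also assumptions.

```python
import math


def gauss_seidel_poisson(f, u0, h, tol=1e-10, max_iter=100_000):
    """Solve -u'' = f on (0, 1) with u(0) = u(1) = 0 by Gauss-Seidel from guess u0.

    Returns the converged interior values and the number of sweeps used.
    """
    u = [0.0] + list(u0) + [0.0]  # pad with the Dirichlet boundary values
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, len(u) - 1):
            # Discrete equation: -(u[i-1] - 2u[i] + u[i+1]) / h^2 = f_i
            new = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i - 1])
            max_change = max(max_change, abs(new - u[i]))
            u[i] = new
        if max_change < tol:
            return u[1:-1], sweep
    raise RuntimeError("Gauss-Seidel did not converge")


n = 31
h = 1.0 / (n + 1)
x = [(i + 1) * h for i in range(n)]
f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]

cold_guess = [0.0] * n
warm_guess = [0.9 * math.sin(math.pi * xi) for xi in x]  # stand-in for a surrogate prediction

u_cold, sweeps_cold = gauss_seidel_poisson(f, cold_guess, h)
u_warm, sweeps_warm = gauss_seidel_poisson(f, warm_guess, h)
print(sweeps_cold, sweeps_warm)  # the warm start needs fewer sweeps
```

Both runs stop at the same tolerance and agree to within that tolerance, so the warm start saves sweeps without changing the converged result; a production solver for the airfoil cases would replace Gauss-Seidel with a nonlinear flow solver.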
-
Pilot tone-guided focused navigation for free-breathing whole-liver fat-water and T2* quantification
Authors:
Adèle LC Mackowiak,
Christopher W Roy,
Mariana BL Falcão,
Mario Bacher,
Aurélien Bustin,
Jérôme Yerly,
Peter Speier,
Matthias Stuber,
Naïk Vietti-Violi,
Jessica AM Bastiaansen
Abstract:
Purpose: To achieve whole-liver motion-corrected fat fraction (FF) and R2* quantification with a 3-minute free-breathing (FB) 3D radial isotropic acquisition, for increased organ coverage, ease of use, and patient comfort. Methods: A FB 3D radial multiecho gradient-echo liver acquisition with integrated Pilot Tone (PT) navigation and NTE = 8 echoes was reconstructed with a motion-correction algorithm based on focused navigation and guided by PT signals (PT-fNAV), with and without a denoising step. FF and R2* quantification using a graph-cut algorithm was performed on the motion-corrected whole-liver multiecho volumes. Volunteer experiments (n = 10) at 1.5T included reference 3D and 2D Cartesian breath-hold (BH) acquisitions. Image sharpness was assessed to evaluate the quality of motion correction with PT-fNAV, compared to a motion-resolved reconstruction. Fat-water images and parametric maps were compared to BH reference acquisitions following Cartesian trajectories, and to routinely used clinical software (MRQuantiF). Results: The image sharpness provided by PT-fNAV (with and without denoising) was similar to that of end-expiratory motion-resolved reconstructions. The 3D radial FB FF maps compared well with reference BH 3D Cartesian maps (bias +0.7%, limits of agreement (LOA) [-2.5; 4.0]%) and with 2D quantification with MRQuantiF (bias -0.2%, LOA [-1.1; 0.6]%). While expected visual deviations between the proposed FB and reference BH R2* maps were observed, no significant differences were found in quantitative analyses. Conclusion: A 3D radial technique with retrospective motion correction by PT-fNAV enabled FF and R2* quantification of the whole liver at 1.5T. The FB whole-liver acquisition at isotropic spatial resolution was comparable in accuracy to BH techniques, enabling 3D assessment of steatosis in individuals with limited respiratory capabilities.
Submitted 8 December, 2023;
originally announced December 2023.
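The agreement figures in this abstract (a bias plus limits of agreement) are the output of a Bland-Altman analysis. The sketch below computes them in the conventional way, assuming 95% limits at bias ± 1.96 times the sample standard deviation of the paired differences; the abstract does not state the exact convention, and the fat-fraction values in the example are hypothetical. Under that convention, the reported bias of +0.7% with limits [-2.5; 4.0]% would correspond to a standard deviation of the differences of roughly 3.25 / 1.96 ≈ 1.66 percentage points.

```python
import statistics


def bland_altman(measured, reference):
    """Return the mean difference (bias) and 95% limits of agreement (bias +/- 1.96 SD)."""
    if len(measured) != len(reference) or len(measured) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [m - r for m, r in zip(measured, reference)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the paired differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical fat-fraction values (%) from two methods in the same four volunteers.
fb_radial = [4.1, 7.9, 12.3, 2.8]
bh_cartesian = [3.5, 7.2, 11.0, 2.6]
bias, (lower, upper) = bland_altman(fb_radial, bh_cartesian)
print(f"bias {bias:+.2f}%, LOA [{lower:.2f}; {upper:.2f}]%")
```

In practice the paired values would be region-of-interest averages from the FB and BH maps of each volunteer.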
-
Investigating Technology Usage Span by Analyzing Users' Q&A Traces in Stack Overflow
Authors:
Saikat Mondal,
Debajyoti Mondal,
Chanchal K. Roy
Abstract:
Choosing an appropriate software development technology (e.g., programming language) is challenging due to the proliferation of diverse options. Selecting inappropriate technologies for development may have a far-reaching effect on software developers' career growth. Switching to a different technology after working with one may involve a steep learning curve, which makes such a switch challenging. It is therefore crucial for software developers to find technologies that have a long usage span. Intuitively, the usage span of a technology can be determined by the time span over which developers have used that technology. Existing literature focuses on the technology landscape to explore the complex and implicit dependencies among technologies but lacks formal studies that draw insights about their usage span. This paper investigates technology usage span by analyzing the question-and-answer (Q&A) traces of Stack Overflow (SO), the largest technical Q&A website available to date. In particular, we analyze 6.7 million Q&A traces posted by about 97K active SO users and examine which technologies have appeared in their questions or answers over 15 years. According to our analysis, the C# and Java programming languages have a long usage span, followed by JavaScript. Besides, developers used the .NET framework, the iOS and Windows operating systems (OS), and the SQL query language for a long time (on average). Our study also exposes emerging (i.e., newly growing) technologies. For example, the usage of technologies such as SwiftUI, .NET-6.0, Visual Studio 2022, and the Blazor WebAssembly framework is increasing. The findings from our study can assist novice developers, software startups, and software users in determining appropriate technologies. This study also establishes an initial benchmark for future investigation of the usage span of software technologies.
Submitted 5 December, 2023;
originally announced December 2023.
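The abstract defines a technology's usage span as the time span over which developers have used it. A minimal sketch of that definition, assuming each Q&A trace has already been reduced to a `(user, technology, post_date)` tuple: for each user-technology pair, take the days between the first and last post, then average across users. The input format, the day granularity, and the sample posts are assumptions, not the paper's actual pipeline or data.

```python
from collections import defaultdict
from datetime import date


def average_usage_span(posts):
    """posts: iterable of (user_id, technology, post_date) tuples.

    For every (user, technology) pair, the span is the number of days between the
    user's first and last post mentioning that technology; spans are then averaged
    per technology across users.
    """
    first, last = {}, {}
    for user, tech, day in posts:
        key = (user, tech)
        first[key] = min(first.get(key, day), day)
        last[key] = max(last.get(key, day), day)
    spans = defaultdict(list)
    for (user, tech), start in first.items():
        spans[tech].append((last[(user, tech)] - start).days)
    return {tech: sum(days) / len(days) for tech, days in spans.items()}


# Hypothetical traces for two users.
posts = [
    ("u1", "java", date(2010, 1, 1)),
    ("u1", "java", date(2012, 1, 1)),
    ("u2", "java", date(2015, 1, 1)),
    ("u2", "java", date(2015, 1, 11)),
    ("u1", "swiftui", date(2021, 6, 1)),
]
print(average_usage_span(posts))  # {'java': 370.0, 'swiftui': 0.0}
```

A single post yields a span of zero, which is why filtering to active users, as the abstract does, matters for meaningful averages.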
-
Three-Wave Mixing Quantum-Limited Kinetic Inductance Parametric Amplifier operating at 6 Tesla and near 1 Kelvin
Authors:
Simone Frasca,
Camille Roy,
Guillaume Beaulieu,
Pasquale Scarlino
Abstract:
Parametric amplifiers play a crucial role in modern quantum technology by enabling the enhancement of weak signals with minimal added noise. Traditionally, Josephson junctions have been the primary choice for constructing parametric amplifiers. However, thin films with high kinetic inductance have emerged as viable alternatives for engineering the necessary nonlinearity. In this work, we introduce and characterize a Kinetic Inductance Parametric Amplifier (KIPA) built using high-quality NbN superconducting thin films. The KIPA addresses some of the limitations of traditional Josephson-based parametric amplifiers, excelling in dynamic range, operational temperature, and magnetic field resilience. We demonstrate quantum-limited amplification with gain above 20 dB and a 20 MHz gain-bandwidth product, operating at fields up to 6 Tesla and temperatures as high as 850 mK. Harnessing kinetic inductance in NbN thin films, the KIPA emerges as a robust solution for quantum signal amplification, enhancing research possibilities in quantum information processing and low-temperature quantum experiments. Its magnetic field compatibility and quantum-limited performance at high temperatures make it an invaluable tool, promising new advancements in quantum research.
Submitted 1 December, 2023;
originally announced December 2023.
-
Differences of communication activity and mobility patterns between urban and rural people
Authors:
Fumiko Ogushi,
Chandreyee Roy,
Kimmo Kaski
Abstract:
Human mobility and other social activity patterns influence various aspects of society such as urban planning, traffic predictions, crisis resilience, and epidemic prevention. The behaviour of individuals, such as their communication frequencies and movements, is shaped by societal and socio-economic factors. In addition, differences in people's geolocation, gender, and age affect their activity patterns. In this study, we investigate these patterns using mobile phone data, specifically call detail records (CDRs), to analyze people's social communication and mobility patterns. This dataset provides insight into individual and population-level behaviours in rural and urban environments on a daily, weekly, and seasonal basis. Our analyses show that in urban areas people have high calling activity but low mobility, while in rural areas they show the opposite behaviour, i.e. low calling activity combined with high mobility. Overall, people's mobility decreases over the year, while their calling activity remains consistent except during holidays, when communication frequency drops markedly. We also observe significant differences in mobility between work days and free days. Finally, we observe that the age and gender of individuals affect seasonal patterns differently in urban and rural areas.
Submitted 22 November, 2023;
originally announced November 2023.
-
Residential clustering and mobility of ethnic groups
Authors:
Kunal Bhattacharya,
Chandreyee Roy,
Tuomas Takko,
Anna Rotkirch,
Kimmo Kaski
Abstract:
We studied residential clustering and mobility of ethnic minorities using a theoretical framework based on null models of spatial distributions and movements of populations. Using microdata from population registers, we compared the patterns of clustering amongst various socioethnic groups living in and around the capital region of Finland. Using the models, we were able to connect the factors influencing intraurban migration to the spatial patterns that have developed over time. We could also demonstrate the interrelationship of movement and clustering with fertility. The observed clustering seems to be a combined effect of fertility and the tendency to migrate locally. The models also highlight the importance of factors like proximity to the city centre, average neighbourhood income, and similarity of socioeconomic profiles.
Submitted 9 November, 2023;
originally announced November 2023.
-
TiO2 multi-leg nanotubes for Surface-enhanced Raman scattering
Authors:
Harini S,
Garima Gupta,
Somnath C. Roy,
Rambabu Yalavarthi
Abstract:
In the recent past, significant research efforts have been put forth to fabricate low-cost, noble-metal-free substrates for surface-enhanced Raman spectroscopy (SERS) applications. Here we propose semiconducting TiO2 multi-leg nanotubes (TiO2 MLNTs, with and without a gold nanoparticle coating) as SERS substrates. TiO2 MLNTs show a unique multi-leg morphology compared to conventional non-multi-leg tubes and possess better light-harvesting properties. TiO2 MLNTs are fabricated with a simple and versatile single-step electrochemical anodization method. Remarkably high SERS sensitivity is observed for the detection of methylene blue (MB), down to nM concentrations (E.F. ~10⁴). This is attributed to the resonant matching of the photonic absorption edge of TiO2 MLNTs with the wavelength of the incident laser probe light. Gold nanoparticle-coated TiO2 MLNTs demonstrated a further enhancement in SERS sensitivity (E.F. ~10⁵ for nM MB), facilitated by the synergy between the plasmonic modes (LSPRs) of Au and the photonic absorption mode of TiO2 MLNTs.
Submitted 6 November, 2023;
originally announced November 2023.
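SERS enhancement factors are commonly reported as an analytical enhancement factor, which compares the signal per unit concentration on the SERS substrate with that of a non-enhancing reference measured under the same conditions. The abstract does not state which definition underlies E.F. ~10⁴ and ~10⁵, so the form below is an assumption. Here I is the intensity of a chosen Raman band of the probe molecule (MB) and C its concentration; the numerical example uses hypothetical concentrations only to show the arithmetic.

```latex
\mathrm{EF} = \frac{I_{\mathrm{SERS}} / C_{\mathrm{SERS}}}{I_{\mathrm{ref}} / C_{\mathrm{ref}}}

% Hypothetical example: equal band intensities (I_SERS = I_ref) measured at
% C_SERS = 1 nM on the substrate and C_ref = 10 uM on the reference give
\mathrm{EF} = \frac{C_{\mathrm{ref}}}{C_{\mathrm{SERS}}}
            = \frac{10\,\mu\mathrm{M}}{1\,\mathrm{nM}} = 10^{4}
```

Other conventions normalize by the estimated number of probed molecules rather than by concentration, which can change the reported value, so comparisons across papers should check which definition was used.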
-
Observational constraints on the Emergent Universe with interacting non-linear fluids and its stability analysis
Authors:
Anirban Chanda,
Bikash Chandra Roy,
Kazuharu Bamba,
Bikash Chandra Paul
Abstract:
We investigate a flat Emergent Universe (EU) with a nonlinear equation of state that is equivalent to three different compositions of fluids. In the EU, the universe initially evolves with no interaction, but as time evolves, an interaction sets in among the three fluids, leading to the observed universe. An EU is characterized as a singularity-free universe that evolves with all the basic features of the early evolution. A given nonlinear equation of state parameter permits a universe with three different fluids. We obtain a universe that begins with dark energy, cosmic strings, and radiation domination and, at a later epoch, transits into a universe with three different fluids (matter domination, dark matter, and dark energy) for a given interaction strength among the cosmic fluids. The model parameters are then constrained using the observed Hubble data and Type Ia Supernova (SnIa) data from the Pantheon data set. The classical stability analysis of the model is performed using the squared speed of sound. It is found that a theoretically stable cosmological model can be obtained in this case; however, the model becomes classically unstable at the present epoch when the observational bounds on the model parameters are taken into account.
Submitted 17 September, 2023;
originally announced September 2023.
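The classical stability test mentioned in this abstract uses the squared speed of sound: a perturbation propagates stably when it is positive and grows when it is negative. For illustration only, the second part below assumes the nonlinear equation of state commonly used in Emergent Universe models, with constants A and B; the abstract does not write out the EoS or the effect of the interaction terms, so this single-fluid form is an assumption.

```latex
c_s^2 = \frac{\partial p}{\partial \rho} = \frac{\dot{p}}{\dot{\rho}},
\qquad c_s^2 > 0 \;\Rightarrow\; \text{classically stable},
\qquad c_s^2 < 0 \;\Rightarrow\; \text{classically unstable}.

% Illustration, assuming the EU-type nonlinear EoS  p = B\rho - A\rho^{1/2}:
c_s^2 = B - \frac{A}{2\sqrt{\rho}}
```

For A > 0 the subtracted term grows as the energy density falls during expansion, so in this illustrative single-fluid form the sign of the squared sound speed can change at late times, depending on the parameter values.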
-
Unveiling the potential of large language models in generating semantic and cross-language clones
Authors:
Palash R. Roy,
Ajmain I. Alam,
Farouq Al-omari,
Banani Roy,
Chanchal K. Roy,
Kevin A. Schneider
Abstract:
Semantic and cross-language code clone generation may be useful for code reuse, code comprehension, refactoring, and benchmarking. OpenAI's GPT model has potential for such clone generation, since GPT is used for text generation. When developers copy and paste code from Stack Overflow (SO) or within a system, there might be inconsistent changes leading to unexpected behaviours. Similarly, if someone possesses a code snippet in a particular programming language but seeks equivalent functionality in a different language, a semantic cross-language code clone generation approach could provide valuable assistance. In this study, using SemanticCloneBench as a vehicle, we evaluated how well the GPT-3 model could help generate semantic and cross-language clone variants for a given fragment. We compiled a diverse set of code fragments and assessed GPT-3's performance in generating code variants. Through extensive experimentation and analysis, in which 9 judges spent 158 hours on validation, we investigate the model's ability to produce accurate and semantically correct variants. Our findings shed light on GPT-3's strengths in code generation, offering insights into the potential applications and challenges of using advanced language models in software development. Our quantitative analysis yields compelling results. In the realm of semantic clones, GPT-3 attains an impressive accuracy of 62.14% and a BLEU score of 0.55, achieved through few-shot prompt engineering. Furthermore, the model shines in transcending linguistic confines, boasting an exceptional 91.25% accuracy in generating cross-language clones.
Submitted 12 September, 2023;
originally announced September 2023.
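The abstract reports a BLEU score of 0.55 for generated semantic clones. BLEU combines clipped n-gram precisions (geometric mean, here with uniform weights) with a brevity penalty for candidates shorter than the reference. The sketch below is a plain sentence-level version with whitespace tokenization and no smoothing; the paper's actual tokenization, smoothing, and sentence- versus corpus-level setup are not stated in the abstract, so all of these are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights, clipped counts, and no smoothing."""
    if not candidate:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_sum += math.log(clipped / total) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_sum)


reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
print(bleu(reference, reference))           # 1.0
print(bleu(candidate, reference))           # 0.0 (no matching 4-gram)
print(bleu(candidate, reference, max_n=2))  # about 0.707
```

For code, the tokens would be lexer tokens of the generated and reference fragments rather than words; BLEU measures surface similarity only, which is why the abstract pairs it with human validation of semantic correctness.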
-
GPTCloneBench: A comprehensive benchmark of semantic clones and cross-language clones using GPT-3 model and SemanticCloneBench
Authors:
Ajmain Inqiad Alam,
Palash Ranjan Roy,
Farouq Al-omari,
Chanchal Kumar Roy,
Banani Roy,
Kevin Schneider
Abstract:
With the emergence of Machine Learning, there has been a surge in leveraging its capabilities for problem-solving across various domains. In the code clone realm, the identification of type-4 or semantic clones has emerged as a crucial yet challenging task. Researchers aim to utilize Machine Learning to tackle this challenge, often relying on the BigCloneBench dataset. However, it's worth noting that BigCloneBench, originally not designed for semantic clone detection, presents several limitations that hinder its suitability as a comprehensive training dataset for this specific purpose. Furthermore, the CLCDSA dataset suffers from a lack of reusable examples aligning with real-world software systems, rendering it inadequate for cross-language clone detection approaches. In this work, we present GPTCloneBench, a comprehensive semantic clone and cross-language clone benchmark built by exploiting SemanticCloneBench and OpenAI's GPT-3 model. In particular, using code fragments from SemanticCloneBench as sample inputs along with appropriate prompt engineering for the GPT-3 model, we generate semantic and cross-language clones for these specific fragments and then conduct a combination of extensive manual analysis, tool-assisted filtering, functionality testing and automated validation in building the benchmark. From 79,928 clone pairs of GPT-3 output, we created a benchmark with 37,149 true semantic clone pairs, 19,288 false semantic pairs (Type-1/Type-2), and 20,770 cross-language clones across four languages (Java, C, C#, and Python). Our benchmark is 15-fold larger than SemanticCloneBench, has more functional code examples for software systems and more programming language support than CLCDSA, and overcomes BigCloneBench's limitations in quality, quantity, and language variety.
Submitted 1 September, 2023; v1 submitted 26 August, 2023;
originally announced August 2023.
-
A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges
Authors:
Morteza Zakeri-Nasrabadi,
Saeed Parsa,
Mohammad Ramezani,
Chanchal Roy,
Masoud Ekhtiarzadeh
Abstract:
Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper proposes a systematic literature review and meta-analysis on code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while there is no support for many programming languages. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate codes, of which only eight datasets were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and a focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to maintenance.
Submitted 28 June, 2023;
originally announced June 2023.
-
Machine Learning-driven Autotuning of Graphics Processing Unit Accelerated Computational Fluid Dynamics for Enhanced Performance
Authors:
Weicheng Xue,
Christopher John Roy
Abstract:
Optimizing the performance of computational fluid dynamics (CFD) applications accelerated by graphics processing units (GPUs) is crucial for efficient simulations. In this study, we employed a machine learning-based autotuning technique to optimize 14 key parameters related to GPU kernel scheduling, including the number of thread blocks and the number of threads within a block. Our approach uses fully connected neural networks as the underlying machine learning model, with the tuning parameters as inputs to the neural networks and the actual execution time of a simulation as the output. To assess the effectiveness of our autotuning approach, we conducted experiments on three different types of GPUs, with computational speeds ranging from low to high. We performed independent training for each GPU model and also explored combined training across multiple GPU models. By leveraging artificial neural networks, our autotuning technique achieved remarkable results in tuning a wide range of parameters, leading to enhanced performance for a CFD code. Importantly, our approach demonstrated its efficacy while requiring only a small fraction of samples from the large parameter search space. This efficiency is attributed to the effectiveness of the fully connected neural networks in capturing the complex relationships between the parameter settings and the resulting performance. Overall, our study showcases the potential of machine learning, specifically fully connected neural networks, in autotuning GPU-accelerated CFD codes. By leveraging this approach, researchers and practitioners can achieve high performance in scientific simulations with optimized parameter configurations.
Submitted 20 February, 2024; v1 submitted 24 June, 2023;
originally announced June 2023.
-
CPU-GPU Heterogeneous Code Acceleration of a Finite Volume Computational Fluid Dynamics Solver
Authors:
Weicheng Xue,
Hongyu Wang,
Christopher J. Roy
Abstract:
This work deals with the CPU-GPU heterogeneous code acceleration of a finite-volume CFD solver utilizing multiple CPUs and GPUs at the same time. First, a high-level description of the CFD solver called SENSEI, the discretization of SENSEI, and the CPU-GPU heterogeneous computing workflow in SENSEI leveraging MPI and OpenACC are given. Then, a performance model for CPU-GPU heterogeneous computing requiring ghost cell exchange is proposed to help estimate the performance of the heterogeneous implementation. The scaling performance of the CPU-GPU heterogeneous computing, and its comparison with pure multi-CPU/GPU performance for a supersonic inlet test case, is presented to show the advantages of leveraging the computational power of both the CPU and the GPU. Using CPUs and GPUs together as workers, the performance can be improved further compared to using only CPUs or only GPUs, and the advantages can be estimated reasonably well by the performance model proposed in this work. Finally, conclusions are drawn to provide 1) suggestions for application users who are interested in leveraging the computational power of the CPU and GPU to accelerate their own scientific computing simulations and 2) feedback for hardware architects who are interested in designing better CPU-GPU heterogeneous systems for heterogeneous computing.
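The intuition behind a performance model for sharing work between CPUs and GPUs can be illustrated with a minimal load-balancing sketch. It assumes, hypothetically, that each device's throughput in cells per second is known and that a fixed ghost-cell exchange cost is paid once per step; `balance_cells` and `step_time` are illustrative names, and this is not the model proposed in the paper.

```python
def balance_cells(total_cells: int, cpu_rate: float, gpu_rate: float) -> tuple[int, int]:
    """Split cells between CPU and GPU in proportion to their throughput.

    cpu_rate and gpu_rate are hypothetical measured throughputs (cells/second).
    A proportional split makes both devices finish at roughly the same time.
    """
    gpu_cells = round(total_cells * gpu_rate / (cpu_rate + gpu_rate))
    return total_cells - gpu_cells, gpu_cells


def step_time(cpu_cells: int, gpu_cells: int, cpu_rate: float,
              gpu_rate: float, ghost_exchange_s: float) -> float:
    """Estimated wall time of one solver step under a simplified model.

    CPU and GPU compute concurrently, so the slower device sets the pace;
    a ghost-cell exchange cost is then added once per step.
    """
    compute_s = max(cpu_cells / cpu_rate, gpu_cells / gpu_rate)
    return compute_s + ghost_exchange_s


if __name__ == "__main__":
    cpu, gpu = balance_cells(1_000_000, cpu_rate=1e6, gpu_rate=9e6)
    print(cpu, gpu)  # 100000 900000
    print(step_time(cpu, gpu, 1e6, 9e6, ghost_exchange_s=0.01))
    # GPU alone, with no CPU share and no exchange, for comparison:
    print(1_000_000 / 9e6)
```

In this toy setting the heterogeneous step (about 0.110 s) only beats the GPU-only step (about 0.111 s) because the assumed exchange cost is small relative to the compute time saved; a larger exchange cost would erase the gain.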
Submitted 29 May, 2023;
originally announced May 2023.
-
Do Subjectivity and Objectivity Always Agree? A Case Study with Stack Overflow Questions
Authors:
Saikat Mondal,
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
In Stack Overflow (SO), the quality of posts (i.e., questions and answers) is subjectively evaluated by users through a voting mechanism. The net votes (upvotes - downvotes) obtained by a post are often considered an approximation of its quality. However, about half of the questions that received working solutions got more downvotes than upvotes. Furthermore, about 18% of the accepted answers (i.e., verified solutions) also do not receive the highest score. All these counter-intuitive findings cast doubt on the reliability of the evaluation mechanism employed at SO. Moreover, many users raise concerns about the evaluation, especially downvotes to their posts. Therefore, rigorous verification of the subjective evaluation is highly warranted to ensure a non-biased and reliable quality assessment mechanism. In this paper, we compare the subjective assessment of questions with their objective assessment using 2.5 million questions and ten text analysis metrics. According to our investigation, four objective metrics agree with the subjective evaluation, two do not agree, one either agrees or disagrees, and the remaining three neither agree nor disagree with the subjective evaluation. We then develop machine learning models to classify the promoted and discouraged questions. Our models outperform the state-of-the-art models, with maximum accuracies of about 76% to 87%.
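The net-vote score described above is simple arithmetic, and a classifier needs labels derived from it. The sketch below assumes a hypothetical labelling rule (positive score means "promoted", negative means "discouraged"); the abstract does not state the exact rule the authors used.

```python
def net_votes(upvotes: int, downvotes: int) -> int:
    """Net score of a Stack Overflow post: upvotes minus downvotes."""
    return upvotes - downvotes


def label_question(upvotes: int, downvotes: int) -> str:
    """Hypothetical labelling rule based on the sign of the net score."""
    score = net_votes(upvotes, downvotes)
    if score > 0:
        return "promoted"
    if score < 0:
        return "discouraged"
    return "neutral"


print(label_question(5, 2))  # promoted
print(label_question(1, 4))  # discouraged
```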
Submitted 7 April, 2023;
originally announced April 2023.
-
Pathways to Leverage Transcompiler based Data Augmentation for Cross-Language Clone Detection
Authors:
Subroto Nag Pinku,
Debajyoti Mondal,
Chanchal K. Roy
Abstract:
Software clones are often introduced when developers reuse code fragments to implement similar functionalities in the same or different software systems. Many high-performing clone detection tools today are based on deep learning techniques and are mostly used for detecting clones written in the same programming language, whereas clone detection tools for detecting cross-language clones are also emerging rapidly. The popularity of deep learning-based clone detection tools creates an opportunity to investigate how known strategies that boost the performance of deep learning models could be further leveraged to improve clone detection tools. In this paper, we investigate such a strategy, data augmentation, which has not yet been explored for cross-language clone detection as opposed to single-language clone detection. We show how the existing knowledge on transcompilers (source-to-source translators) can be used for data augmentation to boost the performance of cross-language clone detection models, as well as to adapt single-language clone detection models to create cross-language clone detection pipelines. To demonstrate the performance boost for cross-language clone detection through data augmentation, we exploit Transcoder, which is a pre-trained source-to-source translator. To show how to extend single-language models for cross-language clone detection, we extend a popular single-language model, Graph Matching Network (GMN), in combination with transcompilers. We evaluated our models on popular benchmark datasets. Our experimental results showed improvements in F1 scores (sometimes up to 3%) for the cutting-edge cross-language clone detection models. Even when extending GMN for cross-language clone detection, the models built leveraging data augmentation outperformed the baseline with scores of 0.90, 0.92, and 0.91 for precision, recall, and F1 score, respectively.
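As a consistency check on the reported scores, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The helper below is a generic sketch, not code from the paper; plugging in the reported precision and recall reproduces the reported F1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.90, 0.92), 2))  # 0.91
```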
Submitted 2 March, 2023;
originally announced March 2023.
-
Motion-resolved fat-fraction mapping with whole-heart free-running multiecho GRE and Pilot Tone
Authors:
Adèle L. C. Mackowiak,
Christopher W. Roy,
Jérôme Yerly,
Mariana B. L. Falcão,
Mario Bacher,
Peter Speier,
Davide Piccini,
Matthias Stuber,
Jessica A. M. Bastiaansen
Abstract:
PURPOSE To develop free-running multi-echo GRE for cardiac- and respiratory-motion-resolved whole-heart fat fraction quantification. METHODS Multi-echo readouts optimized for water-fat separation and quantification were integrated within a non-ECG-triggered free-breathing 3D radial GRE acquisition. Pilot Tone navigation was used to extract cardiac and respiratory motion states. Following an XD-GRASP-based image reconstruction of the separate echoes, fat fraction, water fraction, R2* and B0 maps, as well as fat and water images, were generated with a maximum likelihood fitting algorithm using graph cuts. The acquisition, reconstruction and post-processing framework was tested in 10 healthy volunteers at 1.5T and compared to a free-breathing ECG-triggered 5-echo acquisition. RESULTS The acquisition was successfully validated in vivo, with motion compensation achieved over all collected echoes, in both respiratory and cardiac dimensions. Pilot Tone navigation provided respiratory and cardiac signals in good agreement (r=0.954 and r=0.783, respectively) with self-gating signal extraction based on the MR data of the first echo. The framework enabled pericardial fat imaging and quantification across the cardiac cycle, revealing a decrease in apparent fat fraction (FF) at systole across volunteers. 3D motion-resolved fat fraction maps showed good correlation with reference ECG-triggered measurements, as well as a significant difference between measurements performed with NTE = 4 and NTE = 8 echoes (P < 0.0001 in chest fat and P < 0.01 in pericardial fat). CONCLUSION Volunteer experiments at 1.5T demonstrated the feasibility of a whole-heart free-running fat-fraction mapping technique for cardiac MRI in a 6 min scan time, with an isotropic resolution of (2 mm)³.
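Once a water-fat separation has produced water (W) and fat (F) signal estimates per voxel, the signal fat fraction is commonly defined as FF = F / (W + F). The sketch below applies that standard definition to a single voxel; it assumes the W and F magnitudes are already available and does not model the graph-cut maximum-likelihood fitting, R2* decay, or noise-bias corrections described in the abstract.

```python
def fat_fraction_percent(fat: float, water: float) -> float:
    """Signal fat fraction FF = F / (W + F), expressed in percent.

    `fat` and `water` are voxel signal magnitudes from a water-fat separation.
    Returns 0.0 when both are zero (e.g., background voxels).
    """
    total = fat + water
    if total == 0:
        return 0.0
    return 100.0 * fat / total


print(fat_fraction_percent(20.0, 80.0))  # 20.0
```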
Submitted 12 April, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Automatic Prediction of Rejected Edits in Stack Overflow
Authors:
Saikat Mondal,
Gias Uddin,
Chanchal Roy
Abstract:
The content quality of shared knowledge in Stack Overflow (SO) is crucial in supporting software developers with their programming problems. Thus, SO allows its users to suggest edits to improve the quality of a post (i.e., question and answer). However, existing research shows that many suggested edits in SO are rejected due to undesired contents/formats or violating edit guidelines. Such a scenario frustrates or demotivates users who would like to conduct good-quality edits. Therefore, our research focuses on assisting SO users by offering them suggestions on how to improve their editing of posts. First, we manually investigate 764 (382 questions + 382 answers) edits rejected by rollbacks and produce a catalog of 19 rejection reasons. Second, we extract 15 text- and user-based features to capture those rejection reasons. Third, we develop four machine learning models using those features. Our best-performing model can predict rejected edits with 69.1% precision, 71.2% recall, 70.1% F1-score, and 69.8% overall accuracy. Fourth, we introduce an online tool named EditEx that works with the SO edit system. EditEx can assist users while editing posts by suggesting the potential causes of rejections. We recruit 20 participants to assess the effectiveness of EditEx. Half of the participants (i.e., treatment group) use EditEx and the other half (i.e., control group) use the SO standard edit system to edit posts. According to our experiment, EditEx can support the SO standard edit system to prevent 49% of rejected edits, including the commonly rejected ones. It can also prevent 12% of rejections even in free-form regular edits. The treatment group finds the potential rejection reasons identified by EditEx influential. Furthermore, the median workload of suggesting edits with EditEx is half that of the SO edit system.
Submitted 6 October, 2022;
originally announced October 2022.
-
OCFormer: One-Class Transformer Network for Image Classification
Authors:
Prerana Mukherjee,
Chandan Kumar Roy,
Swalpa Kumar Roy
Abstract:
We propose a novel deep learning framework based on Vision Transformers (ViT) for one-class classification. The core idea is to use zero-centered Gaussian noise as a pseudo-negative class for latent space representation and then train the network using the optimal loss function. Prior works have made tremendous efforts to learn good representations using a variety of loss functions that ensure both discriminative and compact properties. The proposed one-class Vision Transformer (OCFormer) is extensively evaluated on the CIFAR-10, CIFAR-100, Fashion-MNIST and CelebA eyeglasses datasets. Our method shows significant improvements over competing CNN-based one-class classifier approaches.
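The pseudo-negative idea can be sketched as batch construction: each latent vector of the single known class is paired with a zero-centered Gaussian noise vector of the same dimension, turning one-class learning into a binary problem. This is a minimal illustration using only the standard library; `build_one_class_batch`, `sigma`, and `seed` are hypothetical, and the ViT encoder and loss function from the paper are not shown.

```python
import random


def build_one_class_batch(positive_features, sigma=1.0, seed=0):
    """Pair each positive latent vector with a zero-centered Gaussian pseudo-negative.

    positive_features: list of equal-length lists (latent vectors of the one
    known class). Returns (inputs, labels) with label 1 for real samples and
    0 for noise samples. sigma and seed are illustrative choices.
    """
    rng = random.Random(seed)
    dim = len(positive_features[0])
    negatives = [[rng.gauss(0.0, sigma) for _ in range(dim)]
                 for _ in positive_features]
    inputs = [list(v) for v in positive_features] + negatives
    labels = [1] * len(positive_features) + [0] * len(negatives)
    return inputs, labels


xs, ys = build_one_class_batch([[0.5, 1.2, -0.3], [0.9, 0.8, 0.1]])
print(len(xs), ys)  # 4 [1, 1, 0, 0]
```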
Submitted 25 April, 2022;
originally announced April 2022.
-
Backports: Change Types, Challenges and Strategies
Authors:
Debasish Chakroborti,
Kevin A. Schneider,
Chanchal K. Roy
Abstract:
Source code repositories allow developers to manage multiple versions (or branches) of a software system. Pull-requests are used to modify a branch, and backporting is a regular activity used to port changes from a current development branch to other versions. In open-source software, backports are common and often need to be adapted by hand, which motivates us to explore backports and backporting challenges and strategies. In our exploration of 68,424 backports from 10 GitHub projects, we found that bug, test, document, and feature changes are commonly backported. We identified a number of backporting challenges, including that backports were inconsistently linked to their original pull-request (49%), that backports had incompatible code (13%), that backports failed to be accepted (10%), and that there were backporting delays (16 days to create, 5 days to merge). We identified some general strategies for addressing backporting issues. We also noted that backporting strategies depend on the project type and that further investigation is needed to determine their suitability. Furthermore, we created the first-ever backports dataset that can be used by other researchers and practitioners for investigating backports and backporting.
Submitted 7 April, 2022;
originally announced April 2022.
-
Turnover in close friendships: age and gender differences
Authors:
Chandreyee Roy,
Kunal Bhattacharya,
Robin I. M. Dunbar,
Kimmo Kaski
Abstract:
Humans are social animals, and the interpersonal bonds formed between them are crucial for their development and well-being in a society. These relationships are usually structured into several layers (Dunbar's layers of friendship) depending on their significance in an individual's life, with closest friends and family being the most important ones and taking up a major part of an individual's time and communication effort. However, we have little idea how the initiation and termination of these relationships occur across the lifespan. To explore this, we analyse a national cellphone database to determine how and when changes in close relationships occur in the two genders. In general, membership of this inner circle of intimate relationships is extremely stable, at least over a three-year period. However, around 1-4% of alters change every year, with the rate of change being higher among 17-21 year olds than older adults. Young adult females terminate more of their opposite-gender relationships, while older males are more persistent in trying to maintain relationships in decline. These results emphasise the variability in relationship dynamics across age and gender, and remind us that individual differences play an important role in the structure of social networks. Overall, our study provides a holistic understanding of the dynamic nature of relationships during the life-course of humans.
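A yearly turnover figure like the one reported can be computed by comparing each ego's inner circle in consecutive periods. The sketch below uses a hypothetical definition (the fraction of last period's close alters absent this period, averaged over egos); the paper's exact measure may differ.

```python
def turnover_rate(previous: set, current: set) -> float:
    """Fraction of last period's close alters who are absent this period.

    Hypothetical measure for illustration; returns 0.0 for an empty circle.
    """
    if not previous:
        return 0.0
    return len(previous - current) / len(previous)


def mean_turnover(circles_by_ego: dict) -> float:
    """Average turnover over egos, given {ego: (previous_set, current_set)}."""
    rates = [turnover_rate(prev, cur) for prev, cur in circles_by_ego.values()]
    return sum(rates) / len(rates) if rates else 0.0


print(turnover_rate({"a", "b", "c", "d"}, {"a", "b", "c", "e"}))  # 0.25
```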
Submitted 28 March, 2022;
originally announced March 2022.
-
Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction
Authors:
Md Nadim,
Debajyoti Mondal,
Chanchal K. Roy
Abstract:
The most common use of data visualization is to reduce complexity so that data can be properly understood. A graph is one of the most commonly used representations for understanding relational data. It produces a simplified representation of data that would be challenging to comprehend if kept in a textual format. In this study, we propose a methodology that uses the relational properties of source code, in the form of a graph, for Just-in-Time (JIT) bug prediction in software systems across different revisions during software evolution and maintenance. We present a method to convert the source code of commit patches to equivalent graph representations, which we name Source Code Graphs (SCGs). To understand and compare multiple source code graphs, we extracted several structural properties of these graphs, such as density, number of cycles, nodes, and edges. We then used the attribute values of those SCGs to visualize and detect buggy software commits. We process more than 246K software commits from 12 subject systems in this investigation. Our investigation of these 12 open-source software projects written in C++ and Java shows that combining the features from SCGs with conventional features used in similar studies increases the performance of Machine Learning (ML) based buggy commit detection models. We also find the increase in F1-scores for predicting buggy and non-buggy commits to be statistically significant using the Wilcoxon Signed Rank Test. Since SCG-based feature values represent the style or structural properties of source code updates or changes in the software system, this suggests the importance of carefully maintaining source code style or structure to keep a software system bug-free.
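Structural properties such as node count, edge count, density, and cycles can be computed directly from an edge list. The sketch below assumes a simple undirected graph (no self-loops or duplicate edges) and uses the cyclomatic number E - V + C as a proxy for "number of cycles"; how the paper builds an SCG from a commit patch and counts cycles is not specified in the abstract, so `graph_features` is purely illustrative.

```python
def graph_features(num_nodes: int, edges: list[tuple[int, int]]) -> dict:
    """Structural properties of a simple undirected graph with nodes 0..num_nodes-1.

    'independent_cycles' is the cyclomatic number E - V + C (C = number of
    connected components), i.e. the size of a cycle basis.
    """
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    components = len({find(x) for x in range(num_nodes)})
    n_edges = len(edges)
    max_edges = num_nodes * (num_nodes - 1) / 2
    density = n_edges / max_edges if max_edges else 0.0
    return {
        "nodes": num_nodes,
        "edges": n_edges,
        "density": density,
        "independent_cycles": n_edges - num_nodes + components,
    }


# Triangle 0-1-2 plus a pendant node 3.
print(graph_features(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))
```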
Submitted 25 January, 2022;
originally announced January 2022.
-
Evaluating the Performance of Clone Detection Tools in Detecting Cloned Co-change Candidates
Authors:
Md Nadim,
Manishankar Mondal,
Chanchal K. Roy,
Kevin Schneider
Abstract:
Co-change candidates are a group of code fragments that require a change if any one of them is modified in a commit operation during software evolution. The cloned co-change candidates are a subset of the co-change candidates, and the members in this subset are clones of one another. The cloned co-change candidates are usually created by reusing existing code fragments in a software system. Detecting cloned co-change candidates is essential for clone-tracking, and studies have shown that we can use clone detection tools to find cloned co-change candidates. However, although several studies evaluate clone detection tools for their accuracy in detecting cloned fragments, we found no study that evaluates clone detection tools for detecting cloned co-change candidates. In this study, we explore the dimension of code clone research for detecting cloned co-change candidates. We compare the performance of 12 different configurations of nine promising clone detection tools in identifying cloned co-change candidates from eight open-source C and Java-based subject systems of various sizes and application domains. A ranked list and analysis of the results provides valuable insights and guidelines into selecting and configuring a clone detection tool for identifying co-change candidates and leads to a new dimension of code clone research into change impact analysis.
Submitted 19 January, 2022;
originally announced January 2022.
-
Decomposing the Deep: Finding Class Specific Filters in Deep CNNs
Authors:
Akshay Badola,
Cherian Roy,
Vineet Padmanabhan,
Rajendra Lal
Abstract:
Interpretability of Deep Neural Networks has become a major area of exploration. Although these networks have achieved state-of-the-art accuracy in many tasks, it is extremely difficult to interpret and explain their decisions. In this work we analyze the final and penultimate layers of Deep Convolutional Networks and provide an efficient method for identifying subsets of features that contribute most towards the network's decision for a class. We demonstrate that the number of such features per class is much lower than the dimension of the final layer, and therefore the decision surface of Deep CNNs lies on a low-dimensional manifold and is proportional to the network depth. Our methods allow the final layer to be decomposed into separate subspaces, which are far more interpretable and have a lower computational cost than the final layer of the full network.
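One simple way to see how few penultimate-layer features can drive a class score is to rank each feature's contribution w_i * a_i to that class's logit and keep the smallest set covering most of the positive evidence. This greedy sketch is an illustration under that assumption, not the authors' method; `top_contributing_features` and the `coverage` threshold are hypothetical.

```python
def top_contributing_features(class_weights, activations, coverage=0.9):
    """Smallest set of features covering `coverage` of a class's positive evidence.

    class_weights: final-layer weights for one class; activations: penultimate-
    layer activations for one input. The contribution of feature i is w_i * a_i.
    Returns the selected feature indices in ascending order.
    """
    contributions = [w * a for w, a in zip(class_weights, activations)]
    # Largest positive contributions first; greedy selection minimizes set size.
    positive = sorted(((c, i) for i, c in enumerate(contributions) if c > 0),
                      reverse=True)
    target = coverage * sum(c for c, _ in positive)
    chosen, covered = [], 0.0
    for c, i in positive:
        if covered >= target:
            break
        chosen.append(i)
        covered += c
    return sorted(chosen)


print(top_contributing_features([0.5, -0.2, 2.0, 0.1], [1.0, 3.0, 1.0, 0.5]))  # [0, 2]
```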
Submitted 3 April, 2022; v1 submitted 14 December, 2021;
originally announced December 2021.
-
The Reproducibility of Programming-Related Issues in Stack Overflow Questions
Authors:
Saikat Mondal,
Mohammad Masudur Rahman,
Chanchal K. Roy,
Kevin Schneider
Abstract:
Software developers often look for solutions to their code-level problems using the Stack Overflow Q&A website. To receive help, developers frequently submit questions containing sample code segments and the description of the programming issue. Unfortunately, it is not always possible to reproduce the issues from the code segments, which may prevent questions from receiving prompt and appropriate solutions. We conducted an exploratory study on the reproducibility of issues discussed in 400 Java and 400 Python questions. We parsed, compiled, executed, and carefully examined the code segments from these questions to reproduce the reported programming issues. The outcomes of our study are three-fold. First, we found that we can reproduce approximately 68% of Java and 71% of Python issues, whereas we were unable to reproduce approximately 22% of Java and 19% of Python issues using the code segments. Of the issues that were reproducible, approximately 67% of the Java code segments and 20% of the Python code segments required minor or major modifications to reproduce the issues. Second, we carefully investigated why programming issues could not be reproduced and provided evidence-based guidelines for writing effective code examples for Stack Overflow questions. Third, we investigated the correlation between the issue reproducibility status of questions and the corresponding answer meta-data, such as the presence of an accepted answer. According to our analysis, a reproducible question has at least twice the chance of receiving an accepted answer as an irreproducible question. Besides, the median time delay in receiving accepted answers is double if the issues reported in questions could not be reproduced. We also investigate the confounding factors (e.g., reputation) and find that they do not undermine the correlation between reproducibility status and answer meta-data.
Submitted 25 December, 2021; v1 submitted 23 November, 2021;
originally announced November 2021.
-
An Empirical Study of the Effectiveness of an Ensemble of Stand-alone Sentiment Detection Tools for Software Engineering Datasets
Authors:
Gias Uddin,
Yann-Gael Gueheneuc,
Foutse Khomh,
Chanchal K Roy
Abstract:
Sentiment analysis in software engineering (SE) has shown promise to analyze and support diverse development activities. We report the results of an empirical study that we conducted to determine the feasibility of developing an ensemble engine by combining the polarity labels of stand-alone SE-specific sentiment detectors. Our study has two phases. In the first phase, we pick five SE-specific sentiment detection tools from two recently published papers by Lin et al. [31, 32], who first reported negative results with stand-alone sentiment detectors and then proposed an improved SE-specific sentiment detector, POME [31]. We report the study results on 17,581 units (sentences/documents) from six currently available sentiment benchmarks for SE. We find that the existing tools can be complementary to each other in 85-95% of the cases, i.e., one is wrong but another is right. However, a majority-voting-based ensemble of those tools fails to improve the accuracy of sentiment detection. We develop Sentisead, a supervised tool that combines the polarity labels and bag of words as features. Sentisead improves the performance (F1-score) of the individual tools by 4% (over Senti4SD [5]) to 100% (over POME [31]). In the second phase, we compare and improve the Sentisead infrastructure using Pre-trained Transformer Models (PTMs). We find that a Sentisead infrastructure with RoBERTa as the ensemble of the five stand-alone rule-based and shallow learning SE-specific tools from Lin et al. [31, 32] offers the best F1-score of 0.805 across the six datasets, while a stand-alone RoBERTa shows an F1-score of 0.801.
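As a rough illustration of the baseline that the abstract reports as not improving accuracy, the sketch below combines the polarity labels of several stand-alone detectors by simple majority. The tool outputs are hypothetical, and the tie rule (fall back to "neutral") is an assumption of this sketch, not necessarily the rule used in the study.

```python
from collections import Counter


def majority_vote(labels):
    """Combine polarity labels from several detectors by simple majority.

    Ties fall back to "neutral"; this tie rule is an assumption of the sketch.
    """
    counts = Counter(labels)
    top = max(counts.values())
    winners = [label for label, count in counts.items() if count == top]
    return winners[0] if len(winners) == 1 else "neutral"


# Hypothetical outputs of five stand-alone detectors for one sentence.
tool_labels = ["positive", "neutral", "positive", "negative", "positive"]
print(majority_vote(tool_labels))  # positive
```

A supervised combiner such as the one described above would instead feed these labels (plus bag-of-words features) to a classifier, letting it learn when to trust which tool.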
Submitted 4 November, 2021;
originally announced November 2021.
-
Photocatalytic water splitting ability of Fe/MgO-rGO nanocomposites towards hydrogen evolution
Authors:
Fahmida Sharmin,
Dayal Chandra Roy,
M. A. Basith
Abstract:
Photocatalytic water splitting has attracted great interest as an ideal technique for producing hydrogen (H$_{2}$) fuel from two renewable sources, i.e., water and solar energy. Here, we have adopted a facile hydrothermal approach for the synthesis of reduced graphene oxide (rGO) incorporated Fe/MgO nanocomposites, followed by thermal treatment in an inert atmosphere, to investigate their ability for photodegradation and photocatalytic hydrogen evolution via water splitting. Transmission electron microscopy images of the Fe/MgO-rGO nanocomposite confirmed the distribution of Fe/MgO nanoparticles throughout the rGO sheets. Notably, all rGO-supported nanocomposites, especially the one thermally treated at 500 $^{\circ}$C in an argon (Ar) atmosphere, demonstrated significantly higher photocatalytic efficiency towards the photodegradation of a toxic textile dye, rhodamine B, than pristine MgO and commercially available Degussa P25 titania nanoparticles as well as the other composites. Under solar irradiation, the Fe/MgO-rGO(500) nanocomposite exhibited 86% degradation of rhodamine B dye and generated almost four times more H$_{2}$ via photocatalytic water splitting than commercially available P25 titania nanoparticles. This promising photocatalytic ability of the Fe/MgO-rGO(500) nanocomposite can be attributed to the improved morphological and surface features due to heat treatment in an inert atmosphere, as well as enhanced charge carrier separation and increased light absorption capacity attributed to rGO incorporation.
Submitted 3 January, 2022; v1 submitted 2 November, 2021;
originally announced November 2021.
-
FaBiAN: A Fetal Brain magnetic resonance Acquisition Numerical phantom
Authors:
Hélène Lajous,
Christopher W. Roy,
Tom Hilbert,
Priscille de Dumast,
Sébastien Tourbier,
Yasser Alemán-Gómez,
Jérôme Yerly,
Thomas Yu,
Hamza Kebiri,
Kelly Payette,
Jean-Baptiste Ledoux,
Reto Meuli,
Patric Hagmann,
Andras Jakab,
Vincent Dunet,
Mériam Koob,
Tobias Kober,
Matthias Stuber,
Meritxell Bach Cuadra
Abstract:
Accurate characterization of in utero human brain maturation is critical, as it involves complex and interconnected structural and functional processes that may influence health later in life. Magnetic resonance imaging is a powerful tool to investigate equivocal neurological patterns during fetal development. However, acquisitions of satisfactory quality remain scarce in this cohort of sensitive subjects, thus hindering the validation of advanced image processing techniques. Numerical phantoms can mitigate these limitations by providing a controlled environment with a known ground truth. In this work, we present FaBiAN, an open-source Fetal Brain magnetic resonance Acquisition Numerical phantom that simulates clinical T2-weighted fast spin echo sequences of the fetal brain. This unique tool is based on a general, flexible and realistic setup that includes stochastic fetal movements, thus providing images of the fetal brain throughout maturation comparable to clinical acquisitions. We demonstrate its value to evaluate the robustness and optimize the accuracy of an algorithm for super-resolution fetal brain magnetic resonance imaging from simulated motion-corrupted 2D low-resolution series as compared to a synthetic high-resolution reference volume. We also show that the images generated can complement clinical datasets to support data-intensive deep learning methods for fetal brain tissue segmentation.
Submitted 6 September, 2021;
originally announced September 2021.
-
Semantic Slicing of Architectural Change Commits: Towards Semantic Design Review
Authors:
Amit Kumar Mondal,
Chanchal K. Roy,
Kevin A. Schneider,
Banani Roy,
Sristy Sumana Nath
Abstract:
Software architectural changes involve more than one module or component and are more complex to analyze than local code changes. Development teams aiming to review the architectural aspects (design) of a change commit consider many essential scenarios, such as access rules and restrictions on the usage of program entities across modules. Moreover, design review is essential when proper architectural formulations are paramount for developing and deploying a system. Untangling architectural changes, recovering semantic design, and producing design notes are the crucial tasks of the design review process. To support these tasks, we construct a lightweight tool [4] that can detect and decompose semantic slices of a commit containing architectural instances. A semantic slice consists of a description of the relational information of the involved modules, their classes, methods, and connected modules in a change instance, which is easy for a reviewer to understand. We extract various directory and naming structure (DANS) properties from the source code to develop our tool. Utilizing the DANS properties, our tool first detects architectural change instances based on our defined metric and then decomposes the slices (based on string processing). Our preliminary investigation with ten open-source projects (developed in Java and Kotlin) reveals that the DANS properties produce highly reliable precision and recall (93-100%) for detecting and generating architectural slices. Our proposed tool will serve as a preliminary approach for semantic design recovery and design summary generation for project releases.
Submitted 1 September, 2021;
originally announced September 2021.
-
A Systematic Review of Automated Query Reformulations in Source Code Search
Authors:
Mohammad Masudur Rahman,
Chanchal K. Roy
Abstract:
Fixing software bugs and adding new features are two of the major maintenance tasks. Software bugs and features are reported as change requests. Developers consult these requests and often choose a few keywords from them as an ad hoc query. Then they execute the query with a search engine to find the exact locations within software code that need to be changed. Unfortunately, even experienced developers often fail to choose appropriate queries, which leads to costly trial and error during a code search. Over the years, many studies have attempted to reformulate developers' ad hoc queries to support them. In this systematic literature review, we carefully select 70 primary studies on query reformulations from 2,970 candidate studies, perform an in-depth qualitative analysis (e.g., Grounded Theory), and then answer seven research questions with major findings. First, to date, eight major methodologies (e.g., term weighting, term co-occurrence analysis, thesaurus lookup) have been adopted to reformulate queries. Second, the existing studies suffer from several major limitations (e.g., lack of generalizability, vocabulary mismatch problem, subjective bias) that might prevent their wide adoption. Finally, we discuss the best practices and future opportunities to advance the state of research in search query reformulations.
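Term weighting, the first methodology named above, can be illustrated with a minimal TF-IDF keyword selector: each term of a change request is scored by its frequency in the request times its rarity in a document corpus, and the top-scoring terms form the reformulated query. This is a generic sketch, not the procedure of any specific reviewed study; the stop list, the smoothed IDF variant, and the toy corpus are all assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (hypothetical, not from any reviewed study).
STOP_WORDS = {"the", "a", "an", "on", "is", "when", "of", "to", "in"}


def tokenize(text):
    """Lower-case word tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_keywords(change_request, corpus, k=3):
    """Return the k terms of a change request with the highest TF-IDF weight."""
    doc_terms = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_terms)
    term_freq = Counter(tokenize(change_request))
    scores = {}
    for term, tf in term_freq.items():
        df = sum(1 for terms in doc_terms if term in terms)
        idf = math.log((n_docs + 1) / (df + 1)) + 1  # smoothed IDF
        scores[term] = tf * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


corpus = ["the parser reads the file", "the file is empty", "save the file"]
print(tfidf_keywords("the parser crashes on the empty file", corpus))
# ['crashes', 'empty', 'parser']
```

Note how "file", which appears in every corpus document, is ranked last: the IDF factor is what pushes generic words out of the query.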
Submitted 8 June, 2023; v1 submitted 22 August, 2021;
originally announced August 2021.
-
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study
Authors:
Mohammad Masudur Rahman,
Foutse Khomh,
Shamima Yeasmin,
Chanchal K. Roy
Abstract:
Being lightweight and cost-effective, IR-based approaches for bug localization have shown promise in finding software bugs. However, the accuracy of these approaches heavily depends on the bug reports they use. A significant number of bug reports contain only plain natural language text. According to existing studies, IR-based approaches cannot perform well when they use these bug reports as search queries. However, recent evidence suggests that even these natural language-only reports contain enough good keywords to help localize the bugs successfully. On one hand, these findings suggest that natural language-only bug reports might be a sufficient source of good query keywords. On the other hand, they cast serious doubt on the query selection practices in IR-based bug localization. In this article, we attempt to clarify this issue by conducting an in-depth empirical study that critically examines the state-of-the-art query selection practices in IR-based bug localization. In particular, we use a dataset of 2,320 bug reports, employ ten existing approaches from the literature, exploit a Genetic Algorithm-based approach to construct optimal and near-optimal search queries from these bug reports, and then answer three research questions. We confirm that the state-of-the-art query construction approaches are indeed not sufficient for constructing appropriate queries (for bug localization) from certain natural language-only bug reports, although these reports contain such queries. We also demonstrate that optimal and non-optimal queries chosen from bug report texts differ significantly in several keyword characteristics, which has led us to actionable insights. Furthermore, we demonstrate a 27%-34% improvement in the performance of non-optimal queries by applying our actionable insights to them.
Submitted 11 August, 2021;
originally announced August 2021.
-
Improved Retrieval of Programming Solutions With Code Examples Using a Multi-featured Score
Authors:
Rodrigo F. Silva,
M. Masudur Rahman,
Carlos Eduardo Dantas,
Chanchal Roy,
Foutse Khomh,
Marcelo A. Maia
Abstract:
Developers often depend on code search engines to obtain solutions for their programming tasks. However, finding an expected solution containing code examples along with their explanations is challenging due to several issues. There is a vocabulary mismatch between the search keywords (the query) and the appropriate solutions. The semantic gap may increase for similar bags of words due to antonyms and negation. Moreover, documents retrieved by search engines might not contain solutions with both code examples and their explanations. We therefore propose CRAR (Crowd Answer Recommender) to circumvent those issues, aiming to improve the retrieval of relevant answers from Stack Overflow that contain not only the expected code examples for the given task but also their explanations. Given a programming task, we investigate the effectiveness of combining information retrieval techniques with a set of features to enhance the ranking of important threads (i.e., the units containing questions along with their answers) for the given task, and then select relevant answers contained in those threads. These features include semantic features, such as word embeddings and sentence embeddings (for instance, from a Convolutional Neural Network (CNN)). CRAR also leverages social aspects of Stack Overflow discussions, such as popularity, to select relevant answers for the tasks. Our experimental evaluation shows that the combination of the different features performs better than each one individually. We also compare the retrieval performance with the state-of-the-art CROKAGE (Crowd Knowledge Answer Generator), which is also a system aimed at retrieving relevant answers from Stack Overflow. We show that CRAR outperforms CROKAGE in Mean Reciprocal Rank and Mean Recall with small and medium effect sizes, respectively.
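The two evaluation metrics named above can be computed as follows. This is a generic sketch: the function names and toy rankings are hypothetical, and treating "Mean Recall" as recall within the top-k results averaged over queries is an assumption, since the abstract does not give the cutoff or exact definition.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant item per query (0 if none found)."""
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for position, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / position
                break
    return total / len(ranked_lists)


def mean_recall_at_k(ranked_lists, relevant_sets, k):
    """Average fraction of each query's relevant items found in its top k results."""
    scores = []
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        hits = len(set(ranking[:k]) & relevant)
        scores.append(hits / len(relevant))
    return sum(scores) / len(scores)


# Hypothetical rankings of answer IDs for three programming tasks.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h"]]
relevant = [{"b"}, {"d"}, {"x"}]
print(mean_reciprocal_rank(rankings, relevant))  # 0.5
```

MRR rewards placing the first relevant answer near the top, whereas recall@k only asks whether relevant answers appear at all within the cutoff, which is why the two can show different effect sizes.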
Submitted 5 August, 2021;
originally announced August 2021.
-
Mining API Usage Scenarios from Stack Overflow
Authors:
Gias Uddin,
Foutse Khomh,
Chanchal K Roy
Abstract:
We propose a framework to mine API usage scenarios from Stack Overflow. Each scenario consists of a code example, the task description, and the reactions of developers towards the code example. First, we present an algorithm to automatically link a code example in a forum post to an API mentioned in the textual contents of the forum post. Second, we generate a natural language description of the task by summarizing the discussions around the code example. Third, we automatically associate developers' reactions (i.e., positive and negative opinions) towards the code example to offer information about code quality. We evaluate the algorithms using three benchmarks.
Submitted 17 February, 2021;
originally announced February 2021.
-
Automatic API Usage Scenario Documentation from Technical Q&A Sites
Authors:
Gias Uddin,
Foutse Khomh,
Chanchal K Roy
Abstract:
The online technical Q&A site Stack Overflow (SO) is popular among developers to support their coding and diverse development needs. To address shortcomings in official API documentation resources, several research studies have focused on augmenting official API documentation with insights (e.g., code examples) from SO. These techniques propose adding code examples/insights about APIs into the APIs' official documentation. Reviews are opinionated sentences with positive/negative sentiments. However, we are aware of no previous research that attempts to automatically produce API documentation from SO by considering both API code examples and reviews. In this paper, we present two novel algorithms that can be used to automatically produce API documentation from SO by combining code examples and reviews towards those examples. The first algorithm is called statistical documentation, which shows the distribution of positivity and negativity around the code examples of an API using different metrics (e.g., star ratings). The second algorithm is called concept-based documentation, which clusters similar and conceptually relevant usage scenarios. An API usage scenario contains a code example, a textual description of the underlying task addressed by the code example, and the reviews (i.e., opinions with positive and negative sentiments) from other developers towards the code example. We deployed the algorithms in Opiner, a web-based platform to aggregate information about APIs from online forums. We evaluated the algorithms by mining all Java JSON-based posts in SO and by conducting three user studies based on the documentation produced from the posts.
Submitted 16 February, 2021;
originally announced February 2021.
-
Ab initio study of the density dependence of the Grüneisen parameter at pressures up to 360 GPa
Authors:
Umesh C. Roy,
Subir K. Sarkar
Abstract:
Ab initio calculations based on Density Functional Theory are used to show that the Debye frequency is, to high accuracy, a linear function of density for several elemental solids at pressures up to (at least) 360 GPa. This implies that the ratio of density to the (Debye-frequency-based) vibrational Grüneisen parameter is a linear function of density in this region. Numerical data from first-principles calculations for several systems at temperatures up to 2000 K suggest that this is also true for the thermal Grüneisen parameter in the same pressure range. Our analytical form of the vibrational Grüneisen parameter is applied to an implementation of the Lindemann melting criterion to obtain a simple extrapolation formula for the melting temperatures of materials at higher densities. This prediction is tested against available experimental and numerical data for several elemental solids.
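The step from "linear Debye frequency" to "linear density-to-Grüneisen ratio" can be made explicit in a few lines. This derivation assumes the standard definition of the vibrational Grüneisen parameter and writes the fitted linear relation as $\omega_D = a + b\rho$; the constants $a$ and $b$ are our notation, not the authors'.

$$\gamma = -\frac{d\ln\omega_D}{d\ln V} = \frac{d\ln\omega_D}{d\ln\rho} = \frac{\rho}{\omega_D}\,\frac{d\omega_D}{d\rho} = \frac{b\rho}{a+b\rho}, \qquad \frac{\rho}{\gamma} = \frac{a+b\rho}{b} = \rho + \frac{a}{b}.$$

So $\rho/\gamma$ is linear in $\rho$ with unit slope. If, as an additional assumption, the Lindemann criterion is taken in its textbook form $T_m \propto \theta_D^{2} V^{2/3}$ at fixed atomic mass, with $\theta_D \propto \omega_D$ and $V \propto 1/\rho$, then an extrapolation from a reference density $\rho_0$ would read

$$T_m(\rho) = T_m(\rho_0)\left(\frac{a+b\rho}{a+b\rho_0}\right)^{2}\left(\frac{\rho_0}{\rho}\right)^{2/3},$$

which is equivalent to the familiar Lindemann law $d\ln T_m/d\ln\rho = 2\gamma - 2/3$. This is only a sketch of the idea; the paper's own extrapolation formula may be written differently.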
Submitted 11 January, 2021;
originally announced January 2021.
-
An Improved Framework of GPU Computing for CFD Applications on Structured Grids using OpenACC
Authors:
Weicheng Xue,
Charles W. Jackson,
Christoper J. Roy
Abstract:
This paper focuses on improving the multi-GPU performance of a research CFD code on structured grids. MPI and OpenACC directives are used to scale the code up to 16 GPUs. This paper shows that 16 P100 GPUs and 16 V100 GPUs can be 30$\times$ and 70$\times$ faster, respectively, than 16 Xeon CPU E5-2680v4 cores for three different test cases. A series of performance issues related to the scaling of the multi-block CFD code are addressed by applying various optimizations. Performance optimizations such as the pack/unpack message method, removing temporary arrays as arguments to procedure calls, allocating global memory for limiters and connected boundary data, reordering non-blocking MPI_Isend/MPI_Irecv and MPI_Wait calls, reducing unnecessary implicit data movement of derived-type members between the host and the device, and using GPUDirect can improve the compute utilization, memory throughput, and asynchronous progression in the multi-block CFD code using modern programming features.
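The pack/unpack message method mentioned above gathers non-contiguous boundary cells into one contiguous buffer before sending, so that each neighbour exchange is a single message rather than many small strided ones. The sketch below shows only that idea on one process with NumPy; it does not use MPI or OpenACC, and the block layout (one ghost column per shared face, blocks split along the column axis) is a hypothetical example, not the paper's data structure.

```python
import numpy as np


def pack(block, cells):
    """Gather (possibly strided) boundary cells into one contiguous send buffer."""
    return np.ascontiguousarray(block[cells]).ravel()


def unpack(block, cells, buffer):
    """Scatter a received contiguous buffer back into the ghost cells."""
    block[cells] = buffer.reshape(block[cells].shape)


def halo_exchange_columns(left, right):
    """Exchange one column between two blocks that share a vertical face.

    left:  interior columns 0..2, ghost column 3
    right: ghost column 0, interior columns 1..3
    In C order a column is strided in memory, which is why packing helps.
    """
    send_to_right = pack(left, (slice(None), slice(2, 3)))
    send_to_left = pack(right, (slice(None), slice(1, 2)))
    # In an MPI code these buffers would travel via non-blocking send/receive;
    # here the "message" is simply handed to the neighbour in the same process.
    unpack(right, (slice(None), slice(0, 1)), send_to_right)
    unpack(left, (slice(None), slice(3, 4)), send_to_left)


left = np.arange(16.0).reshape(4, 4)
right = np.arange(16.0, 32.0).reshape(4, 4)
halo_exchange_columns(left, right)
print(right[:, 0])  # [ 2.  6. 10. 14.]
```

Packing both buffers before unpacking mirrors the usual halo-exchange ordering, where all sends are posted before any received data overwrites ghost cells.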
Submitted 4 December, 2020;
originally announced December 2020.
-
Motion Compensated Whole-Heart Coronary Magnetic Resonance Angiography using Focused Navigation (fNAV)
Authors:
Christopher W Roy,
John Heerfordt,
Davide Piccini,
Giulia Rossi,
Anna Giulia Pavon,
Juerg Schwitter,
Matthias Stuber
Abstract:
Background: Respiratory self-navigation (RSN) for whole-heart coronary magnetic resonance angiography (CMRA) is a technique that estimates and corrects for respiratory motion. However, RSN has been limited to a 1D rigid correction, which is often insufficient for patients with complex respiratory patterns. The goal of this work is therefore to improve the robustness and quality of 3D radial CMRA by incorporating both 3D motion information and nonrigid intra-acquisition correction of the data into a framework called focused navigation (fNAV). Methods: We applied fNAV to 500 data sets from a numerical simulation, 22 healthy volunteers, and 549 cardiac patients. We compared fNAV to RSN and respiratory-resolved XD-GRASP reconstructions of the same data and recorded reconstruction times. Motion accuracy was measured as the correlation between fNAV and the ground truth for simulations, and between fNAV and image registration for in vivo data. Vessel sharpness was measured using Soap-Bubble. Finally, image quality analysis was performed by a blinded expert reviewer who chose the best image for each data set. Results: The reconstruction time for fNAV images was significantly higher than for RSN (6.1 ± 2.1 minutes vs 1.4 ± 0.3 minutes, p < 0.025) but significantly lower than for XD-GRASP (25.6 ± 7.1 minutes, p < 0.025). There is high correlation between the fNAV and reference displacement estimates across all data sets (0.73 ± 0.29). For all data, fNAV led to significantly sharper vessels than all other reconstructions (p < 0.01). Finally, a blinded reviewer chose fNAV as the best image in 239 out of 571 cases (p = 10$^{-5}$). Conclusion: fNAV is a promising technique for improving free-breathing 3D radial whole-heart CMRA. This novel approach to respiratory self-navigation can derive 3D nonrigid motion estimates from an acquired 1D signal, yielding a statistically significant improvement in image sharpness relative to 1D translational correction as well as XD-GRASP reconstructions.
Submitted 27 October, 2020;
originally announced October 2020.
-
Electronic Structure of Graphene/TiO$_2$ Interface: Design and Functional Perspectives
Authors:
Shashi B. Mishra,
Somnath C. Roy,
B. R. K. Nanda
Abstract:
We propose the design of a low-strain and energetically favourable mono- and bilayer graphene overlayer on the anatase TiO$_2$ (001) surface and examine the electronic structure of the interface with the aid of first-principles calculations. In the absence of hybridization between surface TiO$_2$ and graphene states, dipolar fluctuations govern the minor charge transfer across the interface. As a result, both the substrate and the overlayer retain their pristine electronic structure. The interface with monolayer graphene retains its gapless linear band dispersion irrespective of the induced epitaxial strain. The potential gradient opens up a bandgap of a few meV in the case of Bernal stacking and strengthens the interpenetration of the Dirac cones in the case of hexagonal stacking of the bilayer graphene. The difference between the macroscopic average potentials of the TiO$_2$ and graphene layer(s) in the heterostructure lies in the range 3 to 3.13 eV, which is very close to the TiO$_2$ bandgap ($\sim$ 3.2 eV). Therefore, the proposed heterostructure will exhibit enhanced photo-induced charge transfer, and the graphene component will serve as a visible-light sensitizer.
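The macroscopic average potential quoted above is conventionally obtained by first averaging the electrostatic potential over planes parallel to the interface and then taking a running mean along the normal over one structural period, which removes the atomic-scale oscillations and leaves the layer-to-layer offset. The sketch below applies that standard filtering step to a synthetic 1D profile; it is not the authors' DFT post-processing, and the period, grid spacing, and potential values are hypothetical. Real heterostructures with two different lattice periods usually need a double average, which is omitted here.

```python
import numpy as np


def macroscopic_average(v_planar, dz, period):
    """Running mean of a planar-averaged potential over one structural period.

    The profile is treated as periodic along z, as in a supercell calculation.
    """
    n = int(round(period / dz))
    kernel = np.ones(n) / n
    padded = np.pad(v_planar, (n // 2, n - n // 2 - 1), mode="wrap")
    return np.convolve(padded, kernel, mode="valid")


# Synthetic slab: two regions with different mean potential plus atomic-scale ripples.
dz, period = 0.1, 2.0
idx = np.arange(200)
z = idx * dz
v = np.where(idx < 100, 1.0, 4.0) + 0.5 * np.sin(2 * np.pi * z / period)
v_macro = macroscopic_average(v, dz, period)
print(round(v_macro[150] - v_macro[50], 6))  # 3.0
```

Far from the interfaces the ripple averages out exactly over one period, so the plateau difference recovers the imposed 3.0 offset.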
Submitted 19 December, 2020; v1 submitted 28 July, 2020;
originally announced July 2020.
-
T2 Mapping from Super-Resolution-Reconstructed Clinical Fast Spin Echo Magnetic Resonance Acquisitions
Authors:
Hélène Lajous,
Tom Hilbert,
Christopher W. Roy,
Sébastien Tourbier,
Priscille de Dumast,
Thomas Yu,
Jean-Philippe Thiran,
Jean-Baptiste Ledoux,
Davide Piccini,
Patric Hagmann,
Reto Meuli,
Tobias Kober,
Matthias Stuber,
Ruud B. van Heeswijk,
Meritxell Bach Cuadra
Abstract:
Relaxometry studies in preterm and at-term newborns have provided insight into brain microstructure, thus opening new avenues for studying normal brain development and supporting diagnosis in equivocal neurological situations. However, such quantitative techniques require long acquisition times and therefore cannot be straightforwardly translated to in utero brain developmental studies. In clinical routine fetal brain magnetic resonance imaging, 2D low-resolution T2-weighted fast spin echo sequences are used to minimize the effects of unpredictable fetal motion during acquisition. As super-resolution techniques make it possible to reconstruct a 3D high-resolution volume of the fetal brain from clinical low-resolution images, their combination with quantitative acquisition schemes could provide fast and accurate T2 measurements. In this context, the present work demonstrates the feasibility of using super-resolution reconstruction from conventional T2-weighted fast spin echo sequences for 3D isotropic T2 mapping. A quantitative magnetic resonance phantom was imaged using a clinical T2-weighted fast spin echo sequence at variable echo times to allow for super-resolution reconstruction at every echo time and subsequent T2 mapping of samples whose relaxometric properties are close to those of fetal brain tissue. We demonstrate that this approach is highly repeatable, accurate and robust when using six echo times (total acquisition time under 9 minutes) as compared to gold-standard single-echo spin echo sequences (several hours for one single 2D slice).
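Once a volume has been reconstructed at each echo time, a T2 map follows from fitting a decay model voxel by voxel. The abstract does not state the fitting procedure, so the sketch below assumes the common mono-exponential model $S(TE) = S_0 e^{-TE/T_2}$ fitted by linear least squares on $\log S$; nonlinear or dictionary-based fits are frequent alternatives. The six echo times and the synthetic voxel are hypothetical.

```python
import numpy as np


def fit_t2(echo_times_ms, signals):
    """Fit S(TE) = S0 * exp(-TE / T2) by linear least squares on log(S).

    Returns (T2 in ms, S0). Assumes strictly positive signals.
    """
    te = np.asarray(echo_times_ms, dtype=float)
    log_s = np.log(np.asarray(signals, dtype=float))
    slope, intercept = np.polyfit(te, log_s, 1)  # log S = log S0 - TE / T2
    return -1.0 / slope, float(np.exp(intercept))


# Six hypothetical echo times and a synthetic voxel with T2 = 120 ms.
echo_times = [30, 60, 90, 120, 150, 180]
signal = [1000.0 * np.exp(-te / 120.0) for te in echo_times]
t2, s0 = fit_t2(echo_times, signal)
print(round(t2, 3), round(s0, 3))  # 120.0 1000.0
```

The log-linear fit is fast and closed-form, but it weights low-signal (late-echo) points more heavily under noise, which is one reason nonlinear fitting is often preferred for in vivo data.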
Submitted 23 July, 2020;
originally announced July 2020.
-
Free-running SIMilarity-Based Angiography (SIMBA) for simplified anatomical MR imaging of the heart
Authors:
John Heerfordt,
Kevin K. Whitehead,
Jessica A. M. Bastiaansen,
Lorenzo Di Sopra,
Christopher W. Roy,
Jérôme Yerly,
Bastien Milani,
Mark A. Fogel,
Matthias Stuber,
Davide Piccini
Abstract:
Purpose: Whole-heart MRA techniques typically target pre-determined motion states and address cardiac and respiratory dynamics independently. We propose a novel fast reconstruction algorithm, applicable to ungated free-running sequences, that leverages inherent similarities in the acquired data to avoid such physiological constraints.
Theory and Methods: The proposed SIMilarity-Based Angiography (SIMBA) method clusters the continuously acquired k-space data in order to find a motion-consistent subset that can be reconstructed into a motion-suppressed whole-heart MRA. Free-running 3D radial datasets from six ferumoxytol-enhanced scans of pediatric cardiac patients and twelve non-contrast scans of healthy volunteers were reconstructed with a non-motion-suppressed regridding of all the acquired data (All Data), our proposed SIMBA method, and a previously published free-running framework (FRF) that uses cardiac and respiratory self-gating and compressed sensing. Images were compared for blood-myocardium interface sharpness, contrast ratio, and visibility of coronary artery ostia.
Results: Both the fast SIMBA reconstruction (~20s) and the FRF provided significantly higher blood-myocardium sharpness than All Data (P<0.001). No significant difference was observed between SIMBA and FRF. Significantly higher blood-myocardium contrast ratio was obtained with SIMBA compared to All Data and FRF (P<0.01). More coronary ostia could be visualized with both SIMBA and FRF than with All Data (All Data: 4/36, SIMBA: 30/36, FRF: 33/36, both P<0.001), but no significant difference was found between SIMBA and FRF.
Conclusion: The combination of free-running sequences and the fast SIMBA reconstruction, which operates without a priori assumptions related to physiological motion, forms a simple workflow for obtaining whole-heart MRA with sharp anatomical structures.
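The abstract says SIMBA clusters the continuously acquired k-space data to find a motion-consistent subset, but it does not specify the features or the clustering algorithm. The sketch below shows the general idea under stated assumptions only. Each readout segment is assumed to have been summarized as a feature vector (a hypothetical "signature", e.g. a self-navigation signal). Plain k-means with deterministic farthest-point initialization is used as a stand-in, and the most populated cluster is taken as the dominant motion state. The function name `largest_similarity_cluster` is hypothetical and is not the authors' implementation.

```python
import numpy as np


def largest_similarity_cluster(features, n_clusters=4, n_iter=50):
    """Cluster readout segments by similarity and return the largest cluster.

    features: (n_segments, n_features) array; each row is a hypothetical
    per-segment signature. This is a generic k-means stand-in, not SIMBA's
    actual feature definition or clustering method.
    """
    features = np.asarray(features, dtype=float)
    # Deterministic farthest-point initialisation of the centroids.
    centroids = [features[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.linalg.norm(features - c, axis=1) for c in centroids], axis=0)
        centroids.append(features[dist.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign every segment to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids, keeping the old one if a cluster is empty.
        updated = np.array([
            features[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated

    # The most populated cluster is treated as the dominant, motion-consistent state.
    counts = np.bincount(labels, minlength=n_clusters)
    return np.flatnonzero(labels == counts.argmax())


# Synthetic example: 70 segments near one state, 30 near another.
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0.0, 0.1, (70, 3)), rng.normal(5.0, 0.1, (30, 3))])
selected = largest_similarity_cluster(features, n_clusters=2)
print(len(selected))  # 70
```

In a real pipeline, only the selected segments would then be passed to the image reconstruction. How SIMBA actually chooses and reconstructs its subset is described in the paper, not here.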
Submitted 13 July, 2020;
originally announced July 2020.
-
Birch's law at elevated temperatures
Authors:
Umesh C. Roy,
Subir K. Sarkar
Abstract:
Birch's law in high-pressure physics postulates a linear relationship between elastic wave speed and density. One of its best-known applications is in investigating the composition of the Earth's inner core, using the Preliminary Reference Earth Model as the primary source of constraints. However, it has never been subjected to high-precision tests, even at moderately elevated temperatures. Here we carry out such a test using Density Functional Theory for electronic structure calculations and Density Functional Perturbation Theory for calculating phonon dispersion relations. We show that a recently proposed modification of Birch's law is consistently satisfied more accurately than the original version. This modified version states that the product of the elastic wave speed and the one-third power of density should be a linear function of density. We have studied platinum, palladium, molybdenum and rhodium with cubic unit cells and iron with a hexagonal-close-packed unit cell, at temperatures up to 1500 K and pressures up to about 360 GPa. We also examine how generally a recently proposed extension of Birch's law holds, according to which elastic wave speed is a linear function of temperature at a given density. Within the error bars of our calculations, we find that this is consistent with our data for the four cubic materials at temperatures up to 3300 K.
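The two forms compared in this abstract are the original law, v = a + b*rho, and the modified law, v*rho^(1/3) = a + b*rho, where v is the elastic wave speed and rho the density. One simple way to ask which form a dataset follows more closely is to fit a straight line in each case and compare the coefficient of determination R^2. The abstract does not state the paper's accuracy metric, so R^2 here is an assumption. The data below are synthetic and were constructed to obey the modified law exactly. They are not results from the paper.

```python
import numpy as np


def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return float(1.0 - np.sum(resid**2) / np.sum((y - np.mean(y))**2))


def compare_birch_forms(rho, v):
    """R^2 of the original law (v vs rho) and the modified law (v*rho^(1/3) vs rho)."""
    rho, v = np.asarray(rho, float), np.asarray(v, float)
    return r_squared(rho, v), r_squared(rho, v * np.cbrt(rho))


# Synthetic data built to obey the modified law exactly (illustrative values only).
rho = np.linspace(10.0, 16.0, 13)          # density
v = (2.0 * rho - 10.0) / np.cbrt(rho)      # wave speed
original_r2, modified_r2 = compare_birch_forms(rho, v)
print(original_r2 < modified_r2)  # True
```

With real calculated data, neither fit would be exact, and the comparison would need to account for the error bars mentioned above.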
Submitted 1 July, 2020;
originally announced July 2020.
-
A Survey on the Evaluation of Clone Detection Performance and Benchmarking
Authors:
Jeffrey Svajlenko,
Chanchal K. Roy
Abstract:
A great many clone detection tools have been proposed in the literature. In this paper, we investigate the state of clone detection tool evaluation. We begin by surveying the clone detection benchmarks and performing a multi-faceted evaluation and comparison of their features and capabilities. We then survey the existing clone detection tool and technique publications, and evaluate how the authors of these works evaluate their own tools/techniques. We rank the individual works by how well they measure recall, precision, execution time and scalability. We select the works that best evaluate all four metrics as exemplars that future researchers publishing clone detection tools/techniques should consider when designing their empirical evaluations. We measure statistics on how authors evaluate their own tools, and find that the authors' evaluations are poor. We finish our investigation into clone detection evaluation by surveying the existing tool comparison studies, including both qualitative and quantitative studies.
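Two of the four metrics ranked here, precision and recall, are typically computed by comparing a tool's reported clone pairs against a reference set. The sketch below makes one simplifying assumption: a detected pair counts only if it exactly matches a reference pair, ignoring order. Many benchmarks use more lenient, overlap-based matching criteria instead. The fragment names are hypothetical.

```python
def precision_recall(detected, reference):
    """Pair-level precision and recall of a clone detector.

    Each clone pair is two code fragments; pairs are treated as unordered,
    and a detected pair is correct only if it exactly matches a reference
    pair (a simplifying assumption).
    """
    det = {frozenset(pair) for pair in detected}
    ref = {frozenset(pair) for pair in reference}
    true_positives = len(det & ref)
    precision = true_positives / len(det) if det else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical fragments written as "file:start-end".
detected = [("A.java:1-20", "B.java:5-24"), ("C.java:3-9", "D.java:3-9"), ("E.java:1-8", "F.java:2-9")]
reference = [("B.java:5-24", "A.java:1-20"), ("C.java:3-9", "D.java:3-9"),
             ("G.java:1-30", "H.java:1-30"), ("I.java:4-12", "J.java:4-12")]
print(precision_recall(detected, reference))  # (0.6666666666666666, 0.5)
```

Recall requires knowing every true clone in the corpus. This is why the choice of benchmark matters so much when evaluating a tool.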
Submitted 28 June, 2020;
originally announced June 2020.
-
Multi-GPU Performance Optimization of a CFD Code using OpenACC on Different Platforms
Authors:
Weicheng Xue,
Christopher J. Roy
Abstract:
This paper investigates the multi-GPU performance of a 3D buoyancy-driven cavity solver using MPI and OpenACC directives on different platforms. The paper shows that the dimensions along which the total problem is decomposed significantly affect strong-scaling performance on GPUs. Without proper performance optimizations, 1D domain decomposition is shown to scale poorly on multiple GPUs due to noncontiguous memory access. Performance with any decomposition benefits from a series of optimizations presented in the paper. Since the buoyancy-driven cavity code is latency-bound on the clusters examined, optimizations both agnostic and tailored to the platforms are designed to reduce latency costs and improve memory throughput between hosts and devices. First, a parallel message packing/unpacking strategy developed for noncontiguous data movement between hosts and devices improves overall performance by about a factor of 2. Second, transferring different data based on the stencil sizes of different variables further reduces communication overhead. These two optimizations are general enough to benefit stencil computations with ghost exchanges on all of the clusters tested. Third, GPUDirect is used to improve communication on clusters that have the hardware and software support for direct communication between GPUs without staging through CPU memory. Finally, overlapping communication and computation is shown not to be efficient on multiple GPUs when using only MPI or MPI+OpenACC. Although we believe our implementation exposes sufficient overlap, the actual runs do not exploit it well due to a lack of asynchronous progression.
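The packing step described above gathers a strided boundary plane into one contiguous buffer before it is sent, and scatters a received buffer back into the ghost plane. The CPU-side NumPy sketch below only illustrates that idea; it is not the paper's OpenACC implementation. In the paper's setting, the gather and scatter would run as parallel loops on the GPU. The helper names are hypothetical. Which planes are strided depends on the array layout: this example uses row-major (C) order, and column-major (Fortran) arrays reverse the picture.

```python
import numpy as np


def face_index(ndim, axis, index):
    """Basic-indexing tuple selecting one plane of an ndim array (yields a view)."""
    sel = [slice(None)] * ndim
    sel[axis] = index
    return tuple(sel)


def pack_face(u, axis, index):
    """Return one boundary plane as a contiguous 1-D send buffer.

    A strided plane is gathered into a packed copy; an already contiguous
    plane is returned without copying. Also reports which case occurred.
    """
    face = u[face_index(u.ndim, axis, index)]
    was_contiguous = bool(face.flags["C_CONTIGUOUS"])
    return np.ascontiguousarray(face).ravel(), was_contiguous


def unpack_face(u, axis, index, buf):
    """Scatter a received 1-D buffer back into one ghost plane of u (in place)."""
    face = u[face_index(u.ndim, axis, index)]
    face[...] = buf.reshape(face.shape)


u = np.arange(24).reshape(2, 3, 4)      # C (row-major) order: last axis is fastest
buf, contiguous = pack_face(u, axis=2, index=0)
print(contiguous, buf.tolist())         # False [0, 4, 8, 12, 16, 20]
ghost = np.zeros_like(u)
unpack_face(ghost, axis=2, index=0, buf=buf)
```

Packing turns many small strided transfers into one contiguous message. This is consistent with the abstract's point that the code is latency-bound.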
Submitted 3 June, 2020;
originally announced June 2020.
-
Don't Explain without Verifying Veracity: An Evaluation of Explainable AI with Video Activity Recognition
Authors:
Mahsan Nourani,
Chiradeep Roy,
Tahrima Rahman,
Eric D. Ragan,
Nicholas Ruozzi,
Vibhav Gogate
Abstract:
Explainable machine learning and artificial intelligence models have been used to justify a model's decision-making process. This added transparency aims to help improve user performance and understanding of the underlying model. However, in practice, explainable systems face many open questions and challenges. Specifically, designers might reduce the complexity of deep learning models to provide interpretability. The explanations generated by these simplified models, however, might not accurately justify the model's decisions or be truthful to the model. This can further confuse users, who might not find the explanations meaningful with respect to the model's predictions. Understanding how these explanations affect user behavior is an ongoing challenge. In this paper, we explore how explanation veracity affects user performance and agreement in intelligent systems. Through a controlled user study with an explainable activity recognition system, we compare variations in explanation veracity for a video review and querying task. The results suggest that low-veracity explanations significantly decrease user performance and agreement compared to both accurate explanations and a system without explanations. These findings demonstrate the importance of accurate and understandable explanations, and caution that poor explanations can sometimes be worse than no explanations with respect to their effect on user performance and reliance on an AI system.
Submitted 5 May, 2020;
originally announced May 2020.
-
The Vision of Software Clone Management: Past, Present, and Future
Authors:
Chanchal K. Roy,
Minhaz F. Zibran,
Rainer Koschke
Abstract:
Duplicated code, or code clones, are a kind of code smell with both positive and negative impacts on the development and maintenance of software systems. Past software clone research focused mostly on the detection and analysis of code clones, while research in recent years has extended to the whole spectrum of clone management. In the last decade, three surveys have appeared in the literature, covering the detection, analysis, and evolutionary characteristics of code clones. This paper presents a comprehensive survey of the state of the art in clone management, with an in-depth investigation of clone management activities (e.g., tracing, refactoring, cost-benefit analysis) beyond detection and analysis. This is the first survey on clone management, in which we point to the achievements so far and reveal avenues for further research towards an integrated clone management system. We believe that we have done a good job of surveying the area of clone management and that this work may serve as a kind of roadmap for future research in the area.
Submitted 3 May, 2020;
originally announced May 2020.