-
Accelerated boundary integral analysis of energy eigenvalues for confined electron states in quantum semiconductor heterostructures
Authors:
J. D. Phan,
A. -V. Phan
Abstract:
This paper presents a novel and efficient approach for the computation of energy eigenvalues in quantum semiconductor heterostructures. Accurate determination of the electronic states in these heterostructures is crucial for understanding their optical and electronic properties, making it a key challenge in semiconductor physics. The proposed method uses series expansions of zero-order Bessel functions to numerically solve the Schrödinger equation with the boundary integral method for bound electron states in a computationally efficient manner. To validate the proposed technique, we applied it to problems previously explored by other research groups. The results clearly demonstrate the computational efficiency and high precision of our approach. Notably, the proposed technique significantly reduces the computational time compared to the conventional method for searching for the energy eigenvalues in quantum structures.
Submitted 15 June, 2024; v1 submitted 3 February, 2024;
originally announced February 2024.
-
BugsInPy: A Database of Existing Bugs in Python Programs to Enable Controlled Testing and Debugging Studies
Authors:
Ratnadira Widyasari,
Sheng Qin Sim,
Camellia Lok,
Haodi Qi,
Jack Phan,
Qijin Tay,
Constance Tan,
Fiona Wee,
Jodie Ethelda Tan,
Yuheng Yieh,
Brian Goh,
Ferdian Thung,
Hong Jin Kang,
Thong Hoang,
David Lo,
Eng Lieh Ouh
Abstract:
The 2019 edition of the Stack Overflow developer survey highlights that, for the first time, Python outperformed Java in terms of popularity. The gap between Python and Java further widened in the 2020 edition of the survey. Unfortunately, despite the rapid increase in Python's popularity, there are not many testing and debugging tools that are designed for Python. This is in stark contrast with the abundance of testing and debugging tools for Java. Thus, there is a need to push research on tools that can help Python developers. One factor that contributed to the rapid growth of Java testing and debugging tools is the availability of benchmarks. A popular benchmark is the Defects4J benchmark; its initial version contained 357 real bugs from 5 real-world Java programs. Each bug comes with a test suite that can expose the bug. Defects4J has been used by hundreds of testing and debugging studies and has helped to push the frontier of research in these directions. In this project, inspired by Defects4J, we create another benchmark database and tool that contains 493 real bugs from 17 real-world Python programs. We hope our benchmark can help catalyze future work on testing and debugging tools that work on Python programs.
Submitted 27 January, 2024;
originally announced January 2024.
-
NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python
Authors:
Ratnadira Widyasari,
Zhou Yang,
Ferdian Thung,
Sheng Qin Sim,
Fiona Wee,
Camellia Lok,
Jack Phan,
Haodi Qi,
Constance Tan,
Qijin Tay,
David Lo
Abstract:
Machine learning (ML) has gained much attention and been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts to filter those projects to curate ML projects of high quality. The limited availability of such a high-quality dataset poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.
Submitted 10 March, 2023;
originally announced March 2023.
-
The radial distribution of supernovae compared to star formation tracers
Authors:
Fiona M. Audcent-Ross,
Gerhardt R. Meurer,
James R. Audcent,
Stuart D. Ryder,
O. Ivy Wong,
J. Phan,
A. Williamson,
J. H. Kim
Abstract:
Given the limited availability of direct evidence (pre-explosion observations) for supernova (SN) progenitors, the location of supernovae (SNe) within their host galaxies can be used to set limits on one of their most fundamental characteristics, their initial progenitor mass. We present our constraints on SN progenitors derived by comparing the radial distributions of 80 SNe in the SINGG and SUNGG surveys to the R-band, Halpha, and UV light distributions of the 55 host galaxies. The strong correlation of Type Ia SNe with R-band light is consistent with models containing only low mass progenitors, reflecting earlier findings. When we limit the analysis of Type II SNe to apertures containing 90 per cent of the total flux, the radial distribution of these SNe best traces far ultraviolet (FUV) emission, consistent with recent direct detections indicating Type II SNe have moderately massive red supergiant progenitors. Stripped Envelope (SE) SNe have the strongest correlation with Halpha fluxes, indicative of very massive progenitors (M* > 20 M_solar). This result contradicts a small, but growing, number of direct detections of SE SN progenitors indicating they are moderately massive binary systems. Our result is consistent, however, with a recent population analysis suggesting binary SE SN progenitor masses are regularly underestimated. SE SNe are centralised with respect to Type II SNe and there are no SE SNe recorded beyond half the maximum disc radius in the optical and one third the disc radius in the ultraviolet. The absence of SE SNe beyond these distances is consistent with reduced massive star formation efficiencies in the outskirts of the host galaxies.
Submitted 21 November, 2019;
originally announced November 2019.
-
Dual Active Sampling on Batch-Incremental Active Learning
Authors:
Johan Phan,
Massimiliano Ruocco,
Francesco Scibilia
Abstract:
Recently, Convolutional Neural Networks (CNNs) have shown unprecedented success in the field of computer vision, especially on challenging image classification tasks, by relying on a universal approach, i.e., training a deep model on a massive dataset of supervised examples. While unlabeled data are often an abundant resource, collecting a large set of labeled data is very expensive and often requires considerable human effort. One way to ease this burden is to effectively select and label highly informative instances from a pool of unlabeled data (i.e., active learning). This paper proposes a new method of batch-mode active learning, Dual Active Sampling (DAS), which is based on a simple assumption: if two deep neural networks (DNNs) of the same structure and trained on the same dataset give significantly different outputs for a given sample, then that particular sample should be picked for additional training. While other state-of-the-art methods in this field usually require intensive computational power or rely on a complicated structure, DAS is simpler to implement and achieves improved results on CIFAR-10 with preferable computational time compared to the core-set method.
Submitted 22 May, 2019;
originally announced May 2019.
-
A Multi-Modal Graph-Based Semi-Supervised Pipeline for Predicting Cancer Survival
Authors:
Hamid Reza Hassanzadeh,
John H. Phan,
May D. Wang
Abstract:
Cancer survival prediction is an active area of research that can help prevent unnecessary therapies and improve patients' quality of life. Gene expression profiling is widely used in cancer studies to discover informative biomarkers that aid in predicting different clinical endpoints. We use multiple modalities of data derived from RNA deep-sequencing (RNA-seq) to predict survival of cancer patients. Despite the wealth of information available in expression profiles of cancer tumors, fulfilling the aforementioned objective remains a big challenge, for the most part due to the paucity of data samples compared to the high dimension of the expression profiles. As such, analysis of transcriptomic data modalities calls for state-of-the-art big-data analytics techniques that can maximally use all the available data to discover the relevant information hidden within a significant amount of noise. In this paper, we propose a pipeline that predicts cancer patients' survival by exploiting the structure of the input (manifold learning) and by leveraging the unlabeled samples using Laplacian support vector machines, a graph-based semi-supervised learning (GSSL) paradigm. We show that under certain circumstances, no single modality per se will result in the best accuracy, and that by fusing different models together via a stacked generalization strategy, we may boost the accuracy synergistically. We apply our approach to two cancer datasets and present promising results. We maintain that a similar pipeline can be used for predictive tasks where labeled samples are expensive to acquire.
Submitted 17 November, 2016;
originally announced November 2016.
-
A Semi-Supervised Method for Predicting Cancer Survival Using Incomplete Clinical Data
Authors:
Hamid Reza Hassanzadeh,
John H. Phan,
May D. Wang
Abstract:
Prediction of survival for cancer patients is an open area of research. However, many of these studies focus on datasets with a large number of patients. We present a novel method that is specifically designed to address the challenge of data scarcity, which is often the case for cancer datasets. Our method is able to use unlabeled data to improve classification by adopting a semi-supervised training approach to learn an ensemble classifier. The results of applying our method to three cancer datasets show the promise of semi-supervised learning for prediction of cancer survival.
Submitted 29 September, 2015;
originally announced September 2015.
-
Minimal monomial ideals and linear resolutions
Authors:
Jeffry Phan
Abstract:
A minimal monomial ideal is the combinatorially simplest monomial ideal whose lcm-lattice equals a given finite atomic lattice $\hat{L}$. The minimal ideal inherits many nice properties of any ideal $I$ whose lcm-lattice also equals $\hat{L}$, e.g. Cohen-Macaulayness and the dual property of having a linear resolution. Conversely, any ideal having a linear resolution is shown to be (essentially) minimal.
Submitted 3 November, 2005; v1 submitted 2 November, 2005;
originally announced November 2005.