-
Assessing the Usability of GutGPT: A Simulation Study of an AI Clinical Decision Support System for Gastrointestinal Bleeding Risk
Authors:
Colleen Chan,
Kisung You,
Sunny Chung,
Mauro Giuffrè,
Theo Saarinen,
Niroop Rajashekar,
Yuan Pu,
Yeo Eun Shin,
Loren Laine,
Ambrose Wong,
René Kizilcec,
Jasjeet Sekhon,
Dennis Shung
Abstract:
Applications of large language models (LLMs) like ChatGPT have the potential to enhance clinical decision support through conversational interfaces. However, challenges of human-algorithmic interaction and clinician trust are poorly understood. GutGPT, an LLM for gastrointestinal (GI) bleeding risk prediction and management guidance, was deployed in clinical simulation scenarios alongside the electronic health record (EHR) with emergency medicine physicians, internal medicine physicians, and medical students to evaluate its effect on physician acceptance and trust in AI clinical decision support systems (AI-CDSS). GutGPT provides risk predictions from a validated machine learning model and evidence-based answers by querying extracted clinical guidelines. Participants were randomized to GutGPT and an interactive dashboard, or the interactive dashboard and a search engine. Surveys and educational assessments taken before and after measured technology acceptance and content mastery. Preliminary results showed mixed effects on acceptance after using GutGPT compared to the dashboard or search engine, but GutGPT appeared to improve content mastery based on simulation performance. Overall, this study demonstrates that LLMs like GutGPT could enhance effective AI-CDSS if implemented optimally and paired with interactive interfaces.
Submitted 6 December, 2023;
originally announced December 2023.
-
Phase Identification of Smart Meters Using a Fourier Series Compression and a Statistical Clustering Algorithm
Authors:
Jeremy J. Chiu,
Albert Wong,
James Park,
Joe Mahony,
Michael Ferri,
Tim Berson
Abstract:
Accurate labeling of phase connectivity in electrical distribution systems is important for maintenance and operations but is often erroneous or missing. In this paper, we present a process to identify which smart meters must be in the same phase using a hierarchical clustering method on voltage time series data. Instead of working with the time series data directly, we apply the Fourier transform to represent the data in the frequency domain, remove $98\%$ of the Fourier coefficients, and use the remaining coefficients to cluster the meters that are in the same phase. The result of this process is validated by confirming that the cluster (phase) membership of meters does not change over two monthly periods. In addition, we confirm that meters that belong to the same feeder within the distribution network are correctly classified into the same cluster, that is, assigned to the same phase.
Submitted 23 December, 2022;
originally announced December 2022.
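The pipeline described in this abstract (Fourier transform, discard most coefficients, hierarchical clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the rule for choosing which 2% of coefficients to keep (largest average magnitude across meters), the per-meter standardization, the Ward linkage, and the synthetic data produced by the hypothetical helper `make_demo_meters`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def compress_fourier(series, keep_fraction=0.02):
    """Return real-valued features built from a small subset of Fourier coefficients.

    Assumption: the kept coefficients are those with the largest mean magnitude
    across meters, so every meter is described by the same frequencies.
    """
    coeffs = np.fft.rfft(series, axis=1)
    n_keep = max(1, int(round(keep_fraction * coeffs.shape[1])))
    mean_magnitude = np.abs(coeffs).mean(axis=0)
    keep_idx = np.sort(np.argsort(mean_magnitude)[-n_keep:])
    kept = coeffs[:, keep_idx]
    # Stack real and imaginary parts so the clustering sees real features.
    return np.hstack([kept.real, kept.imag])


def cluster_meters(series, n_clusters=3, keep_fraction=0.02):
    """Cluster meters (rows) by their compressed frequency-domain features."""
    centered = series - series.mean(axis=1, keepdims=True)
    standardized = centered / series.std(axis=1, keepdims=True)
    features = compress_fourier(standardized, keep_fraction)
    tree = linkage(features, method="ward")  # assumed linkage choice
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def make_demo_meters(n_phases=3, meters_per_phase=4, n_samples=1000, seed=0):
    """Hypothetical synthetic data: meters on a phase share one smooth signal."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples)
    rows, truth = [], []
    for phase in range(n_phases):
        amps = rng.uniform(0.5, 1.5, size=5)
        shifts = rng.uniform(0.0, 2.0 * np.pi, size=5)
        shared = sum(
            a * np.sin(2.0 * np.pi * (k + 1) * t / n_samples + s)
            for k, (a, s) in enumerate(zip(amps, shifts))
        )
        for _ in range(meters_per_phase):
            rows.append(shared + 0.05 * rng.standard_normal(n_samples))
            truth.append(phase)
    return np.array(rows), np.array(truth)


if __name__ == "__main__":
    data, truth = make_demo_meters()
    print(cluster_meters(data, n_clusters=3))
```

Cluster labels returned by `fcluster` are arbitrary integers, so they should be compared with known phases only up to relabeling.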
-
Evaluating the Impact of State-Level Public Masking Mandates on New COVID-19 Cases and Deaths in the United States: A Demonstration of the Causal Roadmap
Authors:
Angus K. Wong,
Laura B. Balzer
Abstract:
At the national level, we sought to investigate the effect of public masking mandates on COVID-19 in Fall 2020. Specifically, we aimed to evaluate how the relative growth of COVID-19 cases and deaths would have differed if all states had issued a mandate to mask in public by September 1, 2020 versus if all states had delayed issuing such a mandate. To do so, we applied the Causal Roadmap, a formal framework for causal and statistical inference. The outcome was defined as the state-specific relative increase in cumulative cases and in cumulative deaths 21, 30, 45, and 60 days after September 1. Despite the natural experiment in state-level masking policies, the causal effect of interest was not identifiable. Nonetheless, we specified the target statistical parameter as the adjusted rate ratio (aRR): the expected outcome with early implementation divided by the expected outcome with delayed implementation, after adjusting for state-level confounders. To minimize strong estimation assumptions, primary analyses used targeted maximum likelihood estimation (TMLE) with Super Learner. After 60 days and at the national level, early implementation was associated with a 9% reduction in new COVID-19 cases (aRR: 0.91; 95% CI: 0.88-0.95) and a 16% reduction in new COVID-19 deaths (aRR: 0.84; 95% CI: 0.76-0.93). Although lack of identifiability prohibited causal interpretations, application of the Causal Roadmap facilitated estimation and inference of statistical associations, providing timely answers to pressing questions in the COVID-19 response.
Submitted 12 October, 2021;
originally announced October 2021.
-
Residual Error: a New Performance Measure for Adversarial Robustness
Authors:
Hossein Aboutalebi,
Mohammad Javad Shafiee,
Michelle Karg,
Christian Scharfenberger,
Alexander Wong
Abstract:
Despite the significant advances in deep learning over the past decade, a major challenge that limits the widespread adoption of deep learning has been its fragility to adversarial attacks. This sensitivity to making erroneous predictions in the presence of adversarially perturbed data makes deep neural networks difficult to adopt for certain real-world, mission-critical applications. While much of the research focus has revolved around adversarial example creation and adversarial hardening, the area of performance measures for assessing adversarial robustness is not well explored. Motivated by this, this study presents the concept of residual error, a new performance measure that can be used not only to assess the adversarial robustness of a deep neural network at the individual sample level, but also to differentiate between adversarial and non-adversarial examples to facilitate adversarial example detection. Furthermore, we introduce a hybrid model for approximating the residual error in a tractable manner. Experimental results using the case of image classification demonstrate the effectiveness and efficacy of the proposed residual error metric for assessing several well-known deep neural network architectures. These results thus illustrate that the proposed measure could be a useful tool not only for assessing the robustness of deep neural networks used in mission-critical scenarios, but also in the design of adversarially robust models.
Submitted 18 June, 2021;
originally announced June 2021.
-
Where Does Trust Break Down? A Quantitative Trust Analysis of Deep Neural Networks via Trust Matrix and Conditional Trust Densities
Authors:
Andrew Hryniowski,
Xiao Yu Wang,
Alexander Wong
Abstract:
The advances and successes in deep learning in recent years have led to considerable efforts and investments into its widespread adoption for a wide variety of applications, ranging from personal assistants and intelligent navigation to search and product recommendation in e-commerce. With this tremendous rise in deep learning adoption come questions about the trustworthiness of the deep neural networks that power these applications. Motivated to answer such questions, there has been a very recent interest in trust quantification. In this work, we introduce the concept of trust matrix, a novel trust quantification strategy that leverages the recently introduced question-answer trust metric by Wong et al. to provide deeper, more detailed insights into where trust breaks down for a given deep neural network given a set of questions. More specifically, a trust matrix defines the expected question-answer trust for a given actor-oracle answer scenario, allowing one to quickly spot areas of low trust that need to be addressed to improve the trustworthiness of a deep neural network. The proposed trust matrix is simple to calculate, humanly interpretable, and to the best of the authors' knowledge is the first to study trust at the actor-oracle answer level. We further extend the concept of trust densities with the notion of conditional trust densities. We experimentally leverage trust matrices to study several well-known deep neural network architectures for image recognition, and further study the trust density and conditional trust densities for an interesting actor-oracle answer scenario. The results illustrate that trust matrices, along with conditional trust densities, can be useful tools in addition to the existing suite of trust quantification metrics for guiding practitioners and regulators in creating and certifying deep learning solutions for trusted operation.
Submitted 30 September, 2020;
originally announced September 2020.
-
Vulnerability Under Adversarial Machine Learning: Bias or Variance?
Authors:
Hossein Aboutalebi,
Mohammad Javad Shafiee,
Michelle Karg,
Christian Scharfenberger,
Alexander Wong
Abstract:
Prior studies have unveiled the vulnerability of the deep neural networks in the context of adversarial machine learning, leading to great recent attention into this area. One interesting question that has yet to be fully explored is the bias-variance relationship of adversarial machine learning, which can potentially provide deeper insights into this behaviour. The notion of bias and variance is one of the main approaches to analyze and evaluate the generalization and reliability of a machine learning model. Although it has been extensively used in other machine learning models, it is not well explored in the field of deep learning and it is even less explored in the area of adversarial machine learning.
In this study, we investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network and analyze how adversarial perturbations can affect the generalization of a network. We derive the bias-variance trade-off for both classification and regression applications based on two main loss functions: (i) mean squared error (MSE), and (ii) cross-entropy. Furthermore, we perform quantitative analysis with both simulated and real data to empirically evaluate consistency with the derived bias-variance tradeoffs. Our analysis sheds light on why the deep neural networks have poor performance under adversarial perturbation from a bias-variance point of view and how this type of perturbation would change the performance of a network. Moreover, given these new theoretical findings, we introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies (e.g., PGD) while providing a high success rate in fooling deep neural networks in lower perturbation magnitudes.
Submitted 31 July, 2020;
originally announced August 2020.
-
Deep Neural Network Perception Models and Robust Autonomous Driving Systems
Authors:
Mohammad Javad Shafiee,
Ahmadreza Jeddi,
Amir Nazemi,
Paul Fieguth,
Alexander Wong
Abstract:
This paper analyzes the robustness of deep learning models in autonomous driving applications and discusses the practical solutions to address that.
Submitted 4 March, 2020;
originally announced March 2020.
-
Feature engineering workflow for activity recognition from synchronized inertial measurement units
Authors:
Andreas W. Kempa-Liehr,
Jonty Oram,
Andrew Wong,
Mark Finch,
Thor Besier
Abstract:
The ubiquitous availability of wearable sensors is responsible for driving the Internet-of-Things but is also making an impact on sport sciences and precision medicine. While human activity recognition from smartphone data or other types of inertial measurement units (IMU) has evolved into one of the most prominent daily life examples of machine learning, the underlying process of time-series feature engineering still seems to be time-consuming. This lengthy process inhibits the development of IMU-based machine learning applications in sport science and precision medicine. This contribution discusses a feature engineering workflow, which automates the extraction of time-series features based on the FRESH algorithm (FeatuRe Extraction based on Scalable Hypothesis tests) to identify statistically significant features from synchronized IMU sensors (IMeasureU Ltd, NZ). The feature engineering workflow has five main steps: time-series engineering, automated time-series feature extraction, optimized feature extraction, fitting of a specialized classifier, and deployment of the optimized machine learning pipeline. The workflow is discussed for the case of a user-specific running-walking classification, and the generalization to a multi-user multi-activity classification is demonstrated.
Submitted 18 December, 2019;
originally announced December 2019.
-
Unsupervised Depth Completion from Visual Inertial Odometry
Authors:
Alex Wong,
Xiaohan Fei,
Stephanie Tsuei,
Stefano Soatto
Abstract:
We describe a method to infer dense depth from camera motion and sparse depth as estimated using a visual-inertial odometry system. Unlike other scenarios using point clouds from lidar or structured light sensors, we have a few hundred to a few thousand points, insufficient to inform the topology of the scene. Our method first constructs a piecewise planar scaffolding of the scene, and then uses it to infer dense depth using the image along with the sparse points. We use a predictive cross-modal criterion, akin to 'self-supervision,' measuring photometric consistency across time, forward-backward pose consistency, and geometric compatibility with the sparse point cloud. We also launch the first visual-inertial + depth dataset, which we hope will foster additional exploration into combining the complementary strengths of visual and inertial sensors. To compare our method to prior work, we adopt the unsupervised KITTI depth completion benchmark, and show state-of-the-art performance on it. Code available at: https://github.com/alexklwong/unsupervised-depth-completion-visual-inertial-odometry.
Submitted 21 July, 2021; v1 submitted 14 May, 2019;
originally announced May 2019.
-
Beyond Explainability: Leveraging Interpretability for Improved Adversarial Learning
Authors:
Devinder Kumar,
Ibrahim Ben-Daya,
Kanav Vats,
Jeffery Feng,
Graham Taylor,
Alexander Wong
Abstract:
In this study, we propose the leveraging of interpretability for tasks beyond purely the purpose of explainability. In particular, this study puts forward a novel strategy for leveraging gradient-based interpretability in the realm of adversarial examples, where we use insights gained to aid adversarial learning. More specifically, we introduce the concept of spatially constrained one-pixel adversarial perturbations, where we guide the learning of such adversarial perturbations towards more susceptible areas identified via gradient-based interpretability. Experimental results using different benchmark datasets show that such a spatially constrained one-pixel adversarial perturbation strategy can noticeably improve the speed of convergence as well as produce successful attacks that were also visually difficult to perceive, thus illustrating an effective use of interpretability methods for tasks outside of the purpose of purely explainability.
Submitted 21 April, 2019;
originally announced April 2019.
-
Precision Annealing Monte Carlo Methods for Statistical Data Assimilation: Metropolis-Hastings Procedures
Authors:
Adrian S. Wong,
Kangbo Hao,
Zheng Fang,
Henry D. I. Abarbanel
Abstract:
Statistical Data Assimilation (SDA) is the transfer of information from field or laboratory observations to a user selected model of the dynamical system producing those observations. The data is noisy and the model has errors; the information transfer addresses properties of the conditional probability distribution of the states of the model conditioned on the observations. The quantities of interest in SDA are the conditional expected values of functions of the model state, and these require the approximate evaluation of high dimensional integrals. We introduce a conditional probability distribution and use the Laplace method with annealing to identify the maxima of the conditional probability distribution. The annealing method slowly increases the precision term of the model as it enters the Laplace method. In this paper, we extend the idea of precision annealing (PA) to Monte Carlo calculations of conditional expected values using Metropolis-Hastings methods.
Submitted 14 January, 2019;
originally announced January 2019.
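To make the idea in this abstract concrete, the following is a minimal sketch of random-walk Metropolis-Hastings sampling in which the model precision is raised in stages and each stage starts from the last sample of the previous one. It is not the authors' procedure: the quadratic action, the scalar linear model `x[n+1] = a * x[n]`, the geometric schedule `r_model0 * alpha**beta`, and all parameter names and values are assumptions chosen for illustration.

```python
import numpy as np


def action(path, obs, r_meas, r_model, a=0.9):
    """Negative log of the (unnormalized) conditional density of a path.

    Assumed form: a measurement term plus a model-error term for the
    hypothetical linear dynamics x[n+1] = a * x[n].
    """
    meas = 0.5 * r_meas * np.sum((path - obs) ** 2)
    model = 0.5 * r_model * np.sum((path[1:] - a * path[:-1]) ** 2)
    return meas + model


def metropolis_hastings(path, obs, r_meas, r_model, n_steps, step, rng):
    """Random-walk Metropolis-Hastings targeting exp(-action)."""
    current = action(path, obs, r_meas, r_model)
    samples = []
    for _ in range(n_steps):
        proposal = path + step * rng.standard_normal(path.shape)
        candidate = action(proposal, obs, r_meas, r_model)
        # Symmetric proposal: accept with probability min(1, exp(current - candidate)).
        if np.log(rng.random()) < current - candidate:
            path, current = proposal, candidate
        samples.append(path.copy())
    return np.array(samples)


def precision_annealing(obs, r_meas=1.0, r_model0=0.01, alpha=2.0,
                        n_levels=8, n_steps=2000, step=0.1, seed=0):
    """Sample at increasing model precision, warm-starting each level."""
    rng = np.random.default_rng(seed)
    path = obs.copy()
    estimates = []
    for beta in range(n_levels):
        r_model = r_model0 * alpha ** beta  # assumed geometric schedule
        samples = metropolis_hastings(path, obs, r_meas, r_model,
                                      n_steps, step, rng)
        path = samples[-1]
        # Estimate the conditional expected path from the second half of the chain.
        estimates.append(samples[n_steps // 2:].mean(axis=0))
    return np.array(estimates)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    truth = 0.9 ** np.arange(20)
    noisy_obs = truth + 0.1 * rng.standard_normal(20)
    print(precision_annealing(noisy_obs)[-1])
```

The sample average in the last line of the loop stands in for the conditional expected values mentioned in the abstract; any other function of the path could be averaged in the same way.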
-
SRP: Efficient class-aware embedding learning for large-scale data via supervised random projections
Authors:
Amir-Hossein Karimi,
Alexander Wong,
Ali Ghodsi
Abstract:
Supervised dimensionality reduction strategies have been of great interest. However, current supervised dimensionality reduction approaches are difficult to scale for situations characterized by large datasets given the high computational complexities associated with such methods. While stochastic approximation strategies have been explored for unsupervised dimensionality reduction to tackle this challenge, such approaches are not well-suited for accelerating computational speed for supervised dimensionality reduction. Motivated to tackle this challenge, in this study we explore a novel direction of directly learning optimal class-aware embeddings in a supervised manner via the notion of supervised random projections (SRP). The key idea behind SRP is that, rather than performing spectral decomposition (or approximations thereof), which is computationally prohibitive for large-scale data, we instead perform a direct decomposition by leveraging kernel approximation theory and the symmetry of the Hilbert-Schmidt Independence Criterion (HSIC) measure of dependence between the embedded data and the labels. Experimental results on five different synthetic and real-world datasets demonstrate that the proposed SRP strategy for class-aware embedding learning can be very promising in producing embeddings that are highly competitive with existing supervised dimensionality reduction methods (e.g., SPCA and KSPCA) while achieving 1-2 orders of magnitude better computational performance. As such, this efficient approach to learning embeddings for dimensionality reduction can be a powerful tool for large-scale data analysis and visualization.
Submitted 7 November, 2018;
originally announced November 2018.
-
EdgeSpeechNets: Highly Efficient Deep Neural Networks for Speech Recognition on the Edge
Authors:
Zhong Qiu Lin,
Audrey G. Chung,
Alexander Wong
Abstract:
Despite showing state-of-the-art performance, deep learning for speech recognition remains challenging to deploy in on-device edge scenarios such as mobile and other consumer devices. Recently, there have been greater efforts in the design of small, low-footprint deep neural networks (DNNs) that are more appropriate for edge devices, with much of the focus on design principles for hand-crafting efficient network architectures. In this study, we explore a human-machine collaborative design strategy for building low-footprint DNN architectures for speech recognition through a marriage of human-driven principled network design prototyping and machine-driven design exploration. The efficacy of this design strategy is demonstrated through the design of a family of highly-efficient DNNs (nicknamed EdgeSpeechNets) for limited-vocabulary speech recognition. Experimental results using the Google Speech Commands dataset for limited-vocabulary speech recognition showed that EdgeSpeechNets have higher accuracies than state-of-the-art DNNs (with the best EdgeSpeechNet achieving ~97% accuracy), while achieving significantly smaller network sizes (as much as 7.8x smaller) and lower computational cost (as much as 36x fewer multiply-add operations, 10x lower prediction latency, and 16x smaller memory footprint on a Motorola Moto E phone), making them very well-suited for on-device edge voice interface applications.
Submitted 13 November, 2018; v1 submitted 17 October, 2018;
originally announced October 2018.
-
Aligning Manifolds of Double Pendulum Dynamics Under the Influence of Noise
Authors:
Fayeem Aziz,
Aaron S. W. Wong,
James S. Welsh,
Stephan K. Chalup
Abstract:
This study presents the results of a series of simulation experiments that evaluate and compare four different manifold alignment methods under the influence of noise. The data was created by simulating the dynamics of two slightly different double pendulums in three-dimensional space. The method of semi-supervised feature-level manifold alignment using global distance resulted in the most convincing visualisations. However, the semi-supervised feature-level local alignment methods resulted in smaller alignment errors. These local alignment methods were also more robust to noise and faster than the other methods.
Submitted 20 September, 2018; v1 submitted 18 September, 2018;
originally announced September 2018.
-
NetScore: Towards Universal Metrics for Large-scale Performance Analysis of Deep Neural Networks for Practical On-Device Edge Usage
Authors:
Alexander Wong
Abstract:
Much of the focus in the design of deep neural networks has been on improving accuracy, leading to more powerful yet highly complex network architectures that are difficult to deploy in practical scenarios, particularly on edge devices such as mobile and other consumer devices given their high computational and memory requirements. As a result, there has been a recent interest in the design of quantitative metrics for evaluating deep neural networks that account for more than just model accuracy as the sole indicator of network performance. In this study, we continue the conversation towards universal metrics for evaluating the performance of deep neural networks for practical on-device edge usage. In particular, we propose a new balanced metric called NetScore, which is designed specifically to provide a quantitative assessment of the balance between accuracy, computational complexity, and network architecture complexity of a deep neural network, which is important for on-device edge operation. In what is one of the largest comparative analyses between deep neural networks in the literature, the NetScore metric, the top-1 accuracy metric, and the popular information density metric were compared across a diverse set of 60 different deep convolutional neural networks for image classification on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012) dataset. The evaluation results across these three metrics for this diverse set of networks are presented in this study to act as a reference guide for practitioners in the field. The proposed NetScore metric, along with the other tested metrics, is by no means perfect, but the hope is to push the conversation towards better universal metrics for evaluating deep neural networks for use in practical on-device edge scenarios to help guide practitioners in model design for such scenarios.
Submitted 25 August, 2018; v1 submitted 14 June, 2018;
originally announced June 2018.
-
Deep Nearest Class Mean Model for Incremental Odor Classification
Authors:
Yu Cheng,
Angus Wong,
Kevin Hung,
Zhizhong Li,
Weitong Li,
Jun Zhang
Abstract:
In recent years, more machine learning algorithms have been applied to odor classification. These odor classification algorithms usually assume that the training datasets are static. However, for some odor recognition tasks, new odor classes continually emerge. That is, the odor datasets are dynamically growing while both the training samples and the number of classes are increasing over time. Motivated by this concern, this paper proposes a Deep Nearest Class Mean (DNCM) model based on the deep learning framework and the nearest class mean method. The proposed model not only leverages a deep neural network to extract deep features, but is also able to dynamically integrate new classes over time. In our experiments, the DNCM model was initially trained with 10 classes, then 25 new classes were integrated. Experimental results demonstrate that the proposed model is very efficient for incremental odor classification, especially for new classes with only a small number of training examples.
Submitted 27 April, 2019; v1 submitted 8 January, 2018;
originally announced January 2018.
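The nearest class mean rule that this abstract builds on can be sketched in a few lines. This is a simplified illustration, not the DNCM model: it assumes the features are already extracted (the deep network is omitted), uses Euclidean distance, and stores running sums so that new classes can be added without revisiting old data. The class name `NearestClassMean` and its methods are hypothetical.

```python
import numpy as np


class NearestClassMean:
    """Incremental nearest-class-mean classifier over precomputed features."""

    def __init__(self):
        self.sums = {}    # per-class running sum of feature vectors
        self.counts = {}  # per-class number of samples seen

    def partial_fit(self, features, labels):
        """Update class means; previously unseen labels become new classes."""
        for x, y in zip(np.asarray(features, dtype=float), labels):
            if y not in self.sums:
                self.sums[y] = np.zeros_like(x)
                self.counts[y] = 0
            self.sums[y] += x
            self.counts[y] += 1
        return self

    def predict(self, features):
        """Assign each sample to the class with the closest mean."""
        classes = list(self.sums)
        means = np.stack([self.sums[c] / self.counts[c] for c in classes])
        feats = np.asarray(features, dtype=float)
        dists = np.linalg.norm(feats[:, None, :] - means[None, :, :], axis=2)
        return [classes[i] for i in dists.argmin(axis=1)]


if __name__ == "__main__":
    model = NearestClassMean()
    model.partial_fit([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]], ["a", "a", "b"])
    # A new class arrives later with only two training examples.
    model.partial_fit([[10.0, 0.0], [9.0, 0.0]], ["c", "c"])
    print(model.predict([[0.1, 0.1], [9.4, 0.3]]))
```

Because only a sum and a count are kept per class, adding a class costs one pass over its own samples, which is the property that makes this rule attractive for incremental settings.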
-
JADE: Joint Autoencoders for Dis-Entanglement
Authors:
Ershad Banijamali,
Amir-Hossein Karimi,
Alexander Wong,
Ali Ghodsi
Abstract:
The problem of feature disentanglement has been explored in the literature, for the purpose of image and video processing and text analysis. State-of-the-art methods for disentangling feature representations rely on the presence of many labeled samples. In this work, we present a novel method for disentangling factors of variation in data-scarce regimes. Specifically, we explore the application of feature disentangling for the problem of supervised classification in a setting where few labeled samples exist, and there are no unlabeled samples for use in unsupervised training. Instead, a similar dataset exists which shares at least one direction of variation with the sample-constrained dataset. We train our model end-to-end using the framework of variational autoencoders and are able to experimentally demonstrate that using an auxiliary dataset with similar variation factors contributes positively to classification performance, yielding competitive results with the state-of-the-art in unsupervised learning.
Submitted 24 November, 2017;
originally announced November 2017.
-
Synthesizing Deep Neural Network Architectures using Biological Synaptic Strength Distributions
Authors:
A. H. Karimi,
M. J. Shafiee,
A. Ghodsi,
A. Wong
Abstract:
In this work, we perform an exploratory study on synthesizing deep neural networks using biological synaptic strength distributions, and the potential influence of different distributions on modelling performance, particularly for the scenario associated with small data sets. Surprisingly, a CNN with convolutional layer synaptic strengths drawn from biologically-inspired distributions such as log-normal or correlated center-surround distributions performed relatively well, suggesting a possibility for designing deep neural network architectures that do not require many data samples to learn, and can sidestep current training procedures while maintaining or boosting modelling performance.
Submitted 30 June, 2017;
originally announced July 2017.
-
Evolution in Groups: A deeper look at synaptic cluster driven evolution of deep neural networks
Authors:
Mohammad Javad Shafiee,
Elnaz Barshan,
Alexander Wong
Abstract:
A promising paradigm for achieving highly efficient deep neural networks is the idea of evolutionary deep intelligence, which mimics biological evolution processes to progressively synthesize more efficient networks. A crucial design factor in evolutionary deep intelligence is the genetic encoding scheme used to simulate heredity and determine the architectures of offspring networks. In this study, we take a deeper look at the notion of synaptic cluster-driven evolution of deep neural networks, which guides the evolution process towards the formation of a highly sparse set of synaptic clusters in offspring networks. Utilizing a synaptic cluster-driven genetic encoding, the probabilistic encoding of synaptic traits considers not only individual synaptic properties but also inter-synaptic relationships within a deep neural network. This process results in highly sparse offspring networks which are particularly tailored for parallel computational devices such as GPUs and deep neural network accelerator chips. Comprehensive experimental results using four well-known deep neural network architectures (LeNet-5, AlexNet, ResNet-56, and DetectNet) on two different tasks (object categorization and object detection) demonstrate the efficiency of the proposed method. The cluster-driven genetic encoding scheme synthesizes networks that can achieve state-of-the-art performance with a significantly smaller number of synapses than the original ancestor network ($\sim$125-fold decrease in synapses for MNIST). Furthermore, the improved cluster efficiency in the generated offspring networks ($\sim$9.71-fold decrease in clusters for MNIST and a $\sim$8.16-fold decrease in clusters for KITTI) is particularly useful for accelerated performance on parallel computing hardware architectures such as those in GPUs and deep neural network accelerator chips.
Submitted 6 April, 2017;
originally announced April 2017.
-
Evolutionary Synthesis of Deep Neural Networks via Synaptic Cluster-driven Genetic Encoding
Authors:
Mohammad Javad Shafiee,
Alexander Wong
Abstract:
There has been significant recent interest towards achieving highly efficient deep neural network architectures. A promising paradigm for achieving this is the concept of evolutionary deep intelligence, which attempts to mimic biological evolution processes to synthesize highly-efficient deep neural networks over successive generations. An important aspect of evolutionary deep intelligence is the genetic encoding scheme used to mimic heredity, which can have a significant impact on the quality of offspring deep neural networks. Motivated by the neurobiological phenomenon of synaptic clustering, we introduce a new genetic encoding scheme where synaptic probability is driven towards the formation of a highly sparse set of synaptic clusters. Experimental results for the task of image classification demonstrated that the synthesized offspring networks using this synaptic cluster-driven genetic encoding scheme can achieve state-of-the-art performance while having network architectures that are not only significantly more efficient (with a ~125-fold decrease in synapses for MNIST) compared to the original ancestor network, but also tailored for GPU-accelerated machine learning applications.
Submitted 22 November, 2016; v1 submitted 5 September, 2016;
originally announced September 2016.
-
Deep Learning with Darwin: Evolutionary Synthesis of Deep Neural Networks
Authors:
Mohammad Javad Shafiee,
Akshaya Mishra,
Alexander Wong
Abstract:
Taking inspiration from biological evolution, we explore the idea of "Can deep neural networks evolve naturally over successive generations into highly efficient deep neural networks?" by introducing the notion of synthesizing new highly efficient, yet powerful deep neural networks over successive generations via an evolutionary process from ancestor deep neural networks. The architectural traits of ancestor deep neural networks are encoded using synaptic probability models, which can be viewed as the `DNA' of these networks. New descendant networks with differing network architectures are synthesized based on these synaptic probability models from the ancestor networks and computational environmental factor models, in a random manner to mimic heredity, natural selection, and random mutation. These offspring networks are then trained into fully functional networks, like one would train a newborn, and have more efficient, more diverse network architectures than their ancestor networks, while achieving powerful modeling capabilities. Experimental results for the task of visual saliency demonstrated that the synthesized `evolved' offspring networks can achieve state-of-the-art performance while having network architectures that are significantly more efficient (with a staggering $\sim$48-fold decrease in synapses by the fourth generation) compared to the original ancestor network.
Submitted 6 February, 2017; v1 submitted 14 June, 2016;
originally announced June 2016.
-
Random Feature Maps via a Layered Random Projection (LaRP) Framework for Object Classification
Authors:
A. G. Chung,
M. J. Shafiee,
A. Wong
Abstract:
The approximation of nonlinear kernels via linear feature maps has recently gained interest due to their applications in reducing the training and testing time of kernel-based learning algorithms. Current random projection methods avoid the curse of dimensionality by embedding the nonlinear feature space into a low dimensional Euclidean space to create nonlinear kernels. We introduce a Layered Random Projection (LaRP) framework, where we model the linear kernels and nonlinearity separately for increased training efficiency. The proposed LaRP framework was assessed using the MNIST hand-written digits database and the COIL-100 object database, and showed notable improvement in object classification performance relative to other state-of-the-art random projection methods.
Submitted 4 February, 2016;
originally announced February 2016.
-
Sparse Reconstruction of Compressive Sensing MRI using Cross-Domain Stochastically Fully Connected Conditional Random Fields
Authors:
Edward Li,
Farzad Khalvati,
Mohammad Javad Shafiee,
Masoom A. Haider,
Alexander Wong
Abstract:
Magnetic Resonance Imaging (MRI) is a crucial medical imaging technology for the screening and diagnosis of frequently occurring cancers. However, long MRI acquisition times can degrade image quality through patient motion and can cause great patient discomfort. Reducing MRI acquisition time can reduce patient discomfort and, as a result, reduce motion artifacts from the acquisition process. Compressive sensing strategies, when applied to MRI, have been demonstrated to be effective at decreasing acquisition times significantly by sparsely sampling the \emph{k}-space during the acquisition process. However, such a strategy requires advanced reconstruction algorithms to produce high quality and reliable images from compressive sensing MRI. This paper proposes a new reconstruction approach based on cross-domain stochastically fully connected conditional random fields (CD-SFCRF) for compressive sensing MRI. The CD-SFCRF introduces constraints in both \emph{k}-space and spatial domains within a stochastically fully connected graphical model to produce improved MRI reconstruction. Experimental results using T2-weighted (T2w) imaging and diffusion-weighted imaging (DWI) of the prostate show strong performance in preserving fine details and tissue structures in the reconstructed images when compared to other tested methods, even at low sampling rates.
Submitted 24 December, 2015;
originally announced December 2015.
-
Domain Adaptation and Transfer Learning in StochasticNets
Authors:
Mohammad Javad Shafiee,
Parthipan Siva,
Paul Fieguth,
Alexander Wong
Abstract:
Transfer learning is a recent field of machine learning research that aims to resolve the challenge of dealing with insufficient training data in the domain of interest. This is a particular issue with traditional deep neural networks, where a large amount of training data is needed. Recently, StochasticNets was proposed to take advantage of sparse connectivity in order to decrease the number of parameters that need to be learned, which in turn may relax training data size requirements. In this paper, we study the efficacy of transfer learning on StochasticNet frameworks. Experimental results show a ~7% improvement in StochasticNet performance when transfer learning is applied in the training step.
Submitted 17 December, 2015;
originally announced December 2015.
-
Noise-Compensated, Bias-Corrected Diffusion Weighted Endorectal Magnetic Resonance Imaging via a Stochastically Fully-Connected Joint Conditional Random Field Model
Authors:
Ameneh Boroomand,
Mohammad Javad Shafiee,
Farzad Khalvati,
Masoom A. Haider,
Alexander Wong
Abstract:
Diffusion weighted magnetic resonance imaging (DW-MR) is a powerful tool in imaging-based prostate cancer screening and detection. Endorectal coils are commonly used in DW-MR imaging to improve the signal-to-noise ratio (SNR) of the acquisition, at the expense of significant intensity inhomogeneities (bias field) that worsen as we move away from the endorectal coil. The presence of bias field can have a significant negative impact on the accuracy of different image analysis tasks, as well as prostate tumor localization, thus leading to increased inter- and intra-observer variability. Retrospective bias correction approaches are introduced as a more efficient way of bias correction compared to the prospective methods, in that they correct for both the scanner-related and anatomy-related bias fields in MR imaging. Previously proposed retrospective bias field correction methods suffer from undesired noise amplification that can reduce the quality of bias-corrected DW-MR images. Here, we propose a unified data reconstruction approach that enables joint compensation of bias field as well as data noise in DW-MR imaging. The proposed noise-compensated, bias-corrected (NCBC) data reconstruction method takes advantage of a novel stochastically fully connected joint conditional random field (SFC-JCRF) model to mitigate the effects of data noise and bias field in the reconstructed MR data. The proposed NCBC reconstruction method was tested on synthetic DW-MR data, a physical DW phantom, and real DW-MR data, all acquired using an endorectal MR coil. Both qualitative and quantitative analysis illustrated that the proposed NCBC method can achieve improved image quality when compared to other tested bias correction methods. As such, the proposed NCBC method may have potential as a useful retrospective approach for improving the consistency of image interpretations.
Submitted 5 July, 2016; v1 submitted 14 December, 2015;
originally announced December 2015.
-
Efficient Deep Feature Learning and Extraction via StochasticNets
Authors:
Mohammad Javad Shafiee,
Parthipan Siva,
Paul Fieguth,
Alexander Wong
Abstract:
Deep neural networks are a powerful tool for feature learning and extraction given their ability to model high-level abstractions in highly complex data. One area worth exploring in feature learning and extraction using deep neural networks is efficient neural connectivity formation for faster feature learning and extraction. Motivated by findings of stochastic synaptic connectivity formation in the brain as well as the brain's uncanny ability to efficiently represent information, we propose the efficient learning and extraction of features via StochasticNets, where sparsely-connected deep neural networks can be formed via stochastic connectivity between neurons. To evaluate the feasibility of such a deep neural network architecture for feature learning and extraction, we train deep convolutional StochasticNets to learn abstract features using the CIFAR-10 dataset, and extract the learned features from images to perform classification on the SVHN and STL-10 datasets. Experimental results show that features learned using deep convolutional StochasticNets, with fewer neural connections than conventional deep convolutional neural networks, can allow for better or comparable classification accuracy than conventional deep neural networks: relative test error decrease of ~4.5% for classification on the STL-10 dataset and ~1% for classification on the SVHN dataset. Furthermore, it was shown that the deep features extracted using deep convolutional StochasticNets can provide comparable classification accuracy even when only 10% of the training data is used for feature learning. Finally, it was also shown that significant gains in feature extraction speed can be achieved in embedded applications using StochasticNets. As such, StochasticNets allow for faster feature learning and extraction while facilitating better or comparable accuracy.
Submitted 11 December, 2015;
originally announced December 2015.
-
Monte Carlo-based Noise Compensation in Coil Intensity Corrected Endorectal MRI
Authors:
Dorothy Lui,
Amen Modhafar,
Masoom Haider,
Alexander Wong
Abstract:
Background: Prostate cancer is one of the most common forms of cancer found in males, making early diagnosis important. Magnetic resonance imaging (MRI) has been useful in visualizing and localizing tumor candidates, and with the use of endorectal coils (ERC), the signal-to-noise ratio (SNR) can be improved. The coils introduce intensity inhomogeneities, and the surface coil intensity correction built into MRI scanners is used to reduce these inhomogeneities. However, the correction typically performed at the MRI scanner level leads to noise amplification and noise level variations. Methods: In this study, we introduce a new Monte Carlo-based noise compensation approach for coil intensity corrected endorectal MRI which allows for effective noise compensation and preservation of details within the prostate. The approach accounts for the ERC SNR profile via a spatially-adaptive noise model for correcting non-stationary noise variations. Such a method is particularly useful for improving the image quality of coil intensity corrected endorectal MRI data when the correction is performed at the MRI scanner level and the original raw data is not available. Results: SNR and contrast-to-noise ratio (CNR) analysis in patient experiments demonstrates an average improvement of 11.7 dB and 11.2 dB respectively over uncorrected endorectal MRI, and the approach provides strong performance when compared to existing approaches. Conclusions: A new noise compensation method was developed for the purpose of improving the quality of coil intensity corrected endorectal MRI data performed at the MRI scanner level. We illustrate that promising noise compensation performance can be achieved with the proposed approach, which is particularly important for processing coil intensity corrected endorectal MRI data when the correction is performed at the MRI scanner level and the original raw data is not available.
Submitted 24 July, 2015;
originally announced July 2015.
-
Bayesian-based deconvolution fluorescence microscopy using dynamically updated nonparametric nonstationary expectation estimates
Authors:
Alexander Wong,
Xiao Yu Wang,
Maud Gorbet
Abstract:
Fluorescence microscopy is widely used for the study of biological specimens. Deconvolution can significantly improve the resolution and contrast of images produced using fluorescence microscopy; in particular, Bayesian-based methods have become very popular in deconvolution fluorescence microscopy. An ongoing challenge with Bayesian-based methods is in dealing with the presence of noise in low SNR imaging conditions. In this study, we present a Bayesian-based method for performing deconvolution using dynamically updated nonparametric nonstationary expectation estimates that can improve the fluorescence microscopy image quality in the presence of noise, without explicit use of spatial regularization.
Submitted 3 February, 2015;
originally announced February 2015.
-
A Deep-structured Conditional Random Field Model for Object Silhouette Tracking
Authors:
Mohammad Shafiee,
Zohreh Azimifar,
Alexander Wong
Abstract:
In this work, we introduce a deep-structured conditional random field (DS-CRF) model for the purpose of state-based object silhouette tracking. The proposed DS-CRF model consists of a series of state layers, where each state layer spatially characterizes the object silhouette at a particular point in time. The interactions between adjacent state layers are established by inter-layer connectivity dynamically determined based on inter-frame optical flow. By incorporating both spatial and temporal context in a dynamic fashion within such a deep-structured probabilistic graphical model, the proposed DS-CRF model allows us to develop a framework that can accurately and efficiently track object silhouettes that can change greatly over time, as well as under different situations such as occlusion and multiple targets within the scene. Experimental results using video surveillance datasets containing different scenarios such as occlusion and multiple targets showed that the proposed DS-CRF approach provides strong object silhouette tracking performance when compared to baseline methods such as mean-shift tracking, as well as state-of-the-art methods such as context tracking and boosted particle filtering.
Submitted 4 August, 2015; v1 submitted 4 January, 2015;
originally announced January 2015.
-
A deep-structured fully-connected random field model for structured inference
Authors:
Alexander Wong,
Mohammad Javad Shafiee,
Parthipan Siva,
Xiao Yu Wang
Abstract:
There has been significant interest in the use of fully-connected graphical models and deep-structured graphical models for the purpose of structured inference. However, fully-connected and deep-structured graphical models have been largely explored independently, leaving the unification of these two concepts ripe for exploration. A fundamental challenge with unifying these two types of models is in dealing with computational complexity. In this study, we investigate the feasibility of unifying fully-connected and deep-structured models in a computationally tractable manner for the purpose of structured inference. To accomplish this, we introduce a deep-structured fully-connected random field (DFRF) model that integrates a series of intermediate sparse auto-encoding layers placed between state layers to significantly reduce computational complexity. The problem of image segmentation was used to illustrate the feasibility of using the DFRF for structured inference in a computationally tractable manner. Results in this study show that it is feasible to unify fully-connected and deep-structured models in a computationally tractable manner for solving structured inference problems such as image segmentation.
Submitted 27 May, 2015; v1 submitted 19 December, 2014;
originally announced December 2014.
-
A Bayesian Residual Transform for Signal Processing
Authors:
Alexander Wong,
Xiao Yu Wang
Abstract:
Multi-scale decomposition has been an invaluable tool for the processing of physiological signals. Much of the work in multi-scale decomposition for processing such signals has been based on scale-space theory and wavelet transforms. In this study, we take a different perspective on multi-scale decomposition by investigating the feasibility of utilizing a Bayesian-based method for multi-scale signal decomposition called Bayesian Residual Transform (BRT) for the purpose of physiological signal processing. In BRT, a signal is modeled as the summation of residual signals, each characterizing information from the signal at different scales. A deep cascading framework is introduced as a realization of the BRT. Signal-to-noise ratio (SNR) analysis using electrocardiography (ECG) signals was used to illustrate the feasibility of using the BRT for suppressing noise in physiological signals. Results in this study show that it is feasible to utilize the BRT for processing physiological signals for tasks such as noise suppression.
Submitted 2 June, 2015; v1 submitted 2 October, 2014;
originally announced October 2014.
-
Higher Accuracy for Bayesian and Frequentist Inference: Large Sample Theory for Small Sample Likelihood
Authors:
M. Bédard,
D. A. S. Fraser,
A. Wong
Abstract:
Recent likelihood theory produces $p$-values that have remarkable accuracy and wide applicability. The calculations use familiar tools such as maximum likelihood values (MLEs), observed information and parameter rescaling. The usual evaluation of such $p$-values is by simulations, and such simulations do verify that the global distribution of the $p$-values is uniform(0, 1), to high accuracy in repeated sampling. The derivation of the $p$-values, however, asserts a stronger statement, that they have a uniform(0, 1) distribution conditionally, given identified precision information provided by the data. We take a simple regression example that involves exact precision information and use large sample techniques to extract highly accurate information as to the statistical position of the data point with respect to the parameter: specifically, we examine various $p$-values and Bayesian posterior survivor $s$-values for validity. With observed data we numerically evaluate the various $p$-values and $s$-values, and we also record the related general formulas. We then assess the numerical values for accuracy using Markov chain Monte Carlo (McMC) methods. We also propose some third-order likelihood-based procedures for obtaining means and variances of Bayesian posterior distributions, again followed by McMC assessment. Finally we propose some adaptive McMC methods to improve the simulation acceptance rates. All these methods are based on asymptotic analysis that derives from the effect of additional data. And the methods use simple calculations based on familiar maximizing values and related informations. The example illustrates the general formulas and the ease of calculations, while the McMC assessments demonstrate the numerical validity of the $p$-values as percentage position of a data point. 
The example, however, is very simple and transparent, and thus gives little indication that in a wide generality of models the formulas do accurately separate information for almost any parameter of interest, and then do give accurate $p$-value determinations from that information. As illustration an enigmatic problem in the literature is discussed and simulations are recorded; various examples in the literature are cited.
Submitted 24 January, 2008;
originally announced January 2008.