-
DL-KDD: Dual-Light Knowledge Distillation for Action Recognition in the Dark
Authors:
Chi-Jui Chang,
Oscar Tai-Yuan Chen,
Vincent S. Tseng
Abstract:
Human action recognition in dark videos is a challenging task for computer vision. Recent research focuses on applying dark enhancement methods to improve the visibility of the video. However, such video processing results in the loss of critical information in the original (un-enhanced) video. Conversely, traditional two-stream methods are capable of learning information from both original and processed videos, but they can lead to a significant increase in computational cost during the inference phase of video classification. To address these challenges, we propose a novel teacher-student video classification framework, named Dual-Light KnowleDge Distillation for Action Recognition in the Dark (DL-KDD). This framework enables the model to learn from both original and enhanced video without introducing additional computational cost during inference. Specifically, DL-KDD utilizes the strategy of knowledge distillation during training. The teacher model is trained with enhanced video, and the student model is trained with both the original video and the soft targets generated by the teacher model. This teacher-student framework allows the student model to predict actions using only the original input video during inference. In our experiments, the proposed DL-KDD framework outperforms state-of-the-art methods on the ARID, ARID V1.5, and Dark-48 datasets. We achieve the best performance on each dataset and up to a 4.18% improvement on Dark-48, using only original video inputs, thus avoiding the use of a two-stream framework or enhancement modules for inference. We further validate the effectiveness of the distillation strategy in ablation experiments. The results highlight the advantages of our knowledge distillation framework in dark human action recognition.
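The student's training signal described above (true labels plus the teacher's soft targets) can be illustrated with the common Hinton-style distillation loss. This is a minimal single-example sketch, not DL-KDD's published loss: the temperature, the weight `alpha`, and the exact way the two terms are combined are assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Assumed distillation loss for one clip.

    alpha weights the hard-label cross-entropy on the student's own prediction;
    (1 - alpha) weights KL(teacher || student) on temperature-softened outputs,
    scaled by T^2 so its gradient magnitude stays comparable across temperatures.
    """
    # Hard-label term: the student only ever sees the original (dark) video.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])
    # Soft-target term: teacher logits would come from the enhanced-video model.
    p_teacher_t = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher_t, p_student_t) if pt > 0)
    return alpha * ce + (1 - alpha) * (temperature ** 2) * kl
```

Because the teacher only contributes a loss term, it can be discarded after training, which is consistent with the abstract's claim that inference needs only the student and the original video.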
Submitted 4 June, 2024;
originally announced June 2024.
-
Enhancing Uncertain Demand Prediction in Hospitals Using Simple and Advanced Machine Learning
Authors:
Annie Hu,
Samuel Stockman,
Xun Wu,
Richard Wood,
Bangdong Zhi,
Oliver Y. Chén
Abstract:
Early and timely prediction of patient care demand not only affects effective resource allocation but also influences clinical decision-making as well as patient experience. Accurately predicting patient care demand, however, is a ubiquitous challenge for hospitals across the world due, in part, to the demand's temporal variability and, in part, to the difficulty in modelling trends in advance. To address this issue, here, we develop two methods: a relatively simple time-varying linear model and a more advanced neural network model. The former forecasts patient arrivals hourly over a week based on factors such as the day of the week and arrival patterns over the previous 7 days. The latter leverages a long short-term memory (LSTM) model, capturing non-linear relationships between past data and a three-day forecasting window. We evaluate the predictive capabilities of the two proposed approaches compared to two naïve approaches: a reduced-rank vector autoregressive (VAR) model and the TBATS model. Using patient care demand data from Rambam Medical Center in Israel, our results show that both proposed models effectively capture hourly variations of patient demand. Additionally, the linear model is more explainable thanks to its simple architecture, whereas, by accurately modelling weekly seasonal trends, the LSTM model delivers lower prediction errors. Taken together, our explorations suggest the utility of machine learning in predicting time-varying patient care demand; additionally, it is possible to predict patient care demand with good accuracy (an error of around 4 patients) three days or a week in advance using machine learning.
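The linear model's inputs (day of the week plus the previous 7 days of arrivals) can be made concrete with a small feature builder. This is a hypothetical sketch: the feature layout, the `build_features`/`predict` names, and the use of same-hour lags are assumptions, not the paper's specification, and the example data are invented.

```python
from datetime import datetime, timedelta

def build_features(hourly_counts, target_time):
    """Assumed feature vector for predicting arrivals at target_time.

    hourly_counts: dict mapping an hourly datetime to observed patient arrivals.
    Layout: 7 day-of-week indicators (Monday first), then the counts at the
    same hour on each of the previous 7 days (1 day back first).
    """
    day_of_week = [1.0 if target_time.weekday() == d else 0.0 for d in range(7)]
    # Raises KeyError if any of the 7 lagged hours is missing from the history.
    lags = [float(hourly_counts[target_time - timedelta(days=k)]) for k in range(1, 8)]
    return day_of_week + lags

def predict(features, weights, bias=0.0):
    """Linear prediction; a time-varying variant could keep one weight set per hour."""
    return bias + sum(w * x for w, x in zip(weights, features))

# Example with hypothetical data: one week of 09:00 counts, then the next Monday 09:00.
start = datetime(2024, 1, 1, 9)  # 1 January 2024 is a Monday
history = {start + timedelta(days=k): 10 + k for k in range(7)}
features = build_features(history, start + timedelta(days=7))
```

With weights that average the seven lags, `predict` reduces to a same-hour weekly moving average, which is a useful sanity baseline before fitting the weights by least squares.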
Submitted 29 April, 2024;
originally announced April 2024.
-
SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices
Authors:
Zhengang Li,
Geng Yuan,
Tomoharu Yamauchi,
Zabihi Masoud,
Yanyue Xie,
Peiyan Dong,
Xulong Tang,
Nobuyuki Yoshikawa,
Devesh Tiwari,
Yanzhi Wang,
Olivia Chen
Abstract:
Adiabatic Quantum-Flux-Parametron (AQFP) is a superconducting logic with extremely high energy efficiency. By employing the distinct polarity of current to denote logic '0' and '1', AQFP devices serve as excellent carriers for binary neural network (BNN) computations. Although recent research has made initial strides toward developing an AQFP-based BNN accelerator, several critical challenges remain, preventing the design from being a comprehensive solution. In this paper, we propose SupeRBNN, an AQFP-based randomized BNN acceleration framework that leverages software-hardware co-optimization to eventually make AQFP devices a feasible solution for BNN acceleration. Specifically, we investigate the randomized behavior of the AQFP devices and analyze the impact of crossbar size on current attenuation, subsequently formulating the current amplitude into values suitable for use in BNN computation. To tackle the accumulation problem and improve overall hardware performance, we propose a stochastic computing-based accumulation module and a clocking scheme adjustment-based circuit optimization method. We validate our SupeRBNN framework across various datasets and network architectures, comparing it with implementations based on different technologies, including CMOS, ReRAM, and superconducting RSFQ/ERSFQ. Experimental results demonstrate that our design achieves an energy efficiency approximately 7.8×10^4 times higher than that of the ReRAM-based BNN framework while maintaining a similar level of model accuracy. Furthermore, when compared with superconductor-based counterparts, our framework demonstrates at least two orders of magnitude higher energy efficiency.
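The reason binary logic maps well onto BNNs is that a dot product of {-1, +1} vectors reduces to XNOR and a population count. The sketch below shows only this generic software arithmetic; it does not model AQFP crossbars, current attenuation, randomized device behavior, or the paper's stochastic computing-based accumulation module. The helper names are hypothetical.

```python
def encode(values):
    """Pack a {-1, +1} vector into an integer bitmask (bit i set when values[i] == +1)."""
    mask = 0
    for i, v in enumerate(values):
        if v not in (-1, 1):
            raise ValueError("BNN values must be -1 or +1")
        if v == 1:
            mask |= 1 << i
    return mask

def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors stored as bitmasks.

    XNOR marks positions where the signs agree; each agreement contributes +1
    and each disagreement -1, so dot = agreements - (n - agreements).
    """
    all_ones = (1 << n) - 1
    agreements = bin(~(a_bits ^ b_bits) & all_ones).count("1")
    return 2 * agreements - n

def sign_activation(x):
    """Binarize an accumulated sum back to {-1, +1} (ties map to +1)."""
    return 1 if x >= 0 else -1
```

The accumulation step (summing the XNOR agreements across a row) is where the abstract locates the hardware difficulty, which is why the paper proposes a dedicated accumulation module rather than a plain popcount.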
Submitted 21 September, 2023;
originally announced September 2023.
-
A Life-Cycle Energy and Inventory Analysis of Adiabatic Quantum-Flux-Parametron Circuits
Authors:
Masoud Zabihi,
Yanyue Xie,
Zhengang Li,
Peiyan Dong,
Geng Yuan,
Olivia Chen,
Massoud Pedram,
Yanzhi Wang
Abstract:
The production process of superconductive integrated circuits is complex and consumes significant amounts of resources and energy. Therefore, it is crucial to evaluate the environmental impact of this emerging technology. Adiabatic Quantum-Flux-Parametron (AQFP) devices are an attractive option for the next generation of superconductive technology. This study is the first to present a comprehensive process-based life-cycle assessment (LCA) and inventory analysis of AQFP integrated circuits. To generate relevant outcomes, we conduct a comparative LCA that includes bulk CMOS technology. The inventory analysis considers the manufacturing, assembly, and use phases of the circuits. To ensure a fair assessment, we choose the 32-bit AQFP RISC-V single-core processor as the reference functional unit and compare its performance with that of a CMOS counterpart. Our findings reveal that the AQFP processor consumes several orders of magnitude less energy during the use phase than its CMOS counterpart. Consequently, the total life-cycle energy (which also encompasses manufacturing and assembly energies) of AQFP integrated circuits improves by at least two orders of magnitude.
Submitted 22 July, 2023;
originally announced July 2023.
-
Resilient conductive membrane synthesized by in-situ polymerisation for wearable non-invasive electronics on moving appendages of cyborg insect
Authors:
Qifeng Lin,
Rui Li,
Feilong Zhang,
Kai Kazuki,
Ong Zong Chen,
Xiaodong Chen,
Hirotaka Sato
Abstract:
Owing to their high mobility and small size, insects have been combined with microcontrollers to build cyborg insects for various practical applications. Unfortunately, all current cyborg insects rely on implanted electrodes to control their movement, which causes irreversible damage to their organs and muscles. Here, we develop a non-invasive method for cyborg insects to address the above issues, using a conformal electrode with an in-situ polymerized ion-conducting layer and an electron-conducting layer. The neural and locomotion responses to the electrical inductions verify the efficient communication between insects and controllers achieved by the non-invasive method. Precise following of an "S"-shaped line by the cyborg insect further demonstrates its potential in practical navigation. The conformal non-invasive electrodes keep the insects intact while controlling their motion. With the antennae (important olfactory organs of insects) preserved, the cyborg insect may in the future be endowed with abilities to detect the surrounding environment.
Submitted 20 March, 2023;
originally announced March 2023.
-
L-SeqSleepNet: Whole-cycle Long Sequence Modelling for Automatic Sleep Staging
Authors:
Huy Phan,
Kristian P. Lorenzen,
Elisabeth Heremans,
Oliver Y. Chén,
Minh C. Tran,
Philipp Koch,
Alfred Mertins,
Mathias Baumert,
Kaare Mikkelsen,
Maarten De Vos
Abstract:
Human sleep is cyclical with a period of approximately 90 minutes, implying long temporal dependency in the sleep data. Yet, this long-term dependency has remained unexplored in the development of sleep staging models. In this work, we show that while encoding the logic of a whole sleep cycle is crucial to improve sleep staging performance, the sequential modelling approach in existing state-of-the-art deep learning models is inefficient for that purpose. We thus introduce a method for efficient long sequence modelling and propose a new deep learning model, L-SeqSleepNet, which takes into account whole-cycle sleep information for sleep staging. Evaluating L-SeqSleepNet on four distinct databases of various sizes, we demonstrate state-of-the-art performance obtained by the model over three different EEG setups, including scalp EEG in conventional polysomnography (PSG), in-ear EEG, and around-the-ear EEG (cEEGrid), even with a single EEG channel input. Our analyses also show that L-SeqSleepNet is able to alleviate the predominance of N2 sleep (the major class in terms of classification) to bring down errors in other sleep stages. Moreover, the network becomes much more robust: for all subjects where the baseline method had exceptionally poor performance, performance improves significantly. Finally, the computation time grows only at a sub-linear rate as the sequence length increases.
Submitted 4 August, 2023; v1 submitted 9 January, 2023;
originally announced January 2023.
-
Personalized Longitudinal Assessment of Multiple Sclerosis Using Smartphones
Authors:
Oliver Y. Chén,
Florian Lipsmeier,
Huy Phan,
Frank Dondelinger,
Andrew Creagh,
Christian Gossens,
Michael Lindemann,
Maarten de Vos
Abstract:
Personalized longitudinal disease assessment is central to quickly diagnosing, appropriately managing, and optimally adapting the therapeutic strategy of multiple sclerosis (MS). It is also important for identifying idiosyncratic subject-specific disease profiles. Here, we design a novel longitudinal model to map individual disease trajectories in an automated way using sensor data that may contain missing values. First, we collect digital measurements related to gait and balance, and upper extremity functions using sensor-based assessments administered on a smartphone. Next, we treat missing data via imputation. We then discover potential markers of MS by employing generalized estimating equations. Subsequently, parameters learned from multiple training datasets are ensembled to form a simple, unified longitudinal predictive model to forecast MS over time in previously unseen people with MS. To mitigate potential underestimation for individuals with severe disease scores, the final model incorporates additional subject-specific fine-tuning using data from the first day. The results show that the proposed model is promising for achieving personalized longitudinal MS assessment; they also suggest that features related to gait and balance as well as upper extremity function, remotely collected from sensor-based assessments, may be useful digital markers for predicting MS over time.
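The last two modelling steps (ensembling parameters from several training datasets, then subject-specific fine-tuning on first-day data) can be pictured with a linear model. This is a hypothetical sketch under two loud assumptions: ensembling is taken to mean element-wise averaging of coefficient vectors (for example from per-dataset GEE fits), and fine-tuning is taken to mean an intercept shift that matches the subject's first-day score. Neither is confirmed by the abstract.

```python
def ensemble_coefficients(coef_sets):
    """Average coefficient vectors (one per training dataset) element-wise."""
    n_models = len(coef_sets)
    return [sum(col) / n_models for col in zip(*coef_sets)]

def linear_score(coefs, intercept, features):
    """Predicted disease score for one visit from its digital features."""
    return intercept + sum(c * x for c, x in zip(coefs, features))

def personalize_intercept(coefs, intercept, day1_features, day1_score):
    """Hypothetical subject-specific fine-tuning: shift the intercept so the
    ensembled model reproduces the subject's observed first-day score."""
    residual = day1_score - linear_score(coefs, intercept, day1_features)
    return intercept + residual
```

An intercept shift directly targets the systematic underestimation the abstract mentions for severe cases, while leaving the shared (ensembled) feature effects untouched.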
Submitted 20 September, 2022;
originally announced September 2022.
-
Uniting Machine Intelligence, Brain and Behavioural Sciences to Assist Criminal Justice
Authors:
Oliver Y. Chén
Abstract:
I discuss here three important roles where machine intelligence, brain and behaviour studies together may facilitate criminal law. First, predictive modelling using brain and behaviour data may support legal investigations by predicting categorical, continuous, and longitudinal legal outcomes of interest related to brain injury and mental illnesses. Second, psychological, psychiatric, and behavioural studies supported by machine learning algorithms may help predict human behaviour and actions, such as lies, biases, and visits to crime scenes. Third, machine learning models have been used to predict recidivism using clinical and criminal data, whereas brain decoding is beginning to uncover one's thoughts and intentions based on brain imaging data. Having discussed these achievements and promises, I examine concerns regarding the accuracy, reliability, and reproducibility of brain- and behaviour-based assessments in criminal law, as well as questions regarding data possession, ethics, free will (and automatism), privacy, and security. Further, I discuss issues related to predictability vs. explainability, population-level prediction vs. personalised prediction, and predicting future actions, and outline three potential scenarios where brain and behaviour data may be used as court evidence. Taken together, brain and behaviour decoding in legal exploration and decision-making is at present promising but primitive. The derived evidence is limited and should not be used to generate definitive conclusions, although it can potentially be used in addition to, or in parallel with, existing evidence. Finally, I suggest that there need to be (more precise) definitions and regulations regarding when brain and behaviour data can and cannot be used in a predictive manner in legal cases.
Submitted 25 September, 2022; v1 submitted 30 June, 2022;
originally announced July 2022.
-
SleepTransformer: Automatic Sleep Staging with Interpretability and Uncertainty Quantification
Authors:
Huy Phan,
Kaare Mikkelsen,
Oliver Y. Chén,
Philipp Koch,
Alfred Mertins,
Maarten De Vos
Abstract:
Background: Black-box skepticism is one of the main hindrances impeding deep-learning-based automatic sleep scoring from being used in clinical environments. Methods: Towards interpretability, this work proposes a sequence-to-sequence sleep-staging model, namely SleepTransformer. It is based on the transformer backbone and offers interpretability of the model's decisions at both the epoch and sequence level. We further propose a simple yet efficient method to quantify uncertainty in the model's decisions. The method, which is based on entropy, can serve as a metric for deferring low-confidence epochs to a human expert for further inspection. Results: For interpretability, the transformer's self-attention scores are used at two levels. At the epoch level, the attention scores are encoded as a heat map to highlight sleep-relevant features captured from the input EEG signal. At the sequence level, the attention scores are visualized as the influence of different neighboring epochs in an input sequence (i.e. the context) on the recognition of a target epoch, mimicking the way manual scoring is done by human experts. Conclusion: Additionally, we demonstrate that SleepTransformer performs on par with existing methods on two databases of different sizes. Significance: Equipped with interpretability and the ability to quantify uncertainty, SleepTransformer holds promise for being integrated into clinical settings.
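The entropy-based deferral idea can be sketched in a few lines: compute the entropy of each epoch's predicted class distribution and flag epochs above a threshold for expert review. This is an illustrative sketch, not SleepTransformer's exact procedure; normalizing by log(K) and the default threshold of 0.5 are assumptions made here so the threshold is independent of the number of classes (K must be at least 2).

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of one epoch's class probabilities divided by log(K):
    0 means fully confident, 1 means uniform over the K sleep stages."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def defer_low_confidence(epoch_probs, threshold=0.5):
    """Indices of epochs whose normalized entropy exceeds the (assumed) threshold."""
    return [i for i, probs in enumerate(epoch_probs) if normalized_entropy(probs) > threshold]
```

For a five-stage output (W, N1, N2, N3, REM), a prediction like [0.96, 0.01, 0.01, 0.01, 0.01] stays with the model, while a near-uniform one is deferred; in practice the threshold would be tuned against how many epochs an expert can afford to review.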
Submitted 26 January, 2022; v1 submitted 23 May, 2021;
originally announced May 2021.
-
Multi-view Audio and Music Classification
Authors:
Huy Phan,
Huy Le Nguyen,
Oliver Y. Chén,
Lam Pham,
Philipp Koch,
Ian McLoughlin,
Alfred Mertins
Abstract:
We propose in this work a multi-view learning approach for audio and music classification. Considering four typical low-level representations (i.e. different views) commonly used for audio and music recognition tasks, the proposed multi-view network consists of four subnetworks, each handling one input type. The embeddings learned by the subnetworks are then concatenated to form the multi-view embedding for classification, similar to a simple concatenation network. However, apart from the joint classification branch, the network also maintains four classification branches on the single-view embeddings of the subnetworks. A novel method is then proposed to keep track of the learning behavior on the classification branches and adapt their weights to proportionally blend their gradients for network training. The weights are adapted in such a way that learning on a branch that is generalizing well is encouraged, whereas learning on a branch that is overfitting is slowed down. Experiments on three different audio and music classification tasks show that the proposed multi-view network not only outperforms the single-view baselines but is also superior to the multi-view baselines based on concatenation and late fusion.
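The branch-weighting rule is described only qualitatively above. One plausible way to turn "encourage generalizing branches, slow down overfitting ones" into numbers is a G/O² heuristic similar in spirit to Gradient-Blending (Wang et al., 2020): G is the drop in validation loss between two checkpoints and O is the growth of the validation-minus-training gap. This sketch is an assumption, not the paper's formula; the dictionary keys and `eps` are hypothetical.

```python
def blend_weights(branch_stats, eps=1e-8):
    """Normalized per-branch loss weights from train/val losses at two checkpoints.

    branch_stats: list of dicts with keys 'train0', 'val0' (earlier checkpoint)
    and 'train1', 'val1' (later checkpoint).
    """
    raw = []
    for s in branch_stats:
        gain = max(s["val0"] - s["val1"], 0.0)  # generalization: validation loss decrease
        gap_growth = (s["val1"] - s["train1"]) - (s["val0"] - s["train0"])  # overfitting
        raw.append(gain / (gap_growth ** 2 + eps))
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)  # no branch is improving: fall back to uniform
    return [r / total for r in raw]

def blended_loss(branch_losses, weights):
    """Weighted sum of branch losses; its gradient blends branch gradients by weight."""
    return sum(w * l for w, l in zip(weights, branch_losses))
```

Scaling each branch's loss is equivalent to scaling its gradient contribution to the shared subnetworks, so recomputing these weights periodically during training adapts the learning pace per branch.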
Submitted 3 March, 2021;
originally announced March 2021.
-
Self-Attention Generative Adversarial Network for Speech Enhancement
Authors:
Huy Phan,
Huy Le Nguyen,
Oliver Y. Chén,
Philipp Koch,
Ngoc Q. K. Duong,
Ian McLoughlin,
Alfred Mertins
Abstract:
Existing generative adversarial networks (GANs) for speech enhancement rely solely on the convolution operation, which may obscure temporal dependencies across the sequence input. To remedy this issue, we propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN (SEGAN) using raw signal input. Further, we empirically study the effect of placing the self-attention layer at the (de)convolutional layers with varying layer indices, as well as at all of them when memory allows. Our experiments show that introducing self-attention to SEGAN leads to consistent improvement across the objective evaluation metrics of enhancement performance. Furthermore, applying it at different (de)convolutional layers does not significantly alter performance, suggesting that it can be conveniently applied at the highest-level (de)convolutional layer with the smallest memory overhead.
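A non-local self-attention layer lets every time step of a feature map attend to every other time step, regardless of distance. The pure-Python sketch below follows the common SAGAN-style form (query/key/value projections, softmax over positions, output `gamma * attended + x`); the projection shapes and the residual `gamma` are assumptions for illustration, not the paper's exact layer.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, W_q, W_k, W_v, gamma):
    """Non-local self-attention over a sequence of T feature vectors of length C.

    W_q, W_k: C' x C projections; W_v: C x C so the result can be added to x.
    gamma scales the attended signal before the residual connection (learned in practice).
    """
    q = [matvec(W_q, xt) for xt in x]
    k = [matvec(W_k, xt) for xt in x]
    v = [matvec(W_v, xt) for xt in x]
    out = []
    for t in range(len(x)):
        # Attention weights of position t over all T positions.
        weights = softmax([sum(a * b for a, b in zip(q[t], k[s])) for s in range(len(x))])
        attended = [sum(weights[s] * v[s][c] for s in range(len(x))) for c in range(len(v[0]))]
        out.append([gamma * a + xi for a, xi in zip(attended, x[t])])
    return out
```

The attention map has T × T entries, so its memory cost grows quadratically with the number of time steps. In an encoder that downsamples the waveform layer by layer, the deepest layer has the fewest time steps, which is consistent with the abstract's observation that the highest-level layer carries the smallest memory overhead.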
Submitted 6 February, 2021; v1 submitted 18 October, 2020;
originally announced October 2020.
-
XSleepNet: Multi-View Sequential Model for Automatic Sleep Staging
Authors:
Huy Phan,
Oliver Y. Chén,
Minh C. Tran,
Philipp Koch,
Alfred Mertins,
Maarten De Vos
Abstract:
Automating sleep staging is vital to scale up sleep assessment and diagnosis to serve the millions of people experiencing sleep deprivation and disorders, and to enable longitudinal sleep monitoring in home environments. Learning from raw polysomnography signals and their derived time-frequency image representations has been prevalent. However, learning from multi-view inputs (e.g., both the raw signals and the time-frequency images) for sleep staging is difficult and not well understood. This work proposes a sequence-to-sequence sleep staging model, XSleepNet, that is capable of learning a joint representation from both raw signals and time-frequency images. Since different views may generalize or overfit at different rates, the proposed network is trained such that the learning pace on each view is adapted based on its generalization/overfitting behavior. In simple terms, learning on a particular view is sped up when it is generalizing well and slowed down when it is overfitting. View-specific generalization/overfitting measures are computed on the fly during training and used to derive weights to blend the gradients from different views. As a result, the network is able to retain the representation power of different views in the joint features, which represent the underlying distribution better than those learned by each individual view alone. Furthermore, the XSleepNet architecture is principally designed to gain robustness to the amount of training data and to increase the complementarity between the input views. Experimental results on five databases of different sizes show that XSleepNet consistently outperforms the single-view baselines and the multi-view baseline with a simple fusion strategy. Finally, XSleepNet also outperforms prior sleep staging methods and improves previous state-of-the-art results on the experimental databases.
Submitted 31 March, 2021; v1 submitted 8 July, 2020;
originally announced July 2020.
-
Direct observation of 3D atomic packing in monatomic amorphous materials
Authors:
Yakun Yuan,
Dennis S. Kim,
Jihan Zhou,
Dillan J. Chang,
Fan Zhu,
Yasutaka Nagaoka,
Yao Yang,
Minh Pham,
Stanley J. Osher,
Ou Chen,
Peter Ercius,
Andreas K. Schmid,
Jianwei Miao
Abstract:
Liquids and solids are two fundamental states of matter. However, due to the lack of direct experimental determination, our understanding of the 3D atomic structure of liquids and amorphous solids has remained speculative. Here we advance atomic electron tomography to determine for the first time the 3D atomic positions in monatomic amorphous materials, including a Ta thin film and two Pd nanoparticles. We observe that pentagonal bipyramids are the most abundant atomic motifs in these amorphous materials. Instead of forming icosahedra, the majority of pentagonal bipyramids arrange into networks that extend to the medium-range scale. Molecular dynamics simulations further reveal that pentagonal bipyramid networks are prevalent in monatomic amorphous liquids; these networks rapidly grow in size and form icosahedra during the quench from the liquid state to the glass state. The experimental method and results are expected to advance the study of the amorphous-crystalline phase transition and the glass transition at the single-atom level.
Submitted 2 December, 2020; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Personalized Automatic Sleep Staging with Single-Night Data: a Pilot Study with KL-Divergence Regularization
Authors:
Huy Phan,
Kaare Mikkelsen,
Oliver Y. Chén,
Philipp Koch,
Alfred Mertins,
Preben Kidmose,
Maarten De Vos
Abstract:
Brain waves vary between people. An obvious way to improve automatic sleep staging for longitudinal sleep monitoring is personalization of algorithms based on individual characteristics extracted from the first night of data. As a single night is a very small amount of data to train a sleep staging model, we propose a Kullback-Leibler (KL) divergence regularized transfer learning approach to address this problem. We employ the pretrained SeqSleepNet (i.e. the subject-independent model) as a starting point and finetune it with the single-night personalization data to derive the personalized model. This is done by adding the KL divergence between the output of the subject-independent model and the output of the personalized model to the loss function during finetuning. In effect, KL-divergence regularization prevents the personalized model from overfitting to the single-night data and straying too far from the subject-independent model. Experimental results on the Sleep-EDF Expanded database with 75 subjects show that sleep staging personalization with single-night data is possible with the help of the proposed KL-divergence regularization. On average, we achieve a personalized sleep staging accuracy of 79.6%, a Cohen's kappa of 0.706, a macro F1-score of 73.0%, a sensitivity of 71.8%, and a specificity of 94.2%. We find both that the approach is robust against overfitting and that it improves the accuracy by 4.5 percentage points compared to non-personalization and 2.2 percentage points compared to personalization without regularization.
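The regularized finetuning objective can be sketched for a single epoch: cross-entropy on the subject's own label plus a weighted KL term that penalizes the personalized model's output for drifting from the frozen subject-independent model's output. The additive form, the KL direction KL(subject-independent || personalized), and the weight `lam` are assumptions for illustration; the abstract does not state them.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability of the true sleep stage."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same sleep stages."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def personalization_loss(p_personal, p_independent, label, lam=0.5):
    """Assumed finetuning loss for one epoch.

    p_personal: output of the model being finetuned on single-night data.
    p_independent: output of the frozen subject-independent model (not updated).
    """
    return cross_entropy(p_personal, label) + lam * kl_divergence(p_independent, p_personal)
```

When the personalized output matches the subject-independent one, the KL term vanishes and the loss reduces to plain cross-entropy; any drift away from the pretrained model is penalized in proportion to `lam`, which is the overfitting brake the abstract describes.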
Submitted 11 May, 2020; v1 submitted 23 April, 2020;
originally announced April 2020.
-
Statistical Quantile Learning for Large, Nonlinear, and Additive Latent Variable Models
Authors:
Julien Bodelet,
Guillaume Blanc,
Jiajun Shan,
Graciela Muniz Terrera,
Oliver Y. Chen
Abstract:
Studies of large-scale, high-dimensional data in fields such as genomics and neuroscience have injected new insights into science. Yet, despite advances, they face several challenges, often simultaneously: lack of interpretability, nonlinearity, slow computation, inconsistency and uncertain convergence, and small sample sizes compared to high feature dimensions. Here, we propose a relatively simple, scalable, and consistent nonlinear dimension reduction method that can potentially address these issues in unsupervised settings. We call this method Statistical Quantile Learning (SQL) because, methodologically, it leverages a quantile approximation of the latent variables together with standard nonparametric techniques (sieve or penalized methods). We show that estimating the model simplifies into a convex assignment matching problem; we derive its asymptotic properties; we show that the model is identifiable under few conditions. Compared to its linear competitors, SQL explains more variance, yields better separation and explanation, and delivers more accurate outcome prediction. Compared to its nonlinear competitors, SQL shows considerable advantages in interpretability, ease of use, and computation in large-dimensional settings. Finally, we apply SQL to high-dimensional gene expression data (consisting of 20,263 genes from 801 subjects), where the proposed method identified latent factors predictive of five cancer types. The SQL package is available at https://github.com/jbodelet/SQL.
Submitted 29 December, 2023; v1 submitted 29 March, 2020;
originally announced March 2020.
-
Thou Shalt Not Reject the P-value
Authors:
Oliver Y. Chén,
Raúl G. Saraiva,
Guy Nagels,
Huy Phan,
Tom Schwantje,
Hengyi Cao,
Jiangtao Gou,
Jenna M. Reinen,
Bin Xiong,
Bangdong Zhi,
Xiaojun Wang,
Maarten de Vos
Abstract:
Since its debut in the 18th century, the P-value has been an important part of hypothesis testing-based scientific discoveries. As the statistical engine accelerates, questions are beginning to be raised, asking to what extent scientific discoveries based on P-values are reliable and reproducible, and the voice calling for adjusting the significance level or banning the P-value has been increasingly heard. Inspired by these questions and discussions, here we enquire into the useful roles and misuses of the P-value in scientific studies. For common misuses and misinterpretations, we provide modest recommendations for practitioners. Additionally, we compare statistical significance with clinical relevance. In parallel, we review the Bayesian alternatives for seeking evidence. Finally, we discuss the promises and risks of using meta-analysis to pool P-values from multiple studies to aggregate evidence. Taken together, the P-value underpins a useful probabilistic decision-making system and provides evidence at a continuous scale. But its interpretation must be contextual, considering the scientific question, experimental design (including the model specification, sample size, and significance level), statistical power, effect size, and reproducibility.
Submitted 28 July, 2022; v1 submitted 17 February, 2020;
originally announced February 2020.
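The abstract mentions using meta-analysis to pool P-values from multiple studies. One classical way to do this is Fisher's method; the abstract does not say which pooling methods the authors discuss. Fisher's method combines k independent P-values into the statistic X = -2 Σ ln p_i. Under the joint null hypothesis, X follows a chi-squared distribution with 2k degrees of freedom. The minimal sketch below uses only the standard library and relies on the closed-form chi-squared tail for even degrees of freedom. The function name `fisher_combine` is hypothetical.

```python
import math

def fisher_combine(p_values):
    """Pool independent P-values with Fisher's method (illustrative sketch).

    Returns (statistic, combined_p). Under the global null, the statistic
    -2 * sum(ln p_i) is chi-squared distributed with 2k degrees of freedom.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # For 2k degrees of freedom the chi-squared survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, accumulated term by term here.
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total

# Two independent studies, each with p = 0.05.
stat, pooled = fisher_combine([0.05, 0.05])
```

The method assumes the studies are independent. A single very small P-value can dominate the pooled result, so the combined value still needs contextual interpretation.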
-
Improving GANs for Speech Enhancement
Authors:
Huy Phan,
Ian V. McLoughlin,
Lam Pham,
Oliver Y. Chén,
Philipp Koch,
Maarten De Vos,
Alfred Mertins
Abstract:
Generative adversarial networks (GAN) have recently been shown to be efficient for speech enhancement. However, most, if not all, existing speech enhancement GANs (SEGAN) make use of a single generator to perform one-stage enhancement mapping. In this work, we propose to use multiple generators that are chained to perform multi-stage enhancement mapping, which gradually refines the noisy input signals in a stage-wise fashion. Furthermore, we study two scenarios: (1) the generators share their parameters and (2) the generators' parameters are independent. The former constrains the generators to learn a common mapping that is iteratively applied at all enhancement stages and results in a small model footprint. On the contrary, the latter allows the generators to flexibly learn different enhancement mappings at different stages of the network at the cost of an increased model size. We demonstrate that the proposed multi-stage enhancement approach outperforms the one-stage SEGAN baseline, where the independent generators lead to more favorable results than the tied generators. The source code is available at http://github.com/pquochuy/idsegan.
Submitted 12 September, 2020; v1 submitted 15 January, 2020;
originally announced January 2020.
-
Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning
Authors:
Huy Phan,
Oliver Y. Chén,
Philipp Koch,
Zongqing Lu,
Ian McLoughlin,
Alfred Mertins,
Maarten De Vos
Abstract:
Background: Despite recent significant progress in the development of automatic sleep staging methods, building a good model remains a major challenge for sleep studies with a small cohort due to the data-variability and data-inefficiency issues. This work presents a deep transfer learning approach to overcome these issues and enable transferring knowledge from a large dataset to a small cohort for automatic sleep staging. Methods: We start from a generic end-to-end deep learning framework for sequence-to-sequence sleep staging and derive two networks as the means for transfer learning. The networks are first trained in the source domain (i.e. the large database). The pretrained networks are then finetuned in the target domain (i.e. the small cohort) to complete knowledge transfer. We employ the Montreal Archive of Sleep Studies (MASS) database consisting of 200 subjects as the source domain and study deep transfer learning on three different target domains: the Sleep Cassette subset and the Sleep Telemetry subset of the Sleep-EDF Expanded database, and the Surrey-cEEGrid database. The target domains are purposely adopted to cover different degrees of data mismatch with the source domain. Results: Our experimental results show significant performance improvement on automatic sleep staging on the target domains achieved with the proposed deep transfer learning approach. Conclusions: These results suggest the efficacy of the proposed approach in addressing the above-mentioned data-variability and data-inefficiency issues. Significance: As a consequence, it would enable one to improve the quality of automatic sleep staging models when the amount of data is relatively small. The source code and the pretrained models are available at http://github.com/pquochuy/sleep_transfer_learning.
Submitted 27 August, 2020; v1 submitted 30 July, 2019;
originally announced July 2019.
-
A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology
Authors:
Ruizhe Cai,
Ao Ren,
Olivia Chen,
Ning Liu,
Caiwen Ding,
Xuehai Qian,
Jie Han,
Wenhui Luo,
Nobuyuki Yoshikawa,
Yanzhi Wang
Abstract:
The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology has recently been developed; it achieves the highest energy efficiency among superconducting logic families, with a potentially huge gain compared with state-of-the-art CMOS. In 2016, the successful fabrication and testing of AQFP-based circuits at the scale of 83,000 Josephson junctions (JJs) demonstrated the scalability and potential of implementing large-scale systems using AQFP. As a result, AQFP is promising for high-performance computing and deep-space applications, with Deep Neural Network (DNN) inference acceleration as an important example. Besides ultra-high energy efficiency, AQFP exhibits two unique characteristics. The first is its deep pipelining nature: each AQFP logic gate is connected to an AC clock signal, which makes read-after-write (RAW) hazards harder to avoid. The second is the unique opportunity for true random number generation (RNG) using a single AQFP buffer, which is far more efficient than RNG in CMOS. We point out that these two characteristics make AQFP especially compatible with the stochastic computing (SC) technique, which uses a time-independent bit sequence for value representation and is compatible with the deep pipelining nature. Further, the application of SC to DNNs has been investigated in prior work, and its suitability has been demonstrated because SC is more compatible with approximate computations. This work is the first to develop an SC-based DNN acceleration framework using AQFP technology.
Submitted 21 July, 2019;
originally announced July 2019.
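The abstract says that stochastic computing represents a value as a time-independent bit sequence. The minimal software sketch below uses the common unipolar encoding, which the abstract does not specify, so treat that choice as an assumption. A value x in [0, 1] becomes a stream whose bits are 1 with probability x, and multiplying two independent streams reduces to a bitwise AND. The sketch illustrates the SC idea only; it does not model AQFP hardware. The helper names `encode`, `decode` and `sc_multiply` are hypothetical.

```python
import random

def encode(value, length, rng):
    """Unipolar SC encoding: each bit is 1 with probability `value` in [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("unipolar SC represents values in [0, 1]")
    return [1 if rng.random() < value else 0 for _ in range(length)]

def decode(bits):
    """The represented value is the fraction of 1s; bit positions do not matter."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    """For independent streams, a bitwise AND encodes the product of the values."""
    return [a & b for a, b in zip(a_bits, b_bits)]

rng = random.Random(0)  # software PRNG standing in for a true RNG
n = 10_000
a = encode(0.8, n, rng)
b = encode(0.5, n, rng)
print(decode(sc_multiply(a, b)))  # approximately 0.4; the error shrinks as n grows
```

Each output bit depends only on the two input bits at the same position, with no carry chain. This is the property that makes the representation tolerant of deep pipelining. The cost is precision, which improves only with longer streams.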
-
Deep Transfer Learning for Single-Channel Automatic Sleep Staging with Channel Mismatch
Authors:
Huy Phan,
Oliver Y. Chén,
Philipp Koch,
Alfred Mertins,
Maarten De Vos
Abstract:
Many sleep studies have too little data to fully exploit deep neural networks, because different labs use different recording setups. This forces automated algorithms to be trained on rather small databases, whereas large annotated databases exist but cannot be directly included in these studies for data compensation due to channel mismatch. This work presents a deep transfer learning approach to overcome the channel mismatch problem and transfer knowledge from a large dataset to a small cohort to study automatic sleep staging with single-channel input. We employ the state-of-the-art SeqSleepNet and train the network in the source domain, i.e. the large dataset. Afterwards, the pretrained network is finetuned in the target domain, i.e. the small cohort, to complete knowledge transfer. We study two transfer learning scenarios with slight and heavy channel mismatch between the source and target domains. We also investigate whether, and if so how, finetuning the pretrained network entirely or partially affects the performance of sleep staging on the target domain. Using the Montreal Archive of Sleep Studies (MASS) database consisting of 200 subjects as the source domain and the Sleep-EDF Expanded database consisting of 20 subjects as the target domain in this study, our experimental results show significant performance improvement on sleep staging achieved with the proposed deep transfer learning approach. Furthermore, these results also reveal that finetuning the feature-learning parts of the pretrained network is essential to bypass the channel mismatch problem.
Submitted 18 June, 2019; v1 submitted 11 April, 2019;
originally announced April 2019.
-
Spatio-Temporal Attention Pooling for Audio Scene Classification
Authors:
Huy Phan,
Oliver Y. Chén,
Lam Pham,
Philipp Koch,
Maarten De Vos,
Ian McLoughlin,
Alfred Mertins
Abstract:
Acoustic scenes are rich and redundant in their content. In this work, we present a spatio-temporal attention pooling layer coupled with a convolutional recurrent neural network to learn from patterns that are discriminative while suppressing those that are irrelevant for acoustic scene classification. The convolutional layers in this network learn invariant features from time-frequency input. The bidirectional recurrent layers are then able to encode the temporal dynamics of the resulting convolutional features. Afterwards, a two-dimensional attention mask is formed via the outer product of the spatial and temporal attention vectors learned from two designated attention layers to weigh and pool the recurrent output into a final feature vector for classification. The network is trained with between-class examples generated from between-class data augmentation. Experiments demonstrate that the proposed method not only outperforms a strong convolutional neural network baseline but also sets new state-of-the-art performance on the LITIS Rouen dataset.
Submitted 28 June, 2019; v1 submitted 6 April, 2019;
originally announced April 2019.
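The abstract describes forming a two-dimensional attention mask as the outer product of a spatial attention vector and a temporal attention vector, then using it to weigh and pool the recurrent output. The minimal sketch below makes several assumptions that the abstract does not spell out:

- "spatial" is read as the feature dimension of the recurrent output;
- each attention layer is a single linear scoring vector followed by a softmax;
- pooling is an elementwise weighting followed by a sum over time.

The names `st_attention_pool`, `w_time` and `w_feat` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def st_attention_pool(h, w_time, w_feat):
    """Pool a T x D recurrent output into a length-D vector (illustrative sketch).

    h:      list of T rows, each a length-D feature vector.
    w_time: length-D weights scoring each time step (hypothetical attention layer).
    w_feat: length-T weights scoring each feature dimension (hypothetical layer).
    """
    T, D = len(h), len(h[0])
    # Temporal attention: one score per time step, normalised over time.
    a_time = softmax([sum(w * x for w, x in zip(w_time, row)) for row in h])
    # Spatial attention: one score per feature dimension, normalised over features.
    a_feat = softmax([sum(w_feat[t] * h[t][d] for t in range(T)) for d in range(D)])
    # Two-dimensional mask from the outer product of the two attention vectors.
    mask = [[a_time[t] * a_feat[d] for d in range(D)] for t in range(T)]
    # Weigh the recurrent output elementwise and pool over time.
    return [sum(mask[t][d] * h[t][d] for t in range(T)) for d in range(D)]
```

Because the mask is an outer product, it has only T + D free values rather than T × D. This keeps the attention cheap to learn. Under this sketch, with all-zero scoring weights, both attention vectors are uniform and each output is the plain average over time, scaled by 1/D.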
-
Beyond Equal-Length Snippets: How Long is Sufficient to Recognize an Audio Scene?
Authors:
Huy Phan,
Oliver Y. Chén,
Philipp Koch,
Lam Pham,
Ian McLoughlin,
Alfred Mertins,
Maarten De Vos
Abstract:
Due to the variability in characteristics of audio scenes, some scenes can naturally be recognized earlier than others. In this work, rather than using equal-length snippets for all scene categories, as is common in the literature, we study to which temporal extent an audio scene can be reliably recognized given state-of-the-art models. Moreover, as model fusion with deep network ensemble is prevalent in audio scene classification, we further study whether, and if so, when model fusion is necessary for this task. To achieve these goals, we employ two single-network systems relying on a convolutional neural network and a recurrent neural network for classification as well as early fusion and late fusion of these networks. Experimental results on the LITIS-Rouen dataset show that some scenes can be reliably recognized within a few seconds while other scenes require significantly longer durations. In addition, model fusion is shown to be the most beneficial when the signal length is short.
Submitted 8 May, 2019; v1 submitted 2 November, 2018;
originally announced November 2018.
-
Unifying Isolated and Overlapping Audio Event Detection with Multi-Label Multi-Task Convolutional Recurrent Neural Networks
Authors:
Huy Phan,
Oliver Y. Chén,
Philipp Koch,
Lam Pham,
Ian McLoughlin,
Alfred Mertins,
Maarten De Vos
Abstract:
We propose a multi-label multi-task framework based on a convolutional recurrent neural network to unify detection of isolated and overlapping audio events. The framework leverages the power of convolutional recurrent neural network architectures; convolutional layers learn effective features over which higher recurrent layers perform sequential modelling. Furthermore, the output layer is designed to handle arbitrary degrees of event overlap. At each time step in the recurrent output sequence, an output triple is dedicated to each event category of interest to jointly model event occurrence and temporal boundaries. That is, the network jointly determines whether an event of this category occurs, and when it occurs, by estimating onset and offset positions at each recurrent time step. We then introduce three sequential losses for network training: multi-label classification loss, distance estimation loss, and confidence loss. We demonstrate good generalization on two datasets: ITC-Irst for isolated audio event detection, and TUT-SED-Synthetic-2016 for overlapping audio event detection.
Submitted 18 February, 2019; v1 submitted 2 November, 2018;
originally announced November 2018.
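The abstract describes an output triple per event category and per time step that jointly encodes occurrence and temporal boundaries. One plausible way to build such targets is sketched below, but the exact encoding (unnormalised distances, an exclusive offset) is an assumption, not the authors' specification. Each active step stores its distance back to the onset and forward to the offset, and each category has its own triple, so overlapping events of different categories coexist. The names `encode_targets` and `decode_step` are hypothetical.

```python
def encode_targets(events, num_steps, categories):
    """Build per-step, per-category (active, dist_to_onset, dist_to_offset) triples.

    events: list of (category, onset_step, offset_step), offset exclusive.
    Assumes events of the same category never overlap one another.
    """
    targets = [{c: (0, 0, 0) for c in categories} for _ in range(num_steps)]
    for cat, onset, offset in events:
        for t in range(max(onset, 0), min(offset, num_steps)):
            targets[t][cat] = (1, t - onset, offset - t)
    return targets

def decode_step(t, triple):
    """Recover an (onset, offset) estimate from the triple at time step t."""
    active, d_on, d_off = triple
    return (t - d_on, t + d_off) if active else None

# Two overlapping events of different categories on an 8-step grid.
targets = encode_targets([("speech", 2, 6), ("door", 4, 7)], 8, ["speech", "door"])
```

Because every active step carries enough information to recover the full event boundaries, a decoder could pool the per-step boundary estimates. The "confidence loss" mentioned in the abstract presumably helps decide which steps to trust, but that is not shown here.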
-
SeqSleepNet: End-to-End Hierarchical Recurrent Neural Network for Sequence-to-Sequence Automatic Sleep Staging
Authors:
Huy Phan,
Fernando Andreotti,
Navin Cooray,
Oliver Y. Chén,
Maarten De Vos
Abstract:
Automatic sleep staging has often been treated as a simple classification problem that aims at determining the label of individual target polysomnography (PSG) epochs one at a time. In this work, we tackle the task as a sequence-to-sequence classification problem that receives a sequence of multiple epochs as input and classifies all of their labels at once. For this purpose, we propose a hierarchical recurrent neural network named SeqSleepNet. At the epoch processing level, the network consists of a filterbank layer tailored to learn frequency-domain filters for preprocessing and an attention-based recurrent layer designed for short-term sequential modelling. At the sequence processing level, a recurrent layer is placed on top of the learned epoch-wise features for long-term modelling of sequential epochs. The classification is then carried out on the output vectors at every time step of the top recurrent layer to produce the sequence of output labels. Despite the hierarchical design, we present a strategy to train the network in an end-to-end fashion. We show that the proposed network outperforms state-of-the-art approaches, achieving an overall accuracy, macro F1-score, and Cohen's kappa of 87.1%, 83.3%, and 0.815 on a publicly available dataset with 200 subjects.
Submitted 1 February, 2019; v1 submitted 28 September, 2018;
originally announced September 2018.
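The SeqSleepNet abstract reports overall accuracy, macro F1-score and Cohen's kappa. The sketch below computes these three metrics from the standard textbook definitions. It assumes all epochs are pooled into one label sequence, because the abstract does not say whether scores were pooled or averaged per subject. The function name `sleep_staging_metrics` and the stage labels in the example are illustrative.

```python
from collections import Counter

def sleep_staging_metrics(y_true, y_pred):
    """Return (accuracy, macro_f1, cohens_kappa) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Macro F1: unweighted mean of per-class F1 = 2TP / (2TP + FP + FN).
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / len(f1_scores)

    # Cohen's kappa: observed agreement corrected for chance agreement p_e,
    # where p_e comes from the two marginal label distributions.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    return accuracy, macro_f1, kappa

acc, f1, kappa = sleep_staging_metrics(
    ["W", "N1", "N2", "N2", "REM", "W"],
    ["W", "N2", "N2", "N2", "REM", "N1"],
)
```

Macro F1 weights rare stages such as N1 as heavily as frequent ones, and kappa discounts agreement expected by chance. Both are therefore more demanding than raw accuracy on imbalanced sleep data.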
-
Joint Classification and Prediction CNN Framework for Automatic Sleep Stage Classification
Authors:
Huy Phan,
Fernando Andreotti,
Navin Cooray,
Oliver Y. Chén,
Maarten De Vos
Abstract:
Correctly identifying sleep stages is important in diagnosing and treating sleep disorders. This work proposes a joint classification-and-prediction framework based on CNNs for automatic sleep staging, and, subsequently, introduces a simple yet efficient CNN architecture to power the framework. Given a single input epoch, the novel framework jointly determines its label (classification) and its neighboring epochs' labels (prediction) in the contextual output. While the proposed framework is orthogonal to the widely adopted classification schemes, which take one or multiple epochs as contextual inputs and produce a single classification decision on the target epoch, we demonstrate its advantages in several ways. First, it leverages the dependency among consecutive sleep epochs while avoiding the problems experienced with the common classification schemes. Second, even with a single model, the framework can produce multiple decisions per epoch, which are essential for obtaining good performance as in ensemble-of-models methods, with very little induced computational overhead. Probabilistic aggregation techniques are then proposed to leverage the availability of multiple decisions. We conducted experiments on two public datasets: Sleep-EDF Expanded with 20 subjects, and Montreal Archive of Sleep Studies dataset with 200 subjects. The proposed framework yields an overall classification accuracy of 82.3% and 83.6%, respectively. We also show that the proposed framework is not only superior to the baselines based on the common classification schemes but also outperforms existing deep-learning approaches. To our knowledge, this is the first work going beyond the standard single-output classification to consider multitask neural networks for automatic sleep staging. This framework provides avenues for further studies of different neural-network architectures for automatic sleep staging.
Submitted 1 February, 2019; v1 submitted 16 May, 2018;
originally announced May 2018.
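In the joint classification-and-prediction framework, each input epoch yields label distributions for itself and its neighbors, so every epoch receives several decisions. The abstract mentions probabilistic aggregation of these decisions but does not give the rule. The minimal sketch below assumes a context of three (previous, current, next) and offers two common rules: multiplicative aggregation (summing log-probabilities) and additive aggregation (summing probabilities). The name `aggregate_decisions` and the output layout are hypothetical.

```python
import math

def aggregate_decisions(outputs, mode="multiplicative", eps=1e-12):
    """Aggregate overlapping per-epoch predictions into one label per epoch.

    outputs[n] = (p_prev, p_curr, p_next): class-probability lists the model
    emits for epochs n-1, n and n+1 when given input epoch n.
    """
    n_epochs = len(outputs)
    n_classes = len(outputs[0][1])
    labels = []
    for m in range(n_epochs):
        # Gather every distribution produced for epoch m: the "next" slot of
        # input m-1, the "current" slot of input m, the "previous" slot of m+1.
        votes = [outputs[src][slot]
                 for src, slot in ((m - 1, 2), (m, 1), (m + 1, 0))
                 if 0 <= src < n_epochs]
        if mode == "multiplicative":
            scores = [sum(math.log(v[c] + eps) for v in votes) for c in range(n_classes)]
        elif mode == "additive":
            scores = [sum(v[c] for v in votes) for c in range(n_classes)]
        else:
            raise ValueError(f"unknown mode: {mode}")
        labels.append(max(range(n_classes), key=lambda c: scores[c]))
    return labels

# Three epochs, two classes; epoch 1 alone favours class 0, its neighbours class 1.
outputs = [
    ([0.5, 0.5], [0.9, 0.1], [0.2, 0.8]),
    ([0.8, 0.2], [0.6, 0.4], [0.3, 0.7]),
    ([0.1, 0.9], [0.3, 0.7], [0.5, 0.5]),
]
```

Here both rules label the sequence [0, 1, 1]. The neighbours' predictions overturn the weak current-epoch preference for epoch 1, which is the ensemble-like effect the abstract attributes to having multiple decisions per epoch.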
-
High-dimensional Multivariate Mediation: with Application to Neuroimaging Data
Authors:
Oliver Y. Chén,
Ciprian M. Crainiceanu,
Elizabeth L. Ogburn,
Brian S. Caffo,
Tor D. Wager,
Martin A. Lindquist
Abstract:
Mediation analysis has become an important tool in the behavioral sciences for investigating the role of intermediate variables that lie in the path between a randomized treatment and an outcome variable. The influence of the intermediate variable on the outcome is often explored using structural equation models (SEMs), with model coefficients interpreted as possible effects. While there has been significant research on the topic in recent years, little work has been done on mediation analysis when the intermediate variable (mediator) is a high-dimensional vector. In this work we present a new method for exploratory mediation analysis in this setting called the directions of mediation (DMs). The first DM is defined as the linear combination of the elements of a high-dimensional vector of potential mediators that maximizes the likelihood of the SEM. The subsequent DMs are defined as linear combinations of the elements of the high-dimensional vector that are orthonormal to the previous DMs and maximize the likelihood of the SEM. We provide an estimation algorithm and establish the asymptotic properties of the obtained estimators. This method is well suited for cases when many potential mediators are measured. Examples of high-dimensional potential mediators are brain images composed of hundreds of thousands of voxels, genetic variation measured at millions of SNPs, or vectors of thousands of variables in large-scale epidemiological studies. We demonstrate the method using a functional magnetic resonance imaging (fMRI) study of thermal pain where we are interested in determining which brain locations mediate the relationship between the application of a thermal stimulus and self-reported pain.
Submitted 4 September, 2016; v1 submitted 30 November, 2015;
originally announced November 2015.