Search | arXiv e-print repository

Robust bilinear factor analysis based on the matrix-variate $t$ distribution

Authors: Xuan Ma, Jianhua Zhao, Changchun Shang, Fen Jiang, Philip L. H. Yu

Abstract: Factor Analysis based on multivariate $t$ distribution ($t$fa) is a useful robust tool for extracting common factors on heavy-tailed or contaminated data. However, $t$fa is only applicable to vector data. When $t$fa is applied to matrix data, it is common to first vectorize the matrix observations. This introduces two challenges for $t$fa: (i) the inherent matrix structure of the data is broken, a… ▽ More Factor Analysis based on multivariate $t$ distribution ($t$fa) is a useful robust tool for extracting common factors on heavy-tailed or contaminated data. However, $t$fa is only applicable to vector data. When $t$fa is applied to matrix data, it is common to first vectorize the matrix observations. This introduces two challenges for $t$fa: (i) the inherent matrix structure of the data is broken, and (ii) robustness may be lost, as vectorized matrix data typically results in a high data dimension, which could easily lead to the breakdown of $t$fa. To address these issues, starting from the intrinsic matrix structure of matrix data, a novel robust factor analysis model, namely bilinear factor analysis built on the matrix-variate $t$ distribution ($t$bfa), is proposed in this paper. The novelty is that it is capable to simultaneously extract common factors for both row and column variables of interest on heavy-tailed or contaminated matrix data. Two efficient algorithms for maximum likelihood estimation of $t$bfa are developed. Closed-form expression for the Fisher information matrix to calculate the accuracy of parameter estimates are derived. Empirical studies are conducted to understand the proposed $t$bfa model and compare with related competitors. The results demonstrate the superiority and practicality of $t$bfa. Importantly, $t$bfa exhibits a significantly higher breakdown point than $t$fa, making it more suitable for matrix data. △ Less

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2204.09086 [pdf, other]

Choosing the number of factors in factor analysis with incomplete data via a hierarchical Bayesian information criterion

Authors: Jianhua Zhao, Changchun Shang, Shulan Li, Ling Xin, Philip L. H. Yu

Abstract: The Bayesian information criterion (BIC), defined as the observed data log likelihood minus a penalty term based on the sample size $N$, is a popular model selection criterion for factor analysis with complete data. This definition has also been suggested for incomplete data. However, the penalty term based on the `complete' sample size $N$ is the same no matter whether in a complete or incomplete… ▽ More The Bayesian information criterion (BIC), defined as the observed data log likelihood minus a penalty term based on the sample size $N$, is a popular model selection criterion for factor analysis with complete data. This definition has also been suggested for incomplete data. However, the penalty term based on the `complete' sample size $N$ is the same no matter whether in a complete or incomplete data case. For incomplete data, there are often only $N_i<N$ observations for variable $i$, which means that using the `complete' sample size $N$ implausibly ignores the amounts of missing information inherent in incomplete data. Given this observation, a novel criterion called hierarchical BIC (HBIC) for factor analysis with incomplete data is proposed. The novelty is that it only uses the actual amounts of observed information, namely $N_i$'s, in the penalty term. Theoretically, it is shown that HBIC is a large sample approximation of variational Bayesian (VB) lower bound, and BIC is a further approximation of HBIC, which means that HBIC shares the theoretical consistency of BIC. Experiments on synthetic and real data sets are conducted to access the finite sample performance of HBIC, BIC, and related criteria with various missing rates. The results show that HBIC and BIC perform similarly when the missing rate is small, but HBIC is more accurate when the missing rate is not small. △ Less

Submitted 19 April, 2022; originally announced April 2022.

Comments: 16 pages, 4 figures

MSC Class: 62H25 ACM Class: G.3; I.2.6

arXiv:2107.11645 [pdf]

Dual-Attention Enhanced BDense-UNet for Liver Lesion Segmentation

Authors: Wenming Cao, Philip L. H. Yu, Gilbert C. S. Lui, Keith W. H. Chiu, Ho-Ming Cheng, Yanwen Fang, Man-Fung Yuen, Wai-Kay Seto

Abstract: In this work, we propose a new segmentation network by integrating DenseUNet and bidirectional LSTM together with attention mechanism, termed as DA-BDense-UNet. DenseUNet allows learning enough diverse features and enhancing the representative power of networks by regulating the information flow. Bidirectional LSTM is responsible to explore the relationships between the encoded features and the up… ▽ More In this work, we propose a new segmentation network by integrating DenseUNet and bidirectional LSTM together with attention mechanism, termed as DA-BDense-UNet. DenseUNet allows learning enough diverse features and enhancing the representative power of networks by regulating the information flow. Bidirectional LSTM is responsible to explore the relationships between the encoded features and the up-sampled features in the encoding and decoding paths. Meanwhile, we introduce attention gates (AG) into DenseUNet to diminish responses of unrelated background regions and magnify responses of salient regions progressively. Besides, the attention in bidirectional LSTM takes into account the contribution differences of the encoded features and the up-sampled features in segmentation improvement, which can in turn adjust proper weights for these two kinds of features. We conduct experiments on liver CT image data sets collected from multiple hospitals by comparing them with state-of-the-art segmentation models. Experimental results indicate that our proposed method DA-BDense-UNet has achieved comparative performance in terms of dice coefficient, which demonstrates its effectiveness. △ Less

Submitted 24 July, 2021; originally announced July 2021.

Comments: 9 pages, 3 figures

arXiv:2107.10602 [pdf, other]

CNN-based Realized Covariance Matrix Forecasting

Authors: Yanwen Fang, Philip L. H. Yu, Yaohua Tang

Abstract: It is well known that modeling and forecasting realized covariance matrices of asset returns play a crucial role in the field of finance. The availability of high frequency intraday data enables the modeling of the realized covariance matrices directly. However, most of the models available in the literature depend on strong structural assumptions and they often suffer from the curse of dimensiona… ▽ More It is well known that modeling and forecasting realized covariance matrices of asset returns play a crucial role in the field of finance. The availability of high frequency intraday data enables the modeling of the realized covariance matrices directly. However, most of the models available in the literature depend on strong structural assumptions and they often suffer from the curse of dimensionality. We propose an end-to-end trainable model built on the CNN and Convolutional LSTM (ConvLSTM) which does not require to make any distributional or structural assumption but could handle high-dimensional realized covariance matrices consistently. The proposed model focuses on local structures and spatiotemporal correlations. It learns a nonlinear map** that connect the historical realized covariance matrices to the future one. Our empirical studies on synthetic and real-world datasets demonstrate its excellent forecasting ability compared with several advanced volatility models. △ Less

Submitted 22 July, 2021; originally announced July 2021.

Comments: 17 pages, 5 figures

arXiv:2107.07445 [pdf, other]

AutoBERT-Zero: Evolving BERT Backbone from Scratch

Authors: Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li

Abstract: Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks. However, the conventional paradigm constructs the backbone by purely stacking the manually designed global self-attention layers, introducing inductive bias and thus leads to sub-optimal. In this work, we make the first attempt to… ▽ More Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks. However, the conventional paradigm constructs the backbone by purely stacking the manually designed global self-attention layers, introducing inductive bias and thus leads to sub-optimal. In this work, we make the first attempt to automatically discover novel pre-trained language model (PLM) backbone on a flexible search space containing the most fundamental operations from scratch. Specifically, we propose a well-designed search space which (i) contains primitive math operations in the intra-layer level to explore novel attention structures, and (ii) leverages convolution blocks to be the supplementary for attentions in the inter-layer level to better learn local dependency. To enhance the efficiency for finding promising architectures, we propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm, which optimizes both the search algorithm and evaluation of candidate models. Specifically, we propose Operation-Priority (OP) evolution strategy to facilitate model search via balancing exploration and exploitation. Furthermore, we design a Bi-branch Weight-Sharing (BIWS) training strategy for fast model evaluation. Extensive experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks, proving the architecture's transfer and scaling abilities. Remarkably, AutoBERT-Zero-base outperforms RoBERTa-base (using much more data) and BERT-large (with much larger model size) by 2.4 and 1.4 higher score on GLUE test set. △ Less

Submitted 7 February, 2022; v1 submitted 15 July, 2021; originally announced July 2021.

Comments: Accepted by AAAI-2022

arXiv:2010.01288 [pdf, other]

UNISON: Unpaired Cross-lingual Image Captioning

Authors: Jiahui Gao, Yi Zhou, Philip L. H. Yu, Shafiq Joty, Jiuxiang Gu

Abstract: Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios. The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised manner. However, creating such paired datasets for every target language is prohibitively expensive, which hinders the extensibility of captioning technology and… ▽ More Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios. The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised manner. However, creating such paired datasets for every target language is prohibitively expensive, which hinders the extensibility of captioning technology and deprives a large part of the world population of its benefit. In this work, we present a novel unpaired cross-lingual method to generate image captions without relying on any caption corpus in the source or the target language. Specifically, our method consists of two phases: (i) a cross-lingual auto-encoding process, which utilizing a sentence parallel (bitext) corpus to learn the map** from the source to the target language in the scene graph encoding space and decode sentences in the target language, and (ii) a cross-modal unsupervised feature map**, which seeks to map the encoded scene graph features from image modality to language modality. We verify the effectiveness of our proposed method on the Chinese image caption generation task. The comparisons against several existing methods demonstrate the effectiveness of our approach. △ Less

Submitted 7 February, 2022; v1 submitted 3 October, 2020; originally announced October 2020.

Comments: Accepted by AAAI-2022

arXiv:1906.08059 [pdf, other]

Automated Computer Evaluation of Acute Ischemic Stroke and Large Vessel Occlusion

Authors: Jia You, Philip L. H. Yu, Anderson C. O. Tsang, Eva L. H. Tsui, Pauline P. S. Woo, Gilberto K. K. Leung

Abstract: Large vessel occlusion (LVO) plays an important role in the diagnosis of acute ischemic stroke. Identifying LVO of patients in the early stage on admission would significantly lower the probabilities of suffering from severe effects due to stroke or even save their lives. In this paper, we utilized both structural and imaging data from all recorded acute ischemic stroke patients in Hong Kong. Tota… ▽ More Large vessel occlusion (LVO) plays an important role in the diagnosis of acute ischemic stroke. Identifying LVO of patients in the early stage on admission would significantly lower the probabilities of suffering from severe effects due to stroke or even save their lives. In this paper, we utilized both structural and imaging data from all recorded acute ischemic stroke patients in Hong Kong. Total 300 patients (200 training and 100 testing) are used in this study. We established three hierarchical models based on demographic data, clinical data and features obtained from computerized tomography (CT) scans. The first two stages of modeling are merely based on demographic and clinical data. Besides, the third model utilized extra CT imaging features obtained from deep learning model. The optimal cutoff is determined at the maximal Youden index based on 10-fold cross-validation. With both clinical and imaging features, the Level-3 model achieved the best performance on testing data. The sensitivity, specificity, Youden index, accuracy and area under the curve (AUC) are 0.930, 0.684, 0.614, 0.790 and 0.850 respectively. △ Less

Submitted 18 June, 2019; originally announced June 2019.

arXiv:1905.09049 [pdf]

Automated Segmentation for Hyperdense Middle Cerebral Artery Sign of Acute Ischemic Stroke on Non-Contrast CT Images

Authors: Jia You, Philip L. H. Yu, Anderson C. O. Tsang, Eva L. H. Tsui, Pauline P. S. Woo, Gilberto K. K. Leung

Abstract: The hyperdense middle cerebral artery (MCA) dot sign has been reported as an important factor in the diagnosis of acute ischemic stroke due to large vessel occlusion. Interpreting the initial CT brain scan in these patients requires high level of expertise, and has high inter-observer variability. An automated computerized interpretation of the urgent CT brain image, with an emphasis to pick up ea… ▽ More The hyperdense middle cerebral artery (MCA) dot sign has been reported as an important factor in the diagnosis of acute ischemic stroke due to large vessel occlusion. Interpreting the initial CT brain scan in these patients requires high level of expertise, and has high inter-observer variability. An automated computerized interpretation of the urgent CT brain image, with an emphasis to pick up early signs of ischemic stroke will facilitate early patient diagnosis, triage, and shorten the door-to-revascularization time for these group of patients. In this paper, we present an automated detection method of segmenting the MCA dot sign on non-contrast CT brain image scans based on powerful deep learning technique. △ Less

Submitted 22 May, 2019; originally announced May 2019.

arXiv:1606.01792 [pdf, other]

Neural Machine Translation with External Phrase Memory

Authors: Yaohua Tang, Fandong Meng, Zhengdong Lu, Hang Li, Philip L. H. Yu

Abstract: In this paper, we propose phraseNet, a neural machine translator with a phrase memory which stores phrase pairs in symbolic form, mined from corpus or specified by human experts. For any given source sentence, phraseNet scans the phrase memory to determine the candidate phrase pairs and integrates tagging information in the representation of source sentence accordingly. The decoder utilizes a mixt… ▽ More In this paper, we propose phraseNet, a neural machine translator with a phrase memory which stores phrase pairs in symbolic form, mined from corpus or specified by human experts. For any given source sentence, phraseNet scans the phrase memory to determine the candidate phrase pairs and integrates tagging information in the representation of source sentence accordingly. The decoder utilizes a mixture of word-generating component and phrase-generating component, with a specifically designed strategy to generate a sequence of multiple words all at once. The phraseNet not only approaches one step towards incorporating external knowledge into neural machine translation, but also makes an effort to extend the word-by-word generation mechanism of recurrent neural network. Our empirical study on Chinese-to-English translation shows that, with carefully-chosen phrase table in memory, phraseNet yields 3.45 BLEU improvement over the generic neural machine translator. △ Less

Submitted 6 June, 2016; originally announced June 2016.

Comments: 8 figures, 9 pages

Showing 1–9 of 9 results for author: Yu, P L H