Search | arXiv e-print repository

arXiv:2305.19268 [pdf, other]

Intriguing Properties of Quantization at Scale

Authors: Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker

Abstract: Emergent properties have been widely adopted as a term to describe behavior not present in smaller models but observed in larger models. Recent work suggests that the trade-off incurred by quantization is also an emergent property, with sharp drops in performance in models over 6B parameters. In this work, we ask "are quantization cliffs in performance solely a factor of scale?" Against a backdrop of increased research focus on why certain emergent properties surface at scale, this work provides a useful counter-example. We posit that it is possible to optimize for a quantization-friendly training recipe that suppresses large activation magnitude outliers. Here, we find that outlier dimensions are not an inherent product of scale, but rather sensitive to the optimization conditions present during pre-training. This both opens up directions for more efficient quantization, and poses the question of whether other emergent properties are inherent or can be altered and conditioned by optimization and architecture design choices. We successfully quantize models ranging in size from 410M to 52B with minimal degradation in performance.

Submitted 30 May, 2023; originally announced May 2023.

Comments: 32 pages, 14 figures
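
Illustrative note (not from the paper): the abstract ties quantization quality to large activation magnitude outliers. The sketch below shows why a single outlier can hurt, assuming a generic symmetric per-tensor absmax int8 scheme. That scheme, the function names, and the toy activation values are all assumptions made for illustration; the listing does not state which quantization method the authors use.

```python
def quantize_absmax_int8(values):
    """Symmetric per-tensor int8 quantization: the scale is set by the largest magnitude."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def mean_abs_error(values):
    """Average round-trip error after quantizing and dequantizing."""
    codes, scale = quantize_absmax_int8(values)
    restored = dequantize(codes, scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Hypothetical activations: the same small values, with and without one large outlier.
smooth = [0.1 * i for i in range(-5, 6)]
with_outlier = smooth + [60.0]

error_smooth = mean_abs_error(smooth)
error_outlier = mean_abs_error(with_outlier)
print(f"no outlier: {error_smooth:.5f}  with outlier: {error_outlier:.5f}")
```

Because the outlier stretches the scale, the small values collapse onto only a few integer codes, so the round-trip error on the second vector is much larger than on the first.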

arXiv:2303.02257 [pdf, other]

Visual Perception System for Autonomous Driving

Authors: Qi Zhang, Siyuan Gou, Wenbin Li

Abstract: The recent surge in interest in autonomous driving stems from its rapidly developing capacity to enhance safety, efficiency, and convenience. A pivotal aspect of autonomous driving technology is its perceptual systems, where advances in core algorithms have yielded more precise methods applicable to autonomous driving, including vision-based Simultaneous Localization and Mapping (SLAM), object detection, and tracking algorithms. This work introduces a visual-based perception system for autonomous driving that integrates trajectory tracking and prediction of moving objects to prevent collisions, while addressing autonomous driving's localization and mapping requirements. The system leverages motion cues from pedestrians to monitor and forecast their movements and simultaneously maps the environment. This integrated approach resolves camera localization and the tracking of other moving objects in the scene, subsequently generating a sparse map to facilitate vehicle navigation. The performance, efficiency, and resilience of this approach are substantiated through comprehensive evaluations of both simulated and real-world datasets.

Submitted 31 October, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

arXiv:2205.04892 [pdf, other]

GRU-TV: Time- and velocity-aware GRU for patient representation on multivariate clinical time-series data

Authors: Ningtao Liu, Ruoxi Gao, Jing Yuan, Calire Park, Shuwei Xing, Shuiping Gou

Abstract: Electronic health records (EHRs) are usually high-dimensional, heterogeneous, and multimodal. In addition, the random recording of clinical variables results in high missing rates and uneven time intervals between adjacent records in the multivariate clinical time-series data extracted from EHRs. Current works using clinical time-series data for patient representation regard the patients' physiological status as a discrete process described by sporadically collected records. However, changes in the patient's physiological condition are continuous and dynamic processes. The perception of time and velocity of change is crucial for patient representation learning. In this study, we propose a time- and velocity-aware gated recurrent unit model (GRU-TV) for patient representation learning of clinical multivariate time-series data in a time-continuous manner. Neural ordinary differential equations (ODEs) and a velocity perception mechanism are applied to perceive the time interval between adjacent records and the rate of change of the patient's physiological status, respectively. Our experiments on two real clinical EHR datasets (PhysioNet2012, MIMIC-III) establish that GRU-TV is a robust model on computer-aided diagnosis (CAD) tasks, especially on sequences with high-variance time intervals.

Submitted 12 October, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

arXiv:2203.14045 [pdf, other]

doi 10.1007/s11633-023-1417-9

Adaptively Enhancing Facial Expression Crucial Regions via Local Non-Local Joint Network

Authors: Guanghui Shi, Shasha Mao, Shuiping Gou, Dandan Yan, Licheng Jiao, Lin Xiong

Abstract: Facial expression recognition (FER) remains a challenging research problem due to the small inter-class discrepancy in facial expression data. In view of the significance of facial crucial regions for FER, many existing studies use prior information from annotated crucial points to improve FER performance. However, it is complicated and time-consuming to manually annotate facial crucial points, especially for vast numbers of in-the-wild expression images. Based on this, a local non-local joint network is proposed in this paper to adaptively light up the facial crucial regions during feature learning for FER. In the proposed method, two parts are constructed based on facial local and non-local information respectively, where an ensemble of multiple local networks is proposed to extract local features corresponding to multiple facial local regions and a non-local attention network is used to explore the significance of each local region. In particular, the attention weights obtained by the non-local network are fed into the local part to achieve interactive feedback between the facial global and local information. Interestingly, the non-local weights corresponding to local regions are gradually updated and higher weights are given to more crucial regions. Moreover, U-Net is employed to extract integrated features combining deep semantic information and low-level detail information from expression images. Finally, experimental results illustrate that the proposed method achieves more competitive performance than several state-of-the-art methods on five benchmark datasets. Notably, analyses of the non-local weights corresponding to local regions demonstrate that the proposed method can automatically enhance some crucial regions during feature learning without any facial landmark information.

Submitted 27 February, 2024; v1 submitted 26 March, 2022; originally announced March 2022.

Report number: SN-2731-5398

Journal ref: Machine Intelligence Research, vol. 21, pp. 331-348, 2024

arXiv:2203.01429

SMTNet: Hierarchical cavitation intensity recognition based on sub-main transfer network

Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

Abstract: With the rapid development of smart manufacturing, data-driven machinery health management has received growing attention. In situations where some classes are more difficult to distinguish than others and where classes might be organised in a hierarchy of categories, current DL methods do not work well. In this study, a novel hierarchical cavitation intensity recognition framework using a Sub-Main Transfer Network, termed SMTNet, is proposed to classify acoustic signals of valve cavitation. The SMTNet model outputs multiple predictions ordered from coarse to fine along a network corresponding to a hierarchy of target cavitation states. Firstly, a data augmentation method based on Sliding Window with Fast Fourier Transform (Swin-FFT) is developed to address the few-shot problem. Secondly, a 1-D double hierarchical residual block (1-D DHRB) is presented to capture sensitive features of the frequency-domain valve acoustic signals. Thirdly, a hierarchical multi-label tree is proposed to assist the embedding of the semantic structure of target cavitation states into SMTNet. Fourthly, an experience filtering mechanism is proposed to fully learn the prior knowledge of the cavitation detection model. Finally, SMTNet has been evaluated on two cavitation datasets without noise (Dataset 1 and Dataset 2), and one cavitation dataset with real noise (Dataset 3) provided by SAMSON AG (Frankfurt). The prediction accuracies of SMTNet for cavitation intensity recognition are as high as 95.32%, 97.16% and 100%, respectively. At the same time, the testing accuracies of SMTNet for cavitation detection are as high as 97.02%, 97.64% and 100%. In addition, SMTNet has also been tested at different sampling frequencies and has achieved excellent results at the highest sampling frequency supported by mobile phones.

Submitted 12 July, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: we need update this paper

arXiv:2203.01118 [pdf, other]

doi 10.1016/j.engappai.2022.104904

A multi-task learning for cavitation detection and cavitation intensity recognition of valve acoustic signals

Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

Abstract: With the rapid development of smart manufacturing, data-driven machinery health management has received growing attention. As one of the most popular methods in machinery health management, deep learning (DL) has achieved remarkable successes. However, limited samples and poor separability of the different cavitation states of acoustic signals greatly hinder the eventual performance of DL models for cavitation intensity recognition and cavitation detection. In this work, a novel multi-task learning framework for simultaneous cavitation detection and cavitation intensity recognition using 1-D double hierarchical residual networks (1-D DHRN) is proposed for analyzing valve acoustic signals. Firstly, a data augmentation method based on sliding window with fast Fourier transform (Swin-FFT) is developed to alleviate the small-sample issue confronted in this study. Secondly, a 1-D double hierarchical residual block (1-D DHRB) is constructed to capture sensitive features from the frequency-domain acoustic signals of the valve. Then, a new structure of 1-D DHRN is proposed. Finally, the devised 1-D DHRN is evaluated on two datasets of valve acoustic signals without noise (Dataset 1 and Dataset 2) and one dataset of valve acoustic signals with realistic surrounding noise (Dataset 3) provided by SAMSON AG (Frankfurt). Our method has achieved state-of-the-art results. The prediction accuracies of 1-D DHRN for cavitation intensity recognition are as high as 93.75%, 94.31% and 100%, which indicates that 1-D DHRN outperforms other DL models and conventional methods. At the same time, the testing accuracies of 1-D DHRN for cavitation detection are as high as 97.02%, 97.64% and 100%. In addition, 1-D DHRN has also been tested at different sampling frequencies and shows excellent results at sampling frequencies that mobile phones can accommodate.

Submitted 20 April, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: arXiv admin note: text overlap with arXiv:2202.13226

Journal ref: Engineering Applications of Artificial Intelligence, 113 (2022), 104904

arXiv:2202.13245 [pdf, other]

doi 10.1145/3534678.3539133

Regional-Local Adversarially Learned One-Class Classifier Anomalous Sound Detection in Global Long-Term Space

Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

Abstract: Anomalous sound detection (ASD) is one of the most significant tasks in mechanical equipment monitoring and maintenance in complex industrial systems. In practice, it is vital to precisely identify the abnormal status of a working mechanical system, which can further facilitate failure troubleshooting. In this paper, we propose a multi-pattern adversarial learning one-class classification framework, which allows us to use both the generator and the discriminator of an adversarial model for efficient ASD. The core idea is learning to reconstruct the normal patterns of acoustic data through two different patterns of auto-encoding generators, which succeeds in extending the fundamental role of a discriminator from identifying real and fake data to distinguishing between regional and local pattern reconstructions. Furthermore, we present a global filter layer for long-term interactions in the frequency domain space, which directly learns from the original data without introducing any human priors. Extensive experiments on four real-world anomaly detection datasets from different industrial domains (three cavitation datasets provided by SAMSON AG and one publicly available dataset) show superior results and outperform recent state-of-the-art ASD methods.

Submitted 26 February, 2022; originally announced February 2022.

Journal ref: KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2022

arXiv:2202.13226 [pdf, other]

doi 10.1016/j.measurement.2022.110897

An acoustic signal cavitation detection framework based on XGBoost with adaptive selection feature engineering

Authors: Yu Sha, Johannes Faber, Shuiping Gou, Bo Liu, Wei Li, Stefan Schramm, Horst Stoecker, Thomas Steckenreiter, Domagoj Vnucec, Nadine Wetzstein, Andreas Widl, Kai Zhou

Abstract: Valves are widely used in industrial and domestic pipeline systems. However, during their operation, they may suffer from cavitation, which can cause loud noise, vibration and damage to the internal components of the valve. Therefore, monitoring the flow status inside valves is significantly beneficial for preventing the additional cost induced by cavitation. In this paper, a novel acoustic signal cavitation detection framework, based on XGBoost with adaptive selection feature engineering, is proposed. Firstly, a data augmentation method with non-overlapping sliding window (NOSW) is developed to address the small-sample problem involved in this study. Then, each segmented piece of time-domain acoustic signal is transformed by fast Fourier transform (FFT) and its statistical features are extracted as the input to the adaptive selection feature engineering (ASFE) procedure, where adaptive feature aggregation and feature crosses are performed. Finally, with the selected features the XGBoost algorithm is trained for cavitation detection and tested on valve acoustic signal data provided by Samson AG (Frankfurt). Our method has achieved state-of-the-art results. The prediction performance on the binary classification (cavitation and no-cavitation) and the four-class classification (cavitation choked flow, constant cavitation, incipient cavitation and no-cavitation) is satisfactory and outperforms the traditional XGBoost, with accuracy gains of 4.67% and 11.11%.

Submitted 1 March, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

Journal ref: Measurement 192 (2022), 110897
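
Illustrative note (not from the paper): the first two steps described in the abstract, non-overlapping sliding-window segmentation followed by a Fourier transform and statistical features, might be sketched as follows. The function names, the window size, the particular statistics, and the synthetic tone are assumptions chosen for illustration; the abstract does not list the actual feature set. A naive discrete Fourier transform stands in for an FFT routine so that the sketch needs only the standard library.

```python
import cmath
import math
import statistics


def non_overlapping_windows(signal, window_size):
    """Split a signal into consecutive windows that share no samples.

    Trailing samples that do not fill a whole window are dropped.
    """
    count = len(signal) // window_size
    return [signal[i * window_size:(i + 1) * window_size] for i in range(count)]


def magnitude_spectrum(window):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (an FFT would be used in practice)."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window)))
        for k in range(n // 2 + 1)
    ]


def spectral_features(window):
    """A few illustrative statistics of the magnitude spectrum."""
    spectrum = magnitude_spectrum(window)
    peak = max(spectrum)
    return {
        "mean": statistics.mean(spectrum),
        "std": statistics.pstdev(spectrum),
        "max": peak,
        "peak_bin": spectrum.index(peak),
    }


# Hypothetical recording: a pure tone with 4 cycles per 32-sample window.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(100)]
windows = non_overlapping_windows(signal, 32)
features = [spectral_features(w) for w in windows]
print(len(windows), features[0]["peak_bin"])
```

Each window yields one feature row, so one long recording becomes several training samples, which is how segmentation acts as data augmentation here.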

arXiv:2107.11061 [pdf, other]

Label Distribution Amendment with Emotional Semantic Correlations for Facial Expression Recognition

Authors: Shasha Mao, Guanghui Shi, Licheng Jiao, Shuiping Gou, Yangyang Li, Lin Xiong, Boxin Shi

Abstract: By utilizing label distribution learning, a probability distribution is assigned to a facial image to express a compound emotion, which effectively alleviates the problem of label uncertainty and noise in one-hot labels. In practice, it is observed that correlations among emotions are inherently different; for example, surprised and happy emotions are more likely to co-occur than surprised and neutral. This indicates that the correlation may be crucial for obtaining a reliable label distribution. Based on this, we propose a new method that amends the label distribution of each facial image by leveraging correlations among expressions in the semantic space. Inspired by inherently diverse correlations among word2vecs, the topological information among facial expressions is first explored in the semantic space, and each image is embedded into the semantic space. Specifically, a class-relation graph is constructed to transfer the semantic correlation among expressions into the task space. By comparing the semantic and task class-relation graphs of each image, the confidence of its label distribution is evaluated. Based on the confidence, the label distribution is amended by enhancing samples with higher confidence and weakening samples with lower confidence. Experimental results demonstrate that the proposed method is more effective than the compared state-of-the-art methods.

Submitted 23 July, 2021; originally announced July 2021.

arXiv:2012.12002 [pdf, ps, other]

Trust in robot-mediated health information

Authors: David Cameron, Marina Sarda Gou, Laura Sbaffi

Abstract: This paper outlines a social robot platform for providing health information. In comparison with previous findings for accessing information online, the use of a social robot may affect which factors users consider important when evaluating the trustworthiness of health information provided.

Submitted 22 December, 2020; originally announced December 2020.

Comments: 2 pages

arXiv:2010.04116 [pdf, other]

Interlocking Backpropagation: Improving depthwise model-parallelism

Authors: Aidan N. Gomez, Oscar Key, Kuba Perlin, Stephen Gou, Nick Frosst, Jeff Dean, Yarin Gal

Abstract: The number of parameters in state-of-the-art neural networks has drastically increased in recent years. This surge of interest in large-scale neural networks has motivated the development of new distributed training strategies enabling such models. One such strategy is model-parallel distributed training. Unfortunately, model-parallelism can suffer from poor resource utilisation, which leads to wasted resources. In this work, we improve upon recent developments in an idealised model-parallel optimisation setting: local learning. Motivated by poor resource utilisation in the global setting and poor task performance in the local setting, we introduce a class of intermediary strategies between local and global learning referred to as interlocking backpropagation. These strategies preserve many of the compute-efficiency advantages of local optimisation, while recovering much of the task performance achieved by global optimisation. We assess our strategies on both image classification ResNets and Transformer language models, finding that our strategy consistently out-performs local learning in terms of task performance, and out-performs global learning in training efficiency.

Submitted 7 July, 2022; v1 submitted 8 October, 2020; originally announced October 2020.

arXiv:1903.09295 [pdf, other]

DQN with model-based exploration: efficient learning on environments with sparse rewards

Authors: Stephen Zhen Gou, Yuyang Liu

Abstract: We propose Deep Q-Networks (DQN) with model-based exploration, an algorithm combining both model-free and model-based approaches that explores better and learns environments with sparse rewards more efficiently. DQN is a general-purpose, model-free algorithm and has been proven to perform well in a variety of tasks, including Atari 2600 games, since it was first proposed by Mnih et al. However, like many other reinforcement learning (RL) algorithms, DQN suffers from poor sample efficiency when rewards are sparse in an environment. As a result, most of the transitions stored in the replay memory have no informative reward signal, and provide limited value to the convergence and training of the Q-Network. However, one insight is that these transitions can be used to learn the dynamics of the environment as a supervised learning problem. The transitions also provide information about the distribution of visited states. Our algorithm uses these two observations to perform one-step planning during exploration, picking an action that leads to the states least likely to have been seen, thus improving the performance of exploration. We demonstrate our agent's performance in two classic environments with sparse rewards in OpenAI gym: Mountain Car and Lunar Lander.

Submitted 21 March, 2019; originally announced March 2019.
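
Illustrative note (not from the paper): the exploration step described in the abstract, one-step planning that picks the action whose predicted next state is least likely under the distribution of visited states, might be sketched as follows. The independent Gaussian density, the function names, and the hand-written toy dynamics are assumptions for illustration; the abstract says the dynamics are learned from stored transitions and does not specify the density model.

```python
import math


def fit_visited_state_density(states):
    """Fit an independent Gaussian per state dimension to the visited states."""
    n = len(states)
    dims = len(states[0])
    mean = [sum(s[i] for s in states) / n for i in range(dims)]
    std = [
        max(1e-6, math.sqrt(sum((s[i] - mean[i]) ** 2 for s in states) / n))
        for i in range(dims)
    ]
    return mean, std


def gaussian_log_density(state, mean, std):
    """Log-density of a state under the fitted per-dimension Gaussians."""
    return sum(
        -0.5 * ((s - m) / d) ** 2 - math.log(d * math.sqrt(2 * math.pi))
        for s, m, d in zip(state, mean, std)
    )


def pick_exploratory_action(state, actions, dynamics_model, visited_states):
    """One-step planning: choose the action whose predicted next state is least likely."""
    mean, std = fit_visited_state_density(visited_states)
    return min(
        actions,
        key=lambda a: gaussian_log_density(dynamics_model(state, a), mean, std),
    )


def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned dynamics model: the action shifts the position."""
    return (state[0] + action,)


visited = [(0.0,), (0.1,), (-0.1,), (0.2,)]
chosen = pick_exploratory_action((0.2,), [-1, 0, 1], toy_dynamics, visited)
print(chosen)
```

In this toy setup the visited states cluster near zero, so the action that moves the agent furthest from that cluster is selected.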

arXiv:1903.07243 [pdf]

doi 10.1109/JSTARS.2018.2879440

Complex Scene Classification of PolSAR Imagery based on a Self-paced Learning Approach

Authors: Wenshuai Chen, Shuiping Gou, Xinlin Wang, Licheng Jiao, Changzhe Jiao, Alina Zare

Abstract: Existing polarimetric synthetic aperture radar (PolSAR) image classification methods cannot achieve satisfactory performance on complex scenes characterized by several types of land cover with significant levels of noise or similar scattering properties across land cover types. Hence, we propose a supervised classification method aimed at constructing a classifier based on self-paced learning (SPL). SPL has been demonstrated to be effective at dealing with complex data while providing a classifier. In this paper, a novel Support Vector Machine (SVM) algorithm based on SPL with neighborhood constraints (SVM_SPLNC) is proposed. The proposed method leverages the easiest samples first to obtain an initial parameter vector. Then, more complex samples are gradually incorporated to update the parameter vector iteratively. Moreover, neighborhood constraints are introduced during the training process to further improve performance. Experimental results on three real PolSAR images show that the proposed method performs well on complex scenes.

Submitted 17 March, 2019; originally announced March 2019.

Showing 1–13 of 13 results for author: Gou, S