Search | arXiv e-print repository

Hierarchical Space-Time Attention for Micro-Expression Recognition

Authors: Haihong Hao, Shuo Wang, Huixia Ben, Yanbin Hao, Yansong Wang, Weiwei Wang

Abstract: Micro-expression recognition (MER) aims to recognize the short and subtle facial movements from the Micro-expression (ME) video clips, which reveal real emotions. Recent MER methods mostly only utilize special frames from ME video clips or extract optical flow from these special frames. However, they neglect the relationship between movements and space-time, while facial cues are hidden within the… ▽ More Micro-expression recognition (MER) aims to recognize the short and subtle facial movements from the Micro-expression (ME) video clips, which reveal real emotions. Recent MER methods mostly only utilize special frames from ME video clips or extract optical flow from these special frames. However, they neglect the relationship between movements and space-time, while facial cues are hidden within these relationships. To solve this issue, we propose the Hierarchical Space-Time Attention (HSTA). Specifically, we first process ME video frames and special frames or data parallelly by our cascaded Unimodal Space-Time Attention (USTA) to establish connections between subtle facial movements and specific facial areas. Then, we design Crossmodal Space-Time Attention (CSTA) to achieve a higher-quality fusion for crossmodal data. Finally, we hierarchically integrate USTA and CSTA to grasp the deeper facial cues. Our model emphasizes temporal modeling without neglecting the processing of special data, and it fuses the contents in different modalities while maintaining their respective uniqueness. Extensive experiments on the four benchmarks show the effectiveness of our proposed HSTA. Specifically, compared with the latest method on the CASME3 dataset, it achieves about 3% score improvement in seven-category classification. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: 9 pages, 4 figures

arXiv:2307.10704 [pdf]

Decentralized Smart Charging of Large-Scale EVs using Adaptive Multi-Agent Multi-Armed Bandits

Authors: Sharyal Zafar, Raphaël Feraud, Anne Blavette, Guy Camilleri, Hamid Ben

Abstract: The drastic growth of electric vehicles and photovoltaics can introduce new challenges, such as electrical current congestion and voltage limit violations due to peak load demands. These issues can be mitigated by controlling the operation of electric vehicles i.e., smart charging. Centralized smart charging solutions have already been proposed in the literature. But such solutions may lack scalab… ▽ More The drastic growth of electric vehicles and photovoltaics can introduce new challenges, such as electrical current congestion and voltage limit violations due to peak load demands. These issues can be mitigated by controlling the operation of electric vehicles i.e., smart charging. Centralized smart charging solutions have already been proposed in the literature. But such solutions may lack scalability and suffer from inherent drawbacks of centralization, such as a single point of failure, and data privacy concerns. Decentralization can help tackle these challenges. In this paper, a fully decentralized smart charging system is proposed using the philosophy of adaptive multi-agent systems. The proposed system utilizes multi-armed bandit learning to handle uncertainties in the system. The presented system is decentralized, scalable, real-time, model-free, and takes fairness among different players into account. A detailed case study is also presented for performance evaluation. △ Less

Submitted 20 July, 2023; originally announced July 2023.

Comments: CIRED 2023 International Conference & Exhibition on Electricity Distribution, Jun 2023, Rome, Italy

arXiv:2201.01984 [pdf, other]

Compact Bidirectional Transformer for Image Captioning

Authors: Yuanen Zhou, Zhenzhen Hu, Daqing Liu, Huixia Ben, Meng Wang

Abstract: Most current image captioning models typically generate captions from left to right. This unidirectional property makes them can only leverage past context but not future context. Though recent refinement-based models can exploit both past and future context by generating a new caption in the second stage based on pre-retrieved or pre-generated captions in the first stage, the decoder of these mod… ▽ More Most current image captioning models typically generate captions from left to right. This unidirectional property makes them can only leverage past context but not future context. Though recent refinement-based models can exploit both past and future context by generating a new caption in the second stage based on pre-retrieved or pre-generated captions in the first stage, the decoder of these models generally consists of two networks~(i.e. a retriever or captioner in the first stage and a refiner in the second stage), which can only be executed sequentially. In this paper, we introduce a Compact Bidirectional Transformer model for image captioning that can leverage bidirectional context implicitly and explicitly while the decoder can be executed parallelly. Specifically, it is implemented by tightly coupling left-to-right(L2R) and right-to-left(R2L) flows into a single compact model~(i.e. implicitly) and optionally allowing interaction of the two flows(i.e. explicitly), while the final caption is chosen from either L2R or R2L flow in a sentence-level ensemble manner. We conduct extensive ablation studies on the MSCOCO benchmark and find that the compact architecture, which serves as a regularization for implicitly exploiting bidirectional context, and the sentence-level ensemble play more important roles than the explicit interaction mechanism. By combining with word-level ensemble seamlessly, the effect of the sentence-level ensemble is further enlarged. We further extend the conventional one-flow self-critical training to the two-flows version under this architecture and achieve new state-of-the-art results in comparison with non-vision-language-pretraining models. Source code is available at {\color{magenta}\url{https://github.com/YuanEZhou/CBTrans}}. △ Less

Submitted 6 January, 2022; originally announced January 2022.

arXiv:2105.07258 [pdf, other]

doi 10.1016/j.rinam.2021.100185

Polynomial degree reduction in the $\mathcal{L}^2$-norm on a symmetric interval for the canonical basis

Authors: Habib Ben Abdallah, Christopher J. Henry, Sheela Ramanna

Abstract: In this paper, we develop a direct formula for determining the coefficients in the canonical basis of the best polynomial of degree $M$ that approximates a polynomial of degree $N>M$ on a symmetric interval for the $\mathcal{L}^2$-norm. We also formally prove that using the formula is more computationally efficient than using a classical matrix multiplication approach and we provide an example to… ▽ More In this paper, we develop a direct formula for determining the coefficients in the canonical basis of the best polynomial of degree $M$ that approximates a polynomial of degree $N>M$ on a symmetric interval for the $\mathcal{L}^2$-norm. We also formally prove that using the formula is more computationally efficient than using a classical matrix multiplication approach and we provide an example to illustrate that it is more numerically stable than the classical approach. △ Less

Submitted 15 May, 2021; originally announced May 2021.

arXiv:2009.04077 [pdf, other]

doi 10.1016/j.knosys.2022.108174

1-Dimensional polynomial neural networks for audio signal related problems

Authors: Habib Ben Abdallah, Christopher J. Henry, Sheela Ramanna

Abstract: In addition to being extremely non-linear, modern problems require millions if not billions of parameters to solve or at least to get a good approximation of the solution, and neural networks are known to assimilate that complexity by deepening and widening their topology in order to increase the level of non-linearity needed for a better approximation. However, compact topologies are always prefe… ▽ More In addition to being extremely non-linear, modern problems require millions if not billions of parameters to solve or at least to get a good approximation of the solution, and neural networks are known to assimilate that complexity by deepening and widening their topology in order to increase the level of non-linearity needed for a better approximation. However, compact topologies are always preferred to deeper ones as they offer the advantage of using less computational units and less parameters. This compacity comes at the price of reduced non-linearity and thus, of limited solution search space. We propose the 1-Dimensional Polynomial Neural Network (1DPNN) model that uses automatic polynomial kernel estimation for 1-Dimensional Convolutional Neural Networks (1DCNNs) and that introduces a high degree of non-linearity from the first layer which can compensate the need for deep and/or wide topologies. We show that this non-linearity enables the model to yield better results with less computational and spatial complexity than a regular 1DCNN on various classification and regression problems related to audio signals, even though it introduces more computational and spatial complexity on a neuronal level. The experiments were conducted on three publicly available datasets and demonstrate that, on the problems that were tackled, the proposed model can extract more relevant information from the data than a 1DCNN in less time and with less memory. △ Less

Submitted 12 January, 2022; v1 submitted 8 September, 2020; originally announced September 2020.

Showing 1–5 of 5 results for author: Ben, H