-
Hierarchical Space-Time Attention for Micro-Expression Recognition
Authors:
Haihong Hao,
Shuo Wang,
Huixia Ben,
Yanbin Hao,
Yansong Wang,
Weiwei Wang
Abstract:
Micro-expression recognition (MER) aims to recognize the short and subtle facial movements in micro-expression (ME) video clips, which reveal real emotions. Most recent MER methods use only special frames from ME video clips or extract optical flow from these special frames. However, they neglect the relationship between movements and space-time, even though facial cues are hidden within these relationships. To solve this issue, we propose the Hierarchical Space-Time Attention (HSTA). Specifically, we first process ME video frames and special frames or data in parallel with our cascaded Unimodal Space-Time Attention (USTA) to establish connections between subtle facial movements and specific facial areas. Then, we design Crossmodal Space-Time Attention (CSTA) to achieve a higher-quality fusion of crossmodal data. Finally, we hierarchically integrate USTA and CSTA to capture deeper facial cues. Our model emphasizes temporal modeling without neglecting the processing of special data, and it fuses the contents of different modalities while maintaining their respective uniqueness. Extensive experiments on four benchmarks show the effectiveness of the proposed HSTA. Specifically, compared with the latest method on the CASME3 dataset, it achieves about a 3% score improvement in seven-category classification.
Submitted 6 May, 2024;
originally announced May 2024.
-
Decentralized Smart Charging of Large-Scale EVs using Adaptive Multi-Agent Multi-Armed Bandits
Authors:
Sharyal Zafar,
Raphaël Feraud,
Anne Blavette,
Guy Camilleri,
Hamid Ben
Abstract:
The drastic growth of electric vehicles and photovoltaics can introduce new challenges, such as electrical current congestion and voltage limit violations due to peak load demands. These issues can be mitigated by controlling the operation of electric vehicles, i.e., smart charging. Centralized smart charging solutions have already been proposed in the literature. However, such solutions may lack scalability and suffer from inherent drawbacks of centralization, such as a single point of failure and data privacy concerns. Decentralization can help tackle these challenges. In this paper, a fully decentralized smart charging system is proposed using the philosophy of adaptive multi-agent systems. The proposed system utilizes multi-armed bandit learning to handle uncertainties in the system. The presented system is decentralized, scalable, real-time, and model-free, and it takes fairness among different players into account. A detailed case study is also presented for performance evaluation.
Submitted 20 July, 2023;
originally announced July 2023.
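The abstract above mentions multi-armed bandit learning without naming a specific algorithm. As a rough illustration of the general idea only, the following is a minimal sketch of the standard UCB1 rule for a single agent. It is not the authors' method: the function name `run_ucb1`, the interpretation of arms as candidate charging power levels, and the fixed deterministic rewards are all assumptions made for this example.

```python
import math


def run_ucb1(arm_rewards, rounds):
    """Run the standard UCB1 rule for one agent and return pull counts per arm.

    arm_rewards[i] is a hypothetical, fixed reward for arm i (for example,
    a score for one candidate charging power level). A real system would
    observe noisy rewards instead of reading them from a list.
    """
    n_arms = len(arm_rewards)
    counts = [0] * n_arms   # number of times each arm was played
    means = [0.0] * n_arms  # running mean reward of each arm

    for t in range(1, rounds + 1):
        if t <= n_arms:
            # Initialization: play every arm once.
            arm = t - 1
        else:
            # Pick the arm with the highest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = arm_rewards[arm]
        counts[arm] += 1
        # Incremental update of the running mean.
        means[arm] += (reward - means[arm]) / counts[arm]

    return counts
```

For instance, `run_ucb1([0.1, 0.9], 200)` plays the second arm far more often than the first, since the exploration bonus of the weaker arm shrinks as it accumulates pulls.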
-
Compact Bidirectional Transformer for Image Captioning
Authors:
Yuanen Zhou,
Zhenzhen Hu,
Daqing Liu,
Huixia Ben,
Meng Wang
Abstract:
Most current image captioning models generate captions from left to right. This unidirectional property means they can leverage only past context, not future context. Although recent refinement-based models can exploit both past and future context by generating a new caption in the second stage based on pre-retrieved or pre-generated captions from the first stage, the decoder of these models generally consists of two networks (i.e., a retriever or captioner in the first stage and a refiner in the second stage), which can only be executed sequentially. In this paper, we introduce a Compact Bidirectional Transformer model for image captioning that can leverage bidirectional context implicitly and explicitly while the decoder can be executed in parallel. Specifically, it is implemented by tightly coupling left-to-right (L2R) and right-to-left (R2L) flows into a single compact model (i.e., implicitly) and optionally allowing interaction of the two flows (i.e., explicitly), while the final caption is chosen from either the L2R or the R2L flow in a sentence-level ensemble manner. We conduct extensive ablation studies on the MSCOCO benchmark and find that the compact architecture, which serves as a regularization for implicitly exploiting bidirectional context, and the sentence-level ensemble play more important roles than the explicit interaction mechanism. By combining seamlessly with a word-level ensemble, the effect of the sentence-level ensemble is further enlarged. We further extend the conventional one-flow self-critical training to a two-flow version under this architecture and achieve new state-of-the-art results in comparison with non-vision-language-pretraining models. Source code is available at https://github.com/YuanEZhou/CBTrans.
Submitted 6 January, 2022;
originally announced January 2022.
-
Polynomial degree reduction in the $\mathcal{L}^2$-norm on a symmetric interval for the canonical basis
Authors:
Habib Ben Abdallah,
Christopher J. Henry,
Sheela Ramanna
Abstract:
In this paper, we develop a direct formula for determining the coefficients in the canonical basis of the best polynomial of degree $M$ that approximates a polynomial of degree $N>M$ on a symmetric interval for the $\mathcal{L}^2$-norm. We also formally prove that using the formula is more computationally efficient than using a classical matrix multiplication approach and we provide an example to illustrate that it is more numerically stable than the classical approach.
Submitted 15 May, 2021;
originally announced May 2021.
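To make the problem in the abstract above concrete, the following is a minimal sketch of the classical route to the best $\mathcal{L}^2$ approximation: change to the Legendre basis, truncate, and change back. It is not the direct formula developed in the paper, which the abstract does not state. The function name `reduce_degree_l2` and the half-width parameter `a` (for the interval $[-a, a]$) are assumptions made for this example.

```python
import numpy as np
from numpy.polynomial import legendre as L


def reduce_degree_l2(coeffs, m, a=1.0):
    """Best degree-m L2 approximation on [-a, a] of a polynomial.

    coeffs are canonical-basis coefficients, lowest degree first.
    Classical approach (not the paper's direct formula): Legendre
    polynomials are orthogonal on [-1, 1], so truncating the Legendre
    expansion gives the L2-optimal lower-degree polynomial.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    # Substitute x = a * t so that the interval becomes [-1, 1].
    scaled = coeffs * a ** np.arange(len(coeffs))
    # Canonical basis -> Legendre basis, then keep degrees 0..m.
    truncated = L.poly2leg(scaled)[: m + 1]
    # Legendre basis -> canonical basis (in the variable t).
    reduced_t = L.leg2poly(truncated)
    # leg2poly may drop trailing zeros, so pad back to length m + 1.
    out = np.zeros(m + 1)
    out[: len(reduced_t)] = reduced_t
    # Undo the substitution: coefficient k is divided by a**k.
    return out / a ** np.arange(m + 1)
```

For example, `reduce_degree_l2([0, 0, 0, 1], 1)` returns the coefficients of $0.6x$, the best degree-1 approximation of $x^3$ on $[-1, 1]$.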
-
1-Dimensional polynomial neural networks for audio signal related problems
Authors:
Habib Ben Abdallah,
Christopher J. Henry,
Sheela Ramanna
Abstract:
In addition to being extremely non-linear, modern problems require millions if not billions of parameters to solve, or at least to get a good approximation of the solution, and neural networks are known to assimilate that complexity by deepening and widening their topology in order to increase the level of non-linearity needed for a better approximation. However, compact topologies are always preferred to deeper ones, as they offer the advantage of using fewer computational units and fewer parameters. This compactness comes at the price of reduced non-linearity and thus of a limited solution search space. We propose the 1-Dimensional Polynomial Neural Network (1DPNN) model, which uses automatic polynomial kernel estimation for 1-Dimensional Convolutional Neural Networks (1DCNNs) and introduces a high degree of non-linearity from the first layer, which can compensate for the need for deep and/or wide topologies. We show that this non-linearity enables the model to yield better results with less computational and spatial complexity than a regular 1DCNN on various classification and regression problems related to audio signals, even though it introduces more computational and spatial complexity at the neuronal level. The experiments were conducted on three publicly available datasets and demonstrate that, on the problems that were tackled, the proposed model can extract more relevant information from the data than a 1DCNN in less time and with less memory.
Submitted 12 January, 2022; v1 submitted 8 September, 2020;
originally announced September 2020.
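The abstract above describes polynomial kernels in a 1D convolutional layer without giving the exact formulation. One plausible reading, sketched below as an assumption rather than as the authors' implementation, is that a neuron sums one convolution per polynomial degree, each applied to the element-wise power of the input. The function name `polynomial_neuron_1d` and the choice to start at degree 1 and omit the activation function are hypothetical.

```python
import numpy as np


def polynomial_neuron_1d(x, weights, bias=0.0):
    """Pre-activation output of a single-channel polynomial 1D neuron.

    weights[d - 1] is the kernel applied to x ** d (element-wise power),
    so len(weights) is the polynomial degree. All kernels are assumed to
    have the same length. A regular 1D convolutional neuron is the
    special case with a single kernel (degree 1).
    """
    x = np.asarray(x, dtype=float)
    out = None
    for degree, kernel in enumerate(weights, start=1):
        kernel = np.asarray(kernel, dtype=float)
        # Sliding dot product (cross-correlation, as in most CNN layers).
        term = np.correlate(x ** degree, kernel, mode="valid")
        out = term if out is None else out + term
    return out + bias
```

With `x = [1, 2, 3]` and kernels `[[1, 0], [0, 1]]`, the degree-1 term contributes `[1, 2]` and the degree-2 term contributes `[4, 9]`, so the neuron outputs `[5, 11]`.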