Advanced Multimodal Deep Learning Architecture for Image-Text Matching
Authors:
**yin Wang,
Hai**g Zhang,
Yihao Zhong,
Yingbin Liang,
Rongwei Ji,
Yiru Cang
Abstract:
Image-text matching is a key multimodal task that aims to model the semantic association between images and text as a matching relationship. With the advent of the multimedia information age, image and text data have grown explosively, and how to establish accurate and efficient semantic correspondence between them has become a core concern in both academia and industry. In this study, we examine the limitations of current multimodal deep learning models on image-text pairing tasks. To address them, we design an advanced multimodal deep learning architecture that combines the high-level abstract representation ability of deep neural networks for visual information with the strengths of natural language processing models for text semantic understanding. By introducing a novel cross-modal attention mechanism and a hierarchical feature fusion strategy, the model achieves deep fusion and two-way interaction between the image and text feature spaces. We also optimize the training objectives and loss functions so that the model can better capture the latent association structure between images and text during learning. Experiments show that, compared with existing image-text matching models, the new model significantly improves performance on a series of benchmark datasets. The new model also shows excellent generalization and robustness on large and diverse open-scenario datasets and maintains high matching performance even in previously unseen complex situations.
Submitted 13 June, 2024;
originally announced June 2024.
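Illustrative sketch (not from the paper): the abstract above names a cross-modal attention mechanism with two-way interaction between image and text features but gives no formulas. The code below is a minimal example of one common way such a mechanism is built, using scaled dot-product attention in both directions followed by cosine similarity. All function names (cross_attend, directional_score, match_score) and the toy feature vectors are assumptions for illustration only, and the paper's hierarchical fusion and loss functions are not modeled.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b, eps=1e-12):
    # eps guards against division by zero for all-zero vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + eps)

def cross_attend(queries, context):
    """For each query vector, return an attention-weighted sum of the
    context vectors (scaled dot-product attention, no learned weights)."""
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, c) / scale for c in context])
        attended.append([
            sum(w * c[d] for w, c in zip(weights, context))
            for d in range(len(q))
        ])
    return attended

def directional_score(queries, context):
    """Mean cosine similarity between each query and its attended context."""
    attended = cross_attend(queries, context)
    return sum(cosine(q, a) for q, a in zip(queries, attended)) / len(queries)

def match_score(regions, words):
    """Hypothetical two-way score: average of image-to-text and
    text-to-image attention-based similarities."""
    return 0.5 * (directional_score(regions, words)
                  + directional_score(words, regions))

# Toy features (assumed): two image regions and two word embeddings.
regions = [[1.0, 0.0], [0.0, 1.0]]
words_match = [[1.0, 0.0], [0.0, 1.0]]
words_mismatch = [[-1.0, 0.0], [0.0, -1.0]]

score_match = match_score(regions, words_match)
score_mismatch = match_score(regions, words_mismatch)
print(score_match > score_mismatch)
```

In a trained model the region and word vectors would come from learned encoders and projections; here they are fixed toy values chosen so that the aligned pair scores higher than the misaligned one.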
Optimal Resource Allocation for Multi-UAV Assisted Visible Light Communication
Authors:
Yihan Cang,
Ming Chen,
Zhaohui Yang,
Mingzhe Chen,
Chongwen Huang
Abstract:
In this paper, the optimization of deploying unmanned aerial vehicles (UAVs) over a reconfigurable intelligent surface (RIS)-assisted visible light communication (VLC) system is studied. In the considered model, UAVs are required to simultaneously provide wireless services as well as illumination for ground users. To meet the traffic and illumination demands of the ground users while minimizing the energy consumption of the UAVs, one must optimize UAV deployment, the phase shift of the RISs, user association, and RIS association. This is formulated as an optimization problem whose goal is to minimize the transmit power of the UAVs by adjusting these four variables. To solve it, the original problem is divided into four subproblems and an alternating algorithm is proposed. Specifically, a phase alignment method and a semidefinite programming (SDP) algorithm are proposed to optimize the phase shift of the RISs. Then, the UAV deployment optimization is solved by the successive convex approximation (SCA) algorithm. Since user association and RIS association are integer programming problems, a fractional relaxation method is adopted before using the dual method to find the optimal solution. For simplicity, a greedy algorithm is proposed as an alternative for optimizing RIS association. Extensive numerical studies show that the two proposed schemes reduce energy consumption by 34.85% and 32.11%, respectively, compared with the case without RIS.
Submitted 24 December, 2020;
originally announced December 2020.