Search | arXiv e-print repository

On the Optimization and Generalization of Multi-head Attention

Authors: Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis

Abstract: The training and generalization dynamics of the Transformer's core mechanism, namely the Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attention heads. Towards this goal, we derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model, under a suitable realizability condition on the data. We then establish primitive conditions on the initialization that ensure realizability holds. Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model. We expect the analysis can be extended to various data-model and architecture variations.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 48 pages; presented at the Workshop on High-dimensional Learning Dynamics, ICML 2023

arXiv:2205.02121 [pdf, other]

Accelerating phase-field-based simulation via machine learning

Authors: Iman Peivaste, Nima H. Siboni, Ghasem Alahyarizadeh, Reza Ghaderi, Bob Svendsen, Dierk Raabe, Jaber R. Mianroodi

Abstract: Phase-field-based models have become common in material science, mechanics, physics, biology, chemistry, and engineering for the simulation of microstructure evolution. Yet, they suffer from the drawback of being computationally very costly when applied to large, complex systems. To reduce such computational costs, a Unet-based artificial neural network is developed as a surrogate model in the current work. Training input for this network is obtained from the results of the numerical solution of initial-boundary-value problems (IBVPs) based on the Fan-Chen model for grain microstructure evolution. In particular, about 250 different simulations with varying initial order parameters are carried out and 200 frames of the time evolution of the phase fields are stored for each simulation. The network is trained with 90% of this data, taking the $i$-th frame of a simulation, i.e. order parameter field, as input, and producing the $(i+1)$-th frame as the output. Evaluation of the network is carried out with a test dataset consisting of 2200 microstructures based on different configurations than originally used for training. The trained network is applied recursively on initial order parameters to calculate the time evolution of the phase fields. The results are compared to the ones obtained from the conventional numerical solution in terms of the errors in order parameters and the system's free energy. The resulting order parameter error averaged over all points and all simulation cases is 0.005 and the relative error in the total free energy in all simulation boxes does not exceed 1%.

Submitted 4 May, 2022; originally announced May 2022.

arXiv:1312.3990 [pdf]

doi 10.1109/ICCIS.2008.4670763

ECOC-Based Training of Neural Networks for Face Recognition

Authors: Nima Hatami, Reza Ebrahimpour, Reza Ghaderi

Abstract: Error Correcting Output Codes, ECOC, is an output representation method capable of discovering some of the errors produced in classification tasks. This paper describes the application of ECOC to the training of feed forward neural networks, FFNN, for improving the overall accuracy of classification systems. Indeed, to improve the generalization of FFNN classifiers, this paper proposes an ECOC-Based training method for Neural Networks that use ECOC as the output representation, and adopts the traditional Back-Propagation algorithm, BP, to adjust weights of the network. Experimental results for face recognition problem on Yale database demonstrate the effectiveness of our method. With a rejection scheme defined by a simple robustness rate, high reliability is achieved in this application.

Submitted 13 December, 2013; originally announced December 2013.

Journal ref: Cybernetics and Intelligent Systems, IEEE Conference on, 450-454, 2008

arXiv:1206.2027 [pdf]

Adaptive Fractional PID Controller for Robot Manipulator

Authors: H. Delavari, R. Ghaderi, N. A. Ranjbar, S. H. HosseinNia, S. Momani

Abstract: A Fractional adaptive PID (FPID) controller for a robot manipulator will be proposed. The PID parameters have been optimized by Genetic algorithm. The proposed controller is found robust by means of simulation in a tracking job. The validity of the proposed controller is shown by simulation of two-link robot manipulator. The result then is compared with integer type adaptive PID controller. It is found that when error signals in the learning stage are bounded, the trajectory of the robot converges to the desired one asymptotically.

Submitted 10 June, 2012; originally announced June 2012.

Comments: Proceedings of FDA'10. The 4th IFAC Workshop Fractional Differentiation and its Applications. Article no. FDA10-038 Badajoz, Spain, October 18-20, 2010

Showing 1–4 of 4 results for author: Ghaderi, R