Search | arXiv e-print repository

Deep Reinforcement Learning with Enhanced PPO for Safe Mobile Robot Navigation

Authors: Hamid Taheri, Seyed Rasoul Hosseini

Abstract: Collision-free motion is essential for mobile robots. Most approaches to collision-free and efficient navigation with wheeled robots require parameter tuning by experts to obtain good navigation behavior. This study investigates the application of deep reinforcement learning to train a mobile robot for autonomous navigation in a complex environment. The robot uses LiDAR sensor data and a deep neural network to generate control signals that guide it toward a specified target while avoiding obstacles. We employ two reinforcement learning algorithms in the Gazebo simulation environment: Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO). The study introduces an enhanced neural network structure in the PPO algorithm to boost performance, accompanied by a well-designed reward function to improve algorithm efficacy. Experimental results in both obstacle and obstacle-free environments underscore the effectiveness of the proposed approach. This research contributes to the advancement of autonomous robotics in complex environments through the application of deep reinforcement learning.

Submitted 25 May, 2024; originally announced May 2024.
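For orientation, here is a minimal sketch of the clipped surrogate objective used by standard PPO, which is the baseline this work enhances. It does not reproduce the paper's enhanced network structure or its reward function, because neither is specified here. The function name and the default `clip_eps=0.2` are illustrative assumptions. Only the objective value is computed. A real agent would differentiate the negated value with an autodiff framework.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                       advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Average clipped surrogate objective of standard PPO (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the current
    policy and under the policy that collected the data (hypothetical inputs);
    advantages: advantage estimates for the same time steps.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic choice: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a ratio of 2 and a positive advantage, the term is clipped to 1.2. With a negative advantage, the unclipped term is kept because it is the smaller of the two. This asymmetry is what discourages overly large policy updates.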

arXiv:2404.02348

COVID-19 Detection Based on Blood Test Parameters using Various Artificial Intelligence Methods

Authors: Kavian Khanjani, Seyed Rasoul Hosseini, Hamid Taheri, Shahrzad Shashaani, Mohammad Teshnehlab

Abstract: In 2019, the world faced a new challenge: COVID-19, a disease caused by the novel coronavirus SARS-CoV-2. The virus rapidly spread across the globe, leading to a high mortality rate, which prompted health organizations to take measures to control its transmission. Early disease detection is crucial in the treatment process, and computer-based automatic detection systems have been developed to aid in this effort. These systems often rely on artificial intelligence (AI) approaches such as machine learning, neural networks, fuzzy systems, and deep learning to classify diseases. This study aimed to differentiate COVID-19 patients from others using self-categorizing classifiers and various AI methods. Two datasets were used: blood test samples and radiography images. For the blood test samples obtained from San Raphael Hospital, which include two classes of individuals (those with COVID-19 and those with non-COVID diseases), the best results were achieved with an Ensemble method (a combination of a neural network and two machine learning methods). The results showed that this approach to COVID-19 diagnosis is cost-effective and provides results in a shorter time than other methods. The proposed model achieved an accuracy of 94.09% on this dataset. Second, the radiographic images were divided into four classes (normal, viral pneumonia, ground glass opacity, and COVID-19 infection) and were used for segmentation and classification. The lung lobes were extracted from the images and then categorized into specific classes, achieving an accuracy of 91.1% on the image dataset. Overall, this study highlights the potential of AI in detecting and managing COVID-19 and underscores the importance of continued research and development in this field.

Submitted 28 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2310.12680

On the Optimization and Generalization of Multi-head Attention

Authors: Puneesh Deora, Rouzbeh Ghaderi, Hossein Taheri, Christos Thrampoulidis

Abstract: The training and generalization dynamics of the Transformer's core mechanism, namely the attention mechanism, remain under-explored. Moreover, existing analyses primarily focus on single-head attention. Inspired by the demonstrated benefits of overparameterization when training fully-connected networks, we investigate the potential optimization and generalization advantages of using multiple attention heads. Towards this goal, we derive convergence and generalization guarantees for gradient-descent training of a single-layer multi-head self-attention model, under a suitable realizability condition on the data. We then establish primitive conditions on the initialization that ensure realizability holds. Finally, we demonstrate that these conditions are satisfied for a simple tokenized-mixture model. We expect the analysis can be extended to various data-model and architecture variations.

Submitted 19 October, 2023; originally announced October 2023.

Comments: 48 pages; presented at the Workshop on High-dimensional Learning Dynamics, ICML 2023
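For reference, here is a single-layer multi-head self-attention forward pass using standard scaled dot-product attention, sketched in pure Python. Concatenating the head outputs and the chosen weight shapes are assumptions. The paper's model may parameterize and combine heads differently (for example, by summing head outputs). This code does not reproduce its convergence or generalization analysis.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(X: Matrix,
                              heads: Sequence[Tuple[Matrix, Matrix, Matrix]]) -> Matrix:
    """X: T x d token matrix; heads: list of (Wq, Wk, Wv), each d x d_h.

    Returns the T x (H * d_h) concatenation of the per-head outputs
    softmax(Q K^T / sqrt(d_h)) V.
    """
    per_head = []
    for Wq, Wk, Wv in heads:
        Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
        scale = math.sqrt(len(Wq[0]))
        scores = matmul(Q, transpose(K))
        attn = [softmax([s / scale for s in row]) for row in scores]
        per_head.append(matmul(attn, V))
    # Concatenate the head outputs token by token.
    return [[v for out in per_head for v in out[t]] for t in range(len(X))]
```

With zero query and key weights, every attention row is uniform. Each head then returns the average of its value vectors, which makes a convenient sanity check.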

arXiv:2307.08994

Human Action Recognition in Still Images Using ConViT

Authors: Seyed Rohollah Hosseyni, Sanaz Seyedin, Hasan Taheri

Abstract: Understanding the relationship between different parts of an image is crucial in a variety of applications, including object recognition, scene understanding, and image classification. Although Convolutional Neural Networks (CNNs) have demonstrated impressive results in classifying and detecting objects, they lack the capability to extract the relationship between different parts of an image, which is a crucial factor in Human Action Recognition (HAR). To address this problem, this paper proposes a new module that functions like a convolutional layer and uses a Vision Transformer (ViT). In the proposed model, the Vision Transformer can complement a convolutional neural network in a variety of tasks by helping it effectively extract the relationship among various parts of an image. It is shown that the proposed model, compared to a simple CNN, can extract meaningful parts of an image and suppress the misleading parts. The proposed model has been evaluated on the Stanford40 and PASCAL VOC 2012 action datasets, achieving 95.5% and 91.5% mean Average Precision (mAP), respectively, which is promising compared to other state-of-the-art methods.

Submitted 11 January, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

arXiv:2305.13471

Fast Convergence in Learning Two-Layer Neural Networks with Separable Data

Authors: Hossein Taheri, Christos Thrampoulidis

Abstract: Normalized gradient descent has shown substantial success in speeding up the convergence of exponentially-tailed loss functions (which include exponential and logistic losses) on linear classifiers with separable data. In this paper, we go beyond linear models by studying normalized GD on two-layer neural nets. We prove for exponentially-tailed losses that using normalized GD leads to a linear rate of convergence of the training loss to the global optimum if the iterates find an interpolating model. This is made possible by showing certain gradient self-boundedness conditions and a log-Lipschitzness property. We also study the generalization of normalized GD for convex objectives via an algorithmic-stability analysis. In particular, we show that normalized GD does not overfit during training by establishing finite-time generalization bounds.

Submitted 26 June, 2023; v1 submitted 22 May, 2023; originally announced May 2023.
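The abstract's first sentence refers to normalized GD on linear classifiers with separable data. Below is a minimal sketch of that linear setting with the logistic loss. It does not cover the two-layer networks the paper studies. Dividing the step by the current training loss, $w \leftarrow w - \eta \nabla L(w)/L(w)$, is one common form of normalization and is an assumption here. The toy dataset is made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features x, label y in {-1, +1})


def logistic_loss_and_grad(w: List[float], data: Sequence[Example]):
    """Average logistic loss log(1 + exp(-y w.x)) and its gradient in w."""
    n = len(data)
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in data:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        loss += math.log1p(math.exp(-margin))
        coef = -y / (1.0 + math.exp(margin))  # coef * x is this example's gradient
        for j, xj in enumerate(x):
            grad[j] += coef * xj
    return loss / n, [g / n for g in grad]


def normalized_gd(data: Sequence[Example], eta: float = 1.0, steps: int = 50):
    """Gradient descent whose step is divided by the current training loss (assumed variant)."""
    w = [0.0] * len(data[0][0])
    history = []
    for _ in range(steps):
        loss, grad = logistic_loss_and_grad(w, data)
        history.append(loss)
        w = [wi - eta * gi / loss for wi, gi in zip(w, grad)]
    return w, history
```

On a toy set such as `[([1.0, 2.0], 1.0), ([2.0, 1.0], 1.0), ([-1.0, -1.5], -1.0), ([-2.0, -0.5], -1.0)]`, every pair of signed examples $y\,x$ has a positive inner product. As a result, each step increases every margin, and the recorded loss falls monotonically from $\log 2$ toward zero.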

arXiv:2302.09235

Generalization and Stability of Interpolating Neural Networks with Minimal Width

Authors: Hossein Taheri, Christos Thrampoulidis

Abstract: We investigate the generalization and optimization properties of shallow neural-network classifiers trained by gradient descent in the interpolating regime. Specifically, in a realizable scenario where model weights can achieve arbitrarily small training error $\varepsilon$ and their distance from initialization is $g(\varepsilon)$, we demonstrate that gradient descent with $n$ training data achieves training error $O(g(1/T)^2/T)$ and generalization error $O(g(1/T)^2/n)$ at iteration $T$, provided there are at least $m=\Omega(g(1/T)^4)$ hidden neurons. We then show that our realizable setting encompasses a special case where data are separable by the model's neural tangent kernel. For this case and logistic-loss minimization, we prove that the training loss decays at a rate of $\tilde{O}(1/T)$ given a polylogarithmic number of neurons $m=\Omega(\log^4(T))$. Moreover, with $m=\Omega(\log^4(n))$ neurons and $T\approx n$ iterations, we bound the test loss by $\tilde{O}(1/n)$. Our results differ from existing generalization outcomes using the algorithmic-stability framework, which necessitate polynomial width and yield suboptimal generalization rates. Central to our analysis is the use of a new self-bounded weak-convexity property, which leads to a generalized local quasi-convexity property for sufficiently parameterized neural-network classifiers. Ultimately, despite the objective's non-convexity, this leads to convergence and generalization-gap bounds that resemble those found in the convex setting of linear logistic regression.

Submitted 27 March, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

Comments: With significant changes: stating results without the homogeneity assumption; discussing results under NTK-separability in Section 4

arXiv:2209.07116

On Generalization of Decentralized Learning with Separable Data

Authors: Hossein Taheri, Christos Thrampoulidis

Abstract: Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data. Specifically, for decentralized gradient descent (DGD) and a variety of loss functions that asymptote to zero at infinity (including exponential and logistic losses), we derive novel finite-time generalization bounds. This complements a long line of recent work that studies the generalization performance and the implicit bias of gradient descent over separable data, but has thus far been limited to centralized learning scenarios. Notably, our generalization bounds approximately match, in order, their centralized counterparts. Central to this, and of independent interest, are novel bounds on the training loss and the rate of consensus of DGD for a class of self-bounded losses. Finally, on the algorithmic front, we design improved gradient-based routines for decentralized learning with separable data and empirically demonstrate orders-of-magnitude speed-ups in both training and generalization performance.

Submitted 27 March, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

Comments: Minor changes: fixing typos, a few more references. Title changed to the title of the conference version
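Below is a minimal sketch of plain decentralized gradient descent (DGD) on separable data, as the abstract describes it. Every agent averages its neighbors' models with a doubly stochastic mixing matrix `W` and then takes a gradient step on its own logistic loss at its own model. The ring topology, the step size, and the toy data are assumptions for illustration. The improved routines the paper proposes are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (features x, label y in {-1, +1})


def ring_mixing_matrix(n: int) -> List[List[float]]:
    """Doubly stochastic W for a ring (n >= 3): weight 1/3 on self and each neighbor."""
    return [[1.0 / 3.0 if (i - j) % n in (0, 1, n - 1) else 0.0 for j in range(n)]
            for i in range(n)]


def local_grad(w: List[float], points: Sequence[Example]) -> List[float]:
    """Gradient of one agent's average logistic loss at its own model w."""
    g = [0.0] * len(w)
    for x, y in points:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coef = -y / (1.0 + math.exp(margin))
        for j, xj in enumerate(x):
            g[j] += coef * xj / len(points)
    return g


def global_loss(w: Sequence[float], agents_data) -> float:
    """Average logistic loss of a single model over all agents' examples."""
    pts = [p for pts_i in agents_data for p in pts_i]
    return sum(math.log1p(math.exp(-y * sum(a * b for a, b in zip(w, x))))
               for x, y in pts) / len(pts)


def dgd(agents_data, W, eta: float = 0.5, steps: int = 200):
    """DGD: w_i <- sum_j W[i][j] * w_j - eta * grad f_i(w_i), for all agents in parallel."""
    n, d = len(agents_data), len(agents_data[0][0][0])
    models = [[0.0] * d for _ in range(n)]
    for _ in range(steps):
        grads = [local_grad(models[i], agents_data[i]) for i in range(n)]
        models = [[sum(W[i][j] * models[j][k] for j in range(n)) - eta * grads[i][k]
                   for k in range(d)] for i in range(n)]
    return models
```

In this toy run, four agents each hold one example. The ring is built with `ring_mixing_matrix(4)`, and the loss of the averaged model and the spread of the agents' models around that average are the two quantities to monitor.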

arXiv:2102.12717

Cloud Broker: A Systematic Mapping Study

Authors: Hoda Taheri, Faeze Ramezani, Neda Mohammadi, Parisa Khoshdel, Bahareh Taghavi, Neda Khorasani, Saeid Abrishami, Abbas Rasoolzadegan

Abstract: In a cloud environment, a cloud broker is an important entity that works as independent middleware between cloud customers and providers to address issues and conduct negotiations related to satisfying both customer preferences and service provider profits. In recent years, researchers have published many articles that directly or indirectly address this research area. A systematic method is vital for extracting all search spaces (journals, conferences, and workshops) and primary studies (articles) conducted in the cloud broker field, and then for selecting some of the highest-quality studies. The proposed systematic review includes a comprehensive three-tier search strategy (manual search, backward snowballing, and database search). A detailed explanation of the review process is given in Appendix A. In the search methodology, qualitative criteria have been defined to select the studies with the highest quality and the most relevance among all search spaces. In the present study, out of 1,928 extracted search spaces, 171 search spaces were selected based on the defined quality criteria. Then, 1,298 articles were extracted from these 171 selected search spaces, of which 496 high-quality papers were selected. The chosen papers were published in prestigious journals, conferences, and workshops from 2009 through 2019.
In the current Systematic Mapping Study (SMS), eight research questions have been designed to identify information that is significant to the cloud broker field, such as the most critical and debated topics, existing trends and issues, active researchers and countries, commonly used techniques for building cloud brokers, evaluation methods, the amount of research conducted by year and place of publication, and the most important active search spaces.

Submitted 1 January, 2023; v1 submitted 25 February, 2021; originally announced February 2021.

Comments: 19 pages, 11 tables, 16 figures

arXiv:2010.13275

Asymptotic Behavior of Adversarial Training in Binary Classification

Authors: Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

Abstract: It has been consistently reported that many machine learning models are susceptible to adversarial attacks, i.e., small additive adversarial perturbations applied to data points can cause misclassification. Adversarial training using empirical risk minimization is considered to be the state-of-the-art method for defense against adversarial attacks. Despite its success in practice, several problems in understanding the generalization performance of adversarial training remain open. In this paper, we derive precise theoretical predictions for the performance of adversarial training in binary classification. We consider the high-dimensional regime where the dimension of the data grows with the size of the training dataset at a constant ratio. Our results provide exact asymptotics for standard and adversarial test errors of the estimators obtained by adversarial training with $\ell_q$-norm bounded perturbations ($q \ge 1$) for both discriminative binary models and generative Gaussian-mixture models with correlated features. Furthermore, we use these sharp predictions to uncover several intriguing observations on the role of various parameters, including the over-parameterization ratio, the data model, and the attack budget, on the adversarial and standard errors.

Submitted 13 July, 2021; v1 submitted 25 October, 2020; originally announced October 2020.

Comments: V3: additional theoretical results, extensions to correlated features
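For a linear classifier, the inner minimization in adversarial training with $\ell_q$-bounded perturbations has a closed form by Hölder's inequality: $\min_{\|\delta\|_q \le \varepsilon} y\,w^\top(x+\delta) = y\,w^\top x - \varepsilon \|w\|_{q^*}$, where $1/q + 1/q^* = 1$. The sketch below evaluates this worst-case margin and the resulting adversarial logistic loss. This is a standard identity and not the paper's asymptotic characterization. The helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def dual_exponent(q: float) -> float:
    """Return q* with 1/q + 1/q* = 1 (q = 1 -> inf, q = inf -> 1)."""
    if q == 1:
        return math.inf
    if math.isinf(q):
        return 1.0
    return q / (q - 1.0)


def lp_norm(v: Sequence[float], p: float) -> float:
    if math.isinf(p):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def adversarial_margin(w: Sequence[float], x: Sequence[float], y: float,
                       eps: float, q: float) -> float:
    """Worst-case margin min_{||delta||_q <= eps} y * w.(x + delta) for a linear model."""
    clean = y * sum(wi * xi for wi, xi in zip(w, x))
    return clean - eps * lp_norm(w, dual_exponent(q))


def adversarial_logistic_loss(w, data: Sequence[Tuple[Sequence[float], float]],
                              eps: float, q: float) -> float:
    """Average logistic loss evaluated at the worst-case margins."""
    return sum(math.log1p(math.exp(-adversarial_margin(w, x, y, eps, q)))
               for x, y in data) / len(data)
```

For $q = 2$, the minimizing perturbation is $\delta = -\varepsilon\, y\, w/\|w\|_2$, so the formula can be checked directly against an explicit attack.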

arXiv:2006.08917

Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions

Authors: Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

Abstract: Empirical Risk Minimization (ERM) algorithms are widely used in a variety of estimation and prediction tasks in signal-processing and machine learning applications. Despite their popularity, a theory that explains their statistical properties in modern regimes where both the number of measurements and the number of unknown parameters are large is only recently emerging. In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference in high-dimensional generalized linear models. For a stylized setting with Gaussian features and problem dimensions that grow large at a proportional rate, we start with sharp performance characterizations and then derive tight lower bounds on the estimation and prediction error that hold over a wide class of loss functions and for any value of the regularization parameter. Our precise analysis has several attributes. First, it leads to a recipe for optimally tuning the loss function and the regularization parameter. Second, it allows us to precisely quantify the sub-optimality of popular heuristic choices: for instance, we show that optimally-tuned least-squares is (perhaps surprisingly) approximately optimal for standard logistic data, but the sub-optimality gap grows drastically as the signal strength increases. Third, we use the bounds to precisely assess the merits of ridge-regularization as a function of the over-parameterization ratio.
Notably, our bounds are expressed in terms of the Fisher Information of random variables that are simple functions of the data distribution, thus making ties to corresponding bounds in classical statistics.

Submitted 5 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

arXiv:2002.09964

Quantized Decentralized Stochastic Learning over Directed Graphs

Authors: Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

Abstract: We consider a decentralized stochastic learning problem where data points are distributed among computing nodes communicating over a directed graph. As the model size grows, decentralized learning faces a major bottleneck: the heavy communication load due to each node transmitting large messages (model updates) to its neighbors. To tackle this bottleneck, we propose a quantized decentralized stochastic learning algorithm over directed graphs that is based on the push-sum algorithm in decentralized consensus optimization. More importantly, we prove that our algorithm achieves the same convergence rates as the decentralized stochastic learning algorithm with exact communication for both convex and non-convex losses. Numerical evaluations corroborate our main theoretical results and illustrate significant speed-up compared to the exact-communication methods.

Submitted 28 December, 2020; v1 submitted 23 February, 2020; originally announced February 2020.
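The algorithm above builds on push-sum, which lets nodes on a directed graph agree on an average even though the mixing weights are only column-stochastic. Below is a minimal sketch of exact, unquantized push-sum averaging. The quantization of messages and the stochastic gradient steps that the paper adds are deliberately omitted. The example graph is an assumption.

```python
from typing import List, Sequence


def push_sum(values: Sequence[float], out_neighbors: List[List[int]],
             rounds: int = 100) -> List[float]:
    """Push-sum average consensus over a directed graph.

    Each node keeps a value s_i and a weight w_i, splits both evenly among itself
    and its out-neighbors every round (column-stochastic mixing), and estimates
    the network average by the ratio s_i / w_i.
    """
    n = len(values)
    s = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(rounds):
        new_s, new_w = [0.0] * n, [0.0] * n
        for i in range(n):
            targets = [i] + out_neighbors[i]  # keep one share for itself
            share = 1.0 / len(targets)
            for j in targets:
                new_s[j] += s[i] * share
                new_w[j] += w[i] * share
        s, w = new_s, new_w
    return [si / wi for si, wi in zip(s, w)]
```

On the strongly connected digraph `0→1, 0→2, 1→2, 2→3, 3→0` with values `[1, 2, 3, 6]`, every node's estimate approaches the average 3. Using the ratio $s_i/w_i$ corrects for the fact that nodes with more in-links accumulate more mass.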

arXiv:2002.07284

Sharp Asymptotics and Optimal Performance for Inference in Binary Models

Authors: Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

Abstract: We study convex empirical risk minimization for high-dimensional inference in binary models. Our first result sharply predicts the statistical performance of such estimators in the linear asymptotic regime under isotropic Gaussian features. Importantly, the predictions hold for a wide class of convex loss functions, which we exploit in order to prove a bound on the best achievable performance among them. Notably, we show that the proposed bound is tight for popular binary models (such as Signed, Logistic or Probit) by constructing appropriate loss functions that achieve it. More interestingly, for binary linear classification under the Logistic and Probit models, we prove that the performance of least-squares is no worse than 0.997 and 0.98 times the optimal one, respectively. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.

Submitted 26 February, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

arXiv:1908.07568

Power-Efficient Resource Allocation in Massive MIMO Aided Cloud RANs

Authors: Nahid Amani, Saeedeh Parsaeefard, Hassan Taheri, Hossein Pedram

Abstract: This paper considers the power-efficient resource allocation problem in a cloud radio access network (C-RAN). The C-RAN architecture consists of a set of baseband units (BBUs) that are connected, via fronthaul links with limited capacity, to a set of remote radio heads (RRHs) equipped with massive multiple-input multiple-output (MIMO). We formulate the power-efficient optimization problem in C-RANs as a joint resource allocation problem that allocates RRHs and transmit power to each user and assigns fronthaul links and BBUs to active RRHs, while satisfying the minimum required rate of each user. To solve this non-convex optimization problem, we propose a two-step iterative algorithm based on complementary geometric programming (CGP) and successive convex approximation (SCA). The simulation results indicate that our proposed scheme can significantly reduce the total transmission power by switching off under-utilized RRHs.

Submitted 20 August, 2019; originally announced August 2019.

arXiv:1908.04433

Sharp Guarantees for Solving Random Equations with One-Bit Information

Authors: Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

Abstract: We study the performance of a wide class of convex optimization-based estimators for recovering a signal from corrupted one-bit measurements in high dimensions. Our general result sharply predicts the performance of such estimators in the linear asymptotic regime when the measurement vectors have IID Gaussian entries. This includes, as a special case, the previously studied least-squares estimator, and yields various novel results for other popular estimators such as least-absolute deviations, hinge-loss and logistic-loss. Importantly, we exploit the fact that our analysis holds for generic convex loss functions to prove a bound on the best achievable performance across the entire class of estimators. Numerical simulations corroborate our theoretical findings and suggest they are accurate even for relatively small problem dimensions.

Submitted 23 January, 2020; v1 submitted 12 August, 2019; originally announced August 2019.
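To make the least-squares special case concrete, the sketch below simulates one-bit measurements $y_i = \mathrm{sign}(\langle a_i, x_0\rangle)$ with IID Gaussian $a_i$. It then flips each sign with a fixed probability, which is an assumed corruption model, and solves the least-squares problem via the normal equations. One-bit data lose the signal's scale, so the estimate is compared with $x_0$ by cosine similarity. The sketch only illustrates the estimator. It does not compute the paper's sharp predictions, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def one_bit_least_squares(signal: Sequence[float], n: int, flip_prob: float,
                          rng: random.Random) -> List[float]:
    """Least-squares estimate from n corrupted one-bit Gaussian measurements."""
    d = len(signal)
    AtA = [[0.0] * d for _ in range(d)]
    Aty = [0.0] * d
    for _ in range(n):
        a = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = 1.0 if sum(ai * si for ai, si in zip(a, signal)) >= 0 else -1.0
        if rng.random() < flip_prob:  # assumed corruption: random sign flip
            y = -y
        for i in range(d):
            Aty[i] += a[i] * y
            for j in range(d):
                AtA[i][j] += a[i] * a[j]
    return solve(AtA, Aty)  # normal equations (A^T A) z = A^T y


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
```

With a few thousand measurements in three dimensions, the least-squares direction closely aligns with the true signal, even with 10% of the signs flipped.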

arXiv:1907.10595

Robust and Communication-Efficient Collaborative Learning

Authors: Amirhossein Reisizadeh, Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

Abstract: We consider a decentralized learning problem, where a set of computing nodes aim to solve a non-convex optimization problem collaboratively. It is well known that decentralized optimization schemes face two major system bottlenecks: stragglers' delay and communication overhead. In this paper, we tackle these bottlenecks by proposing a novel decentralized and gradient-based optimization algorithm named QuanTimed-DSGD. Our algorithm stands on two main ideas: (i) we impose a deadline on the local gradient computations of each node at each iteration of the algorithm, and (ii) the nodes exchange quantized versions of their local models. The first idea provides robustness to straggling nodes and the second improves communication efficiency. The key technical contribution of our work is to prove that with non-vanishing noises for quantization and stochastic gradients, the proposed method exactly converges to the global optimum for convex loss functions, and finds a first-order stationary point in non-convex scenarios. Our numerical evaluations of QuanTimed-DSGD on the benchmark datasets MNIST and CIFAR-10 demonstrate speedups of up to 3x in run-time compared to state-of-the-art decentralized optimization methods.

Submitted 31 October, 2019; v1 submitted 24 July, 2019; originally announced July 2019.
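Idea (ii) requires a quantizer for the exchanged models. The abstract does not specify one, so the sketch below assumes a common unbiased choice. Each coordinate's magnitude, relative to the vector's Euclidean norm, is randomly rounded to one of `levels` grid points. The rounding probabilities are chosen so that the quantized vector equals the input in expectation. The deadline mechanism of idea (i) is not shown. The function name is hypothetical.

```python
import math
import random
from typing import List, Sequence


def stochastic_quantize(v: Sequence[float], levels: int,
                        rng: random.Random) -> List[float]:
    """Unbiased random rounding of |v_i| / ||v|| onto a grid with `levels` intervals."""
    norm = math.sqrt(sum(t * t for t in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for t in v:
        scaled = abs(t) / norm * levels  # lies in [0, levels]
        lower = math.floor(scaled)
        # Round up with probability equal to the fractional part, so E[level] = scaled.
        level = lower + (1 if rng.random() < scaled - lower else 0)
        out.append(math.copysign(norm * level / levels, t))
    return out
```

Only the norm, the signs, and a small integer level per coordinate need to be transmitted. Increasing `levels` lowers the quantization variance at the cost of more bits per coordinate.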

Showing 1–15 of 15 results for author: Taheri, H