Search | arXiv e-print repository

Battery-Care Resource Allocation and Task Offloading in Multi-Agent Post-Disaster MEC Environment

Authors: Yiwei Tang, Hualong Huang, Wenhan Zhan, Geyong Min, Zhekai Duan, Yuchuan Lei

Abstract: As an emerging application scenario of mobile edge computing (MEC), post-disaster rescue involves numerous computing-intensive tasks but has unreliable network connectivity. In rescue environments, quality of service (QoS), such as task execution delay, energy consumption and battery state of health (SoH), is of great importance. This paper studies a multi-user post-disaster MEC environment with unstable 5G communication, where device-to-device (D2D) link communication and dynamic voltage and frequency scaling (DVFS) are adopted to balance each user's requirement for task delay and energy consumption. A battery degradation evaluation approach to prolong battery lifetime is also presented. The distributed optimization problem is formulated as a mixed cooperative-competitive (MCC) multi-agent Markov decision process (MAMDP) and is tackled with recurrent multi-agent Proximal Policy Optimization (rMAPPO). Extensive simulations and comprehensive comparisons with other representative algorithms clearly demonstrate the effectiveness of the proposed rMAPPO-based offloading scheme.

Submitted 23 December, 2023; originally announced December 2023.

Comments: Accepted by WCNC 2024

arXiv:2312.01662 [pdf]

Universal Deoxidation of Semiconductor Substrates Assisted by Machine-Learning and Real-Time-Feedback-Control

Authors: Chao Shen, Wenkang Zhan, Jian Tang, Zhaofeng Wu, Bo Xu, Chao Zhao, Zhanguo Wang

Abstract: Thin film deposition is an essential step in the semiconductor process. During preparation or loading, the substrate is exposed to the air unavoidably, which has motivated studies of the process control to remove the surface oxide before thin film deposition. Optimizing the deoxidation process in molecular beam epitaxy (MBE) for a random substrate is a multidimensional challenge and sometimes controversial. Due to variations in semiconductor materials and growth processes, the determination of substrate deoxidation temperature is highly dependent on the grower's expertise; the same substrate may yield inconsistent results when evaluated by different growers. Here, we employ a machine learning (ML) hybrid convolution and vision transformer (CNN-ViT) model. This model utilizes reflection high-energy electron diffraction (RHEED) video as input to determine the deoxidation status of the substrate as output, enabling automated substrate deoxidation under a controlled architecture. This also extends to the successful application of deoxidation processes on other substrates. Furthermore, we showcase the potential of models trained on data from a single MBE equipment to achieve high-accuracy deployment on other equipment. In contrast to traditional methods, our approach holds exceptional practical value. It standardizes deoxidation temperatures across various equipment and substrate materials, advancing the standardization research process in semiconductor preparation, a significant milestone in thin film growth technology.
The concepts and methods demonstrated in this work are anticipated to revolutionize semiconductor manufacturing in optoelectronics and microelectronics industries by applying them to diverse material growth processes.

Submitted 4 December, 2023; originally announced December 2023.

Comments: 5 figures

arXiv:2311.01993 [pdf, other]

Active Exploration in Iterative Gaussian Process Regression for Uncertainty Modeling in Autonomous Racing

Authors: Tommaso Benciolini, Chen Tang, Marion Leibold, Catherine Weaver, Masayoshi Tomizuka, Wei Zhan

Abstract: Autonomous racing creates challenging control problems, but Model Predictive Control (MPC) has made promising steps toward solving both the minimum lap-time problem and head-to-head racing. Yet, accurate models of the system are necessary for model-based control, including models of vehicle dynamics and opponent behavior. Both dynamics model error and opponent behavior can be modeled with Gaussian Process (GP) regression. GP models can be updated iteratively from data collected using the controller, but the strength of the GP model depends on the diversity of the training data. We propose a novel active exploration mechanism for iterative GP regression that purposefully collects additional data at regions of higher uncertainty in the GP model. In the exploration, an MPC collects diverse data by balancing the racing objectives and the exploration criterion; then the GP is re-trained. The process is repeated iteratively; in later iterations, the exploration is deactivated, and only the racing objectives are optimized. Thus, the MPC can achieve better performance by leveraging the improved GP model. We validate our approach in the highly realistic racing simulation platform Gran Turismo Sport of Sony Interactive Entertainment Inc for a minimum lap time challenge, and in numerical simulation of head-to-head racing. Our active exploration mechanism yields a significant improvement in the GP prediction accuracy compared to previous approaches and, thus, an improved racing performance.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2306.12898 [pdf]

Machine-Learning-Assisted and Real-Time-Feedback-Controlled Growth of InAs/GaAs Quantum Dots

Authors: Chao Shen, Wenkang Zhan, Kaiyao Xin, Manyang Li, Zhenyu Sun, Hui Cong, Chi Xu, Jian Tang, Zhaofeng Wu, Bo Xu, Zhongming Wei, Chunlai Xue, Chao Zhao, Zhanguo Wang

Abstract: Self-assembled InAs/GaAs quantum dots (QDs) have properties highly valuable for developing various optoelectronic devices such as QD lasers and single photon sources. The applications strongly rely on the density and quality of these dots, which has motivated studies of the growth process control to realize high-quality epi-wafers and devices. Establishing the process parameters in molecular beam epitaxy (MBE) for a specific density of QDs is a multidimensional optimization challenge, usually addressed through time-consuming and iterative trial-and-error. Here, we report a real-time feedback control method to realize the growth of QDs with arbitrary density, which is fully automated and intelligent. We developed a machine learning (ML) model named 3D ResNet 50 trained using reflection high-energy electron diffraction (RHEED) videos as input instead of static images and providing real-time feedback on surface morphologies for process control. As a result, we demonstrated that ML from previous growth could predict the post-growth density of QDs, by successfully tuning the QD densities in near-real time from 1.5E10 cm-2 down to 3.8E8 cm-2 or up to 1.4E11 cm-2. Compared to traditional methods, our approach, with in situ tuning capabilities and excellent reliability, can dramatically expedite the material optimization process and improve the reproducibility of MBE, constituting significant progress for thin film growth techniques.
The concepts and methodologies proved feasible in this work are promising to be applied to a variety of material growth processes, which will revolutionize semiconductor manufacturing for optoelectronic and microelectronic industries.

Submitted 11 October, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: 5 figures

arXiv:2306.00265 [pdf, other]

Doubly Robust Self-Training

Authors: Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael Jordan, Jiantao Jiao

Abstract: Self-training is an important technique for solving semi-supervised learning problems. It leverages unlabeled data by generating pseudo-labels and combining them with a limited labeled dataset for training. The effectiveness of self-training heavily relies on the accuracy of these pseudo-labels. In this paper, we introduce doubly robust self-training, a novel semi-supervised algorithm that provably balances between two extremes. When the pseudo-labels are entirely incorrect, our method reduces to a training process solely using labeled data. Conversely, when the pseudo-labels are completely accurate, our method transforms into a training process utilizing all pseudo-labeled data and labeled data, thus increasing the effective sample size. Through empirical evaluations on both the ImageNet dataset for image classification and the nuScenes autonomous driving dataset for 3D object detection, we demonstrate the superiority of the doubly robust loss over the standard self-training baseline.

Submitted 2 November, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

arXiv:2305.07740 [pdf, other]

Double-Iterative Gaussian Process Regression for Modeling Error Compensation in Autonomous Racing

Authors: Shaoshu Su, Ce Hao, Catherine Weaver, Chen Tang, Wei Zhan, Masayoshi Tomizuka

Abstract: Autonomous racing control is a challenging research problem as vehicles are pushed to their limits of handling to achieve an optimal lap time; therefore, vehicles exhibit highly nonlinear and complex dynamics. Difficult-to-model effects, such as drifting, aerodynamics, chassis weight transfer, and suspension can lead to infeasible and suboptimal trajectories. While offline planning allows optimizing a full reference trajectory for the minimum lap time objective, such modeling discrepancies are particularly detrimental when using offline planning, as planning model errors compound with controller modeling errors. Gaussian Process Regression (GPR) can compensate for modeling errors. However, previous works primarily focus on modeling error in real-time control without consideration for how the model used in offline planning can affect the overall performance. In this work, we propose a double-GPR error compensation algorithm to reduce model uncertainties; specifically, we compensate both the planner's model and controller's model with two respective GPR-based error compensation functions. Furthermore, we design an iterative framework to re-collect error-rich data using the racing control system. We test our method in the high-fidelity racing simulator Gran Turismo Sport (GTS); we find that our iterative, double-GPR compensation functions improve racing performance and iteration stability in comparison to a single compensation function applied merely for real-time control.

Submitted 26 June, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: 8 Pages, 6 Figures, Accepted by IFAC 2023 (The 22nd World Congress of the International Federation of Automatic Control)

arXiv:2211.09378 [pdf, other]

Outracing Human Racers with Model-based Planning and Control for Time-trial Racing

Authors: Ce Hao, Chen Tang, Eric Bergkvist, Catherine Weaver, Liting Sun, Wei Zhan, Masayoshi Tomizuka

Abstract: Autonomous racing has become a popular sub-topic of autonomous driving in recent years. The goal of autonomous racing research is to develop software to control the vehicle at its limit of handling and achieve human-level racing performance. In this work, we investigate how to approach human expert-level racing performance with model-based planning and control methods using the high-fidelity racing simulator Gran Turismo Sport (GTS). GTS enables a unique opportunity for autonomous racing research, as many recordings of racing from highly skilled human players can serve as expert demonstrations. By comparing the performance of the autonomous racing software with human experts, we better understand the performance gap of existing software and explore new methodologies in a principled manner. In particular, we focus on the commonly adopted model-based racing framework, consisting of an offline trajectory planner and an online Model Predictive Control-based (MPC) tracking controller. We thoroughly investigate the design challenges from three perspectives, namely vehicle model, planning algorithm, and controller design, and propose novel solutions to improve the baseline approach toward human expert-level performance. We showed that the proposed control framework can achieve top 0.95% lap time among human-expert players in GTS. Furthermore, we conducted comprehensive ablation studies to validate the necessity of proposed modules, and pointed out potential future directions to reach human-best performance.

Submitted 25 October, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: 16 pages, 13 figures, 3 tables

arXiv:2205.11790 [pdf, other]

Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning

Authors: Jinning Li, Chen Tang, Masayoshi Tomizuka, Wei Zhan

Abstract: Offline reinforcement learning (RL) has shown potential in many safety-critical tasks in robotics where exploration is risky and expensive. However, it still struggles to acquire skills in temporally extended tasks. In this paper, we study the problem of offline RL for temporally extended tasks. We propose a hierarchical planning framework, consisting of a low-level goal-conditioned RL policy and a high-level goal planner. The low-level policy is trained via offline RL. We improve the offline training to deal with out-of-distribution goals by a perturbed goal sampling process. The high-level planner selects intermediate sub-goals by taking advantage of model-based planning methods. It plans over future sub-goal sequences based on the learned value function of the low-level policy. We adopt a Conditional Variational Autoencoder to sample meaningful high-dimensional sub-goal candidates and to solve the high-level long-term strategy optimization problem. We evaluate our proposed method in long-horizon driving and robot navigation tasks. Experiments show that our method outperforms baselines with different hierarchical designs and other regular planners without hierarchy in these complex tasks.

Submitted 24 May, 2022; originally announced May 2022.

arXiv:2108.06533 [pdf, other]

Constrained Iterative LQG for Real-Time Chance-Constrained Gaussian Belief Space Planning

Authors: Jianyu Chen, Yutaka Shimizu, Liting Sun, Masayoshi Tomizuka, Wei Zhan

Abstract: Motion planning under uncertainty is of significant importance for safety-critical systems such as autonomous vehicles. Such systems have to satisfy necessary constraints (e.g., collision avoidance) with potential uncertainties coming from either disturbed system dynamics or noisy sensor measurements. However, existing motion planning methods cannot efficiently find the robust optimal solutions under general nonlinear and non-convex settings. In this paper, we formulate such a problem as chance-constrained Gaussian belief space planning and propose the constrained iterative Linear Quadratic Gaussian (CILQG) algorithm as a real-time solution. In this algorithm, we iteratively calculate a Gaussian approximation of the belief and transform the chance-constraints. We evaluate the effectiveness of our method in simulations of autonomous driving planning tasks with static and dynamic obstacles. Results show that CILQG can handle uncertainties more appropriately and has faster computation time than baseline methods.

Submitted 21 August, 2021; v1 submitted 14 August, 2021; originally announced August 2021.

Comments: IROS 2021

arXiv:2103.00859 [pdf, other]

doi 10.1109/ACCESS.2021.3069336

Dynamic Underwater Acoustic Channel Tracking for Correlated Rapidly Time-varying Channels

Authors: Qihang Huang, Wei Li, Weicheng Zhan, Yuhang Wang, Rongrong Guo

Abstract: In this work, we focus on the model-mismatch problem for model-based subspace channel tracking in the correlated underwater acoustic channel. A model based on the underwater acoustic channel's correlation can be used as the state-space model in the Kalman filter to improve the underwater acoustic channel tracking compared with tracking without a model. Even though the data support the assumption that the model is slow-varying and uncorrelated to some degree, to improve the tracking performance further, we cannot ignore the model-mismatch problem because most channel models encounter this problem in the underwater acoustic channel. Therefore, in this work, we provide a dynamic time-variant state-space model for underwater acoustic channel tracking. This model is tolerant to the slight correlation after decorrelation. Moreover, a forward-backward Kalman filter is combined to further improve the tracking performance. The performance of our proposed algorithm is demonstrated with the same at-sea data as that used for conventional channel tracking. Compared with the conventional algorithms, the proposed algorithm shows significant improvement, especially in rough sea conditions in which the channels are fast-varying.

Submitted 1 March, 2021; originally announced March 2021.

Comments: Submitted to IEEE Access

Journal ref: IEEE Access 2021

arXiv:2101.06778 [pdf, other]

A Safe Hierarchical Planning Framework for Complex Driving Scenarios based on Reinforcement Learning

Authors: Jinning Li, Liting Sun, Jianyu Chen, Masayoshi Tomizuka, Wei Zhan

Abstract: Autonomous vehicles need to handle various traffic conditions and make safe and efficient decisions and maneuvers. However, on the one hand, a single optimization/sampling-based motion planner cannot efficiently generate safe trajectories in real time, particularly when there are many interactive vehicles nearby. On the other hand, end-to-end learning methods cannot assure the safety of the outcomes. To address this challenge, we propose a hierarchical behavior planning framework with a set of low-level safe controllers and a high-level reinforcement learning algorithm (H-CtRL) as a coordinator for the low-level controllers. Safety is guaranteed by the low-level optimization/sampling-based controllers, while the high-level reinforcement learning algorithm makes H-CtRL an adaptive and efficient behavior planner. To train and test our proposed algorithm, we built a simulator that can reproduce traffic scenes using real-world datasets. The proposed H-CtRL is shown to be effective in various realistic simulation scenarios, with satisfactory performance in terms of both safety and efficiency.

Submitted 9 June, 2021; v1 submitted 17 January, 2021; originally announced January 2021.

arXiv:2101.05985 [pdf, other]

Interaction-Aware Behavior Planning for Autonomous Vehicles Validated with Real Traffic Data

Authors: Jinning Li, Liting Sun, Wei Zhan, Masayoshi Tomizuka

Abstract: Autonomous vehicles (AVs) need to interact with other traffic participants who can be either cooperative or aggressive, attentive or inattentive. Such different characteristics can lead to quite different interactive behaviors. Hence, to achieve safe and efficient autonomous driving, AVs need to be aware of such uncertainties when they plan their own behaviors. In this paper, we formulate such a behavior planning problem as a partially observable Markov Decision Process (POMDP) where the cooperativeness of other traffic participants is treated as an unobservable state. Under different cooperativeness levels, we learn the human behavior models from real traffic data via the principle of maximum likelihood. Based on that, the POMDP problem is solved by Monte-Carlo Tree Search. We verify the proposed algorithm in both simulations and real traffic data on a lane change scenario, and the results show that the proposed algorithm can successfully finish the lane changes without collisions.

Submitted 15 January, 2021; originally announced January 2021.

arXiv:1910.03088 [pdf, other]

INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps

Authors: Wei Zhan, Liting Sun, Di Wang, Haojie Shi, Aubrey Clausse, Maximilian Naumann, Julius Kummerle, Hendrik Konigshof, Christoph Stiller, Arnaud de La Fortelle, Masayoshi Tomizuka

Abstract: Behavior-related research areas such as motion prediction/planning, representation/imitation learning, behavior modeling/generation, and algorithm testing, require support from high-quality motion datasets containing interactive driving scenarios with different driving cultures. In this paper, we present an INTERnational, Adversarial and Cooperative moTION dataset (INTERACTION dataset) in interactive driving scenarios with semantic maps. Five features of the dataset are highlighted. 1) The interactive driving scenarios are diverse, including urban/highway/ramp merging and lane changes, roundabouts with yield/stop signs, signalized intersections, intersections with one/two/all-way stops, etc. 2) Motion data from different countries and different continents are collected so that driving preferences and styles in different cultures are naturally included. 3) The driving behavior is highly interactive and complex with adversarial and cooperative motions of various traffic participants. Highly complex behavior such as negotiations, aggressive/irrational decisions and traffic rule violations are densely contained in the dataset, while regular behavior can also be found from cautious car-following, stop, left/right/U-turn to rational lane-change and cycling and pedestrian crossing, etc. 4) The levels of criticality span wide, from regular safe operations to dangerous, near-collision maneuvers. Real collision, although relatively slight, is also included.
5) Maps with complete semantic information are provided with physical layers, reference lines, lanelet connections and traffic rules. The data is recorded from drones and traffic cameras. Statistics of the dataset in terms of number of entities and interaction density are also provided, along with some utilization examples in a variety of behavior-related research areas. The dataset can be downloaded via https://interaction-dataset.com.

Submitted 30 September, 2019; originally announced October 2019.

arXiv:1907.08707 [pdf, other]

Interpretable Modelling of Driving Behaviors in Interactive Driving Scenarios based on Cumulative Prospect Theory

Authors: Liting Sun, Wei Zhan, Yeping Hu, Masayoshi Tomizuka

Abstract: Understanding human driving behavior is important for autonomous vehicles. In this paper, we propose an interpretable human behavior model in interactive driving scenarios based on the cumulative prospect theory (CPT). As a non-expected utility theory, CPT can well explain some systematically biased or "irrational" behavior/decisions of humans that cannot be explained by the expected utility theory. Hence, the goal of this work is to formulate the human drivers' behavior generation model with CPT so that some "irrational" behavior or decisions of humans can be better captured and predicted. Towards such a goal, we first develop a CPT-driven decision-making model focusing on driving scenarios with two interacting agents. A hierarchical learning algorithm is proposed afterward to learn the utility function, the value function, and the decision weighting function in the CPT model. A case study for roundabout merging is also provided as verification. With real driving data, the prediction performances of three different models are compared: a predefined model based on time-to-collision (TTC), a learning-based model based on neural networks, and the proposed CPT-based model. The results show that the proposed model outperforms the TTC model and achieves similar performance as the learning-based model with much less training data and better interpretability.

Submitted 19 July, 2019; originally announced July 2019.

Comments: accepted to the 2019 IEEE Intelligent Transportation System Conference (ITSC2019)

arXiv:1707.02515 [pdf, other]

A Fast Integrated Planning and Control Framework for Autonomous Driving via Imitation Learning

Authors: Liting Sun, Cheng Peng, Wei Zhan, Masayoshi Tomizuka

Abstract: For safe and efficient planning and control in autonomous driving, we need a driving policy which can achieve desirable driving quality in long-term horizon with guaranteed safety and feasibility. Optimization-based approaches, such as Model Predictive Control (MPC), can provide such optimal policies, but their computational complexity is generally unacceptable for real-time implementation. To address this problem, we propose a fast integrated planning and control framework that combines learning- and optimization-based approaches in a two-layer hierarchical structure. The first layer, defined as the "policy layer", is established by a neural network which learns the long-term optimal driving policy generated by MPC. The second layer, called the "execution layer", is a short-term optimization-based controller that tracks the reference trajectories given by the "policy layer" with guaranteed short-term safety and feasibility. Moreover, with efficient and highly-representative features, a small-size neural network is sufficient in the "policy layer" to handle many complicated driving scenarios. This enables online imitation learning with Dataset Aggregation (DAgger) so that the performance of the "policy layer" can be improved rapidly and continuously online. Several example driving scenarios are demonstrated to verify the effectiveness and efficiency of the proposed framework.

Submitted 8 July, 2017; originally announced July 2017.

Showing 1–15 of 15 results for author: Zhan, W