Search | arXiv e-print repository

arXiv:2406.19680

MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

Authors: Yuang Zhang, Jiaxi Gu, Li-Wen Wang, Han Wang, Junqi Cheng, Yuefeng Zhu, Fangyuan Zou

Abstract: In recent years, generative artificial intelligence has achieved significant advancements in the field of image generation, spawning a variety of applications. However, video generation still faces considerable challenges in various aspects, such as controllability, video length, and richness of details, which hinder the application and popularization of this technology. In this work, we propose a controllable video generation framework, dubbed MimicMotion, which can generate high-quality videos of arbitrary length mimicking specific motion guidance. Compared with previous methods, our approach has several highlights. Firstly, we introduce confidence-aware pose guidance that ensures high frame quality and temporal smoothness. Secondly, we introduce regional loss amplification based on pose confidence, which significantly reduces image distortion. Lastly, for generating long and smooth videos, we propose a progressive latent fusion strategy. By this means, we can produce videos of arbitrary length with acceptable resource consumption. With extensive experiments and user studies, MimicMotion demonstrates significant improvements over previous approaches in various aspects. Detailed results and comparisons are available on our project page: https://tencent.github.io/MimicMotion .
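The abstract does not spell out how progressive latent fusion works. As a rough illustration only, long videos are often produced as overlapping segments whose shared frames are blended; the sketch below merges per-frame latents with weights that ramp linearly from the earlier segment to the later one. The function name `fuse_segments`, the use of one float per frame in place of a latent tensor, and the linear ramp are all assumptions, not MimicMotion's actual schedule.

```python
from typing import List


def fuse_segments(segments: List[List[float]], overlap: int) -> List[float]:
    """Merge consecutive segments that share `overlap` frames (hypothetical sketch).

    Each float stands in for one frame's latent. In the overlap, the weight on the
    newer segment grows from 1/(overlap+1) to overlap/(overlap+1), so the transition
    between segments is gradual rather than a hard cut.
    """
    if not segments:
        return []
    fused = list(segments[0])
    for seg in segments[1:]:
        if overlap < 0 or overlap > min(len(fused), len(seg)):
            raise ValueError("overlap must be between 0 and the segment length")
        start = len(fused) - overlap
        for k in range(overlap):
            w = (k + 1) / (overlap + 1)  # weight on the newer segment
            fused[start + k] = (1 - w) * fused[start + k] + w * seg[k]
        fused.extend(seg[overlap:])  # frames unique to the newer segment
    return fused
```

For two 4-frame segments sharing 2 frames, the result has 4 + 4 - 2 = 6 frames, and the two shared frames sit between the values of the two segments.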

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2401.02339

How Do Pedestrians' Perception Change toward Autonomous Vehicles during Unmarked Midblock Multilane Crossings: Role of AV Operation and Signal Indication

Authors: Fengjiao Zou, Jennifer Harper Ogle, Patrick Gerard, Weimin **

Abstract: One of the primary impediments hindering the widespread acceptance of autonomous vehicles (AVs) among pedestrians is their limited comprehension of AVs. This study employs virtual reality (VR) to provide pedestrians with an immersive environment for engaging with and comprehending AVs during unmarked midblock multilane crossings. Diverse AV driving behaviors were modeled to exhibit negotiation behavior with a yellow signal indication or non-yielding behavior with a blue signal indication. This paper aims to investigate the impact of various factors, such as AV behavior and signaling and pedestrians' past behavior, on the change in pedestrians' perception of AVs. Before and after the VR experiment, participants completed surveys assessing their perception of AVs, focusing on two main aspects: "Attitude" and "System Effectiveness." The Wilcoxon signed-rank test results demonstrated that both pedestrians' overall attitude score toward AVs and their trust in the effectiveness of AV systems significantly increased following the VR experiment. Notably, individuals who exhibited greater trust in the yellow signals were more inclined to display a higher attitude score toward AVs and to increase their trust in the effectiveness of AV systems. This indicates that the design of the yellow signal gives pedestrians greater confidence in their interactions with AVs. Further, pedestrians who exhibit more aggressive crossing behavior are less likely to change their perception of AVs than pedestrians with more positive crossing behaviors.
It is concluded that integrating this paper's devised AV behavior and signaling within an immersive VR setting facilitated pedestrian engagement with AVs, thereby changing their perception of AVs.
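The before/after survey comparison above relies on the Wilcoxon signed-rank test. The following sketch computes the signed-rank statistic W+ and its normal-approximation z score for paired scores. It drops zero differences and assigns average ranks to ties, but it omits the tie correction to the variance, so it is illustrative rather than a substitute for a statistics package; the function name and the scores in the usage check are made up.

```python
import math
from typing import List, Tuple


def wilcoxon_signed_rank(before: List[float], after: List[float]) -> Tuple[float, float]:
    """Return (W_plus, z) for paired samples, using the normal approximation."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    # Differences after - before; zero differences carry no sign and are discarded.
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    # Rank absolute differences (1-based), giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # W+ is the sum of ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, (w_plus - mean) / sd
```

A positive z means scores tended to rise after the intervention; under the normal approximation, |z| > 1.96 corresponds to a two-sided p below 0.05.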

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2306.04618

Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations

Authors: Lifan Yuan, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Fangyuan Zou, Xingyi Cheng, Heng Ji, Zhiyuan Liu, Maosong Sun

Abstract: This paper reexamines the research on out-of-distribution (OOD) robustness in the field of NLP. We find that the distribution shift settings in previous studies commonly lack adequate challenges, hindering the accurate evaluation of OOD robustness. To address these issues, we propose a benchmark construction protocol that ensures clear differentiation and challenging distribution shifts. Then we introduce BOSS, a Benchmark suite for Out-of-distribution robustneSS evaluation covering 5 tasks and 20 datasets. Based on BOSS, we conduct a series of experiments on pre-trained language models for analysis and evaluation of OOD robustness. First, for vanilla fine-tuning, we examine the relationship between in-distribution (ID) and OOD performance. We identify three typical types that unveil the inner learning mechanism, which could potentially facilitate the forecasting of OOD robustness, correlating with the advancements on ID datasets. Then, we evaluate 5 classic methods on BOSS and find that, despite exhibiting some effectiveness in specific cases, they do not offer significant improvement compared to vanilla fine-tuning. Further, we evaluate 5 LLMs with various adaptation paradigms and find that when sufficient ID data is available, fine-tuned domain-specific models significantly outperform LLMs on ID examples. However, for OOD instances, prioritizing LLMs with in-context learning yields better results. We identify that both fine-tuned small models and LLMs face challenges in effectively addressing downstream tasks.
The code is public at https://github.com/lifan-yuan/OOD_NLP.

Submitted 26 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted to NeurIPS 2023 Dataset and Benchmark Track. Code is available at https://github.com/lifan-yuan/OOD_NLP

arXiv:2303.17717

Pedestrian Behavior Interacting with Autonomous Vehicles during Unmarked Midblock Multilane Crossings: Role of Infrastructure Design, AV Operations and Signaling

Authors: Fengjiao Zou, Jennifer Ogle, Weimin **, Patrick Gerard, Daniel Petty, Andrew Robb

Abstract: One of the main challenges autonomous vehicles (AVs) will face is interacting with pedestrians, especially at unmarked midblock locations where the right-of-way is unspecified. This study investigates pedestrian crossing behavior given different roadway centerline features (i.e., undivided, two-way left-turn lane (TWLTL), and median) and various AV operational schemes portrayed to pedestrians through on-vehicle signals (i.e., no signal, yellow negotiating indication, and yellow/blue negotiating/no-yield indications). This study employs virtual reality to simulate an urban unmarked midblock environment where pedestrians interact with AVs. Results demonstrate that both roadway centerline design features and AV operations and signaling significantly impact pedestrian unmarked midblock crossing behavior, including the waiting time at the curb, the waiting time in the middle of the road, and the total crossing time. However, only the roadway centerline features significantly impact walking time. Participants in the undivided scene spent a longer time waiting at the curb and walking on the road than in the median and TWLTL scenes, but they spent a shorter time waiting in the middle. Compared to the AV without a signal, the yellow signal design significantly reduced pedestrian waiting time at the curb and in the middle, whereas the yellow/blue signals significantly increased pedestrian waiting time. Interaction effects between roadway centerline design features and AV operations and signaling are significant only for waiting time in the middle.
For middle waiting time, yellow/blue signals had the most impact on the median road type and the least on the undivided road. Demographics, past behaviors, and walking exposure are also explored. Older individuals tend to wait longer, and pedestrians' past crossing behaviors and past walking exposure do not significantly impact their walking behavior.

Submitted 30 March, 2023; originally announced March 2023.

arXiv:2303.15352

Pedestrian Behavior Interacting with Autonomous Vehicles: Role of AV Operation and Signal Indication and Roadway Infrastructure

Authors: Fengjiao Zou, Jennifer Ogle, Weimin **, Patrick Gerard, Daniel Petty, Andrew Robb

Abstract: Interacting with pedestrians is challenging for autonomous vehicles (AVs). This study evaluates how AV operations/associated signaling and roadway infrastructure affect pedestrian behavior in virtual reality. AVs were designed with different operations and signal indications, including negotiating with no signal, negotiating with a yellow signal, and yellow/blue negotiating/no-yield indications. Results show that AV signals significantly impact pedestrians' accepted gap, walking time, and waiting time. Pedestrians chose the largest open gap between cars when the AV showed no signal, and had the slowest crossing speed when the AV showed a yellow signal indication. Roadway infrastructure affects pedestrian walking time and waiting time.

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.13032

V2V-based Collision-avoidance Decision Strategy for Autonomous Vehicles Interacting with Fully Occluded Pedestrians at Midblock on Multilane Roadways

Authors: Fengjiao Zou, Hsien-Wen Deng, Tsing-Un Iunn, Jennifer Harper Ogle, Weimin **

Abstract: Pedestrian occlusion is challenging for autonomous vehicles (AVs) at midblock locations on multilane roadways because an AV cannot detect crossing pedestrians that are fully occluded by downstream vehicles in adjacent lanes. This paper tests the capability of vehicle-to-vehicle (V2V) communication between an AV and its downstream vehicles to share information about midblock pedestrian crossings. The researchers developed a V2V-based collision-avoidance decision strategy and compared it to a base scenario (i.e., a decision strategy without V2V). Simulation results showed that in the base scenario, the near-zero time-to-collision (TTC) left no time for the AV to take appropriate action, resulting in dramatic braking followed by collisions. In contrast, the V2V-based collision-avoidance decision strategy allowed for a proportional braking approach that increased the TTC, allowing the pedestrian to cross safely. In conclusion, the V2V-based collision-avoidance decision strategy offers greater safety benefits for an AV interacting with fully occluded pedestrians at midblock locations on multilane roadways.
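Two standard kinematic quantities make the difference between the two scenarios concrete: TTC is the remaining gap divided by the closing speed, and the constant deceleration needed to stop within a gap d from speed v is v²/(2d). The sketch below assumes a stationary conflict point, SI units, and hypothetical thresholds of 3 m/s² for comfortable braking and 8 m/s² for maximum braking; neither threshold nor the classification labels come from the paper.

```python
import math


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """TTC in seconds; infinite when the vehicle is not closing on the conflict point."""
    if closing_speed_mps <= 0:
        return math.inf
    return gap_m / closing_speed_mps


def required_deceleration(speed_mps: float, gap_m: float) -> float:
    """Constant deceleration (m/s^2) needed to stop exactly at the conflict point."""
    if gap_m <= 0:
        return math.inf
    return speed_mps ** 2 / (2 * gap_m)


def braking_plan(speed_mps: float, gap_when_known_m: float,
                 comfort_decel: float = 3.0, max_decel: float = 8.0) -> str:
    """Classify the response once the pedestrian becomes known (hypothetical thresholds)."""
    a = required_deceleration(speed_mps, gap_when_known_m)
    if a <= comfort_decel:
        return "proportional"  # gentle braking spread over the available distance
    if a <= max_decel:
        return "hard"
    return "collision-likely"
```

At 15 m/s, learning of the pedestrian 60 m ahead (as an early V2V message might allow) gives a TTC of 4 s and needs only 1.875 m/s², while first seeing the pedestrian 10 m ahead needs 11.25 m/s², beyond the assumed braking limit.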

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2212.07080

Reform and Practice of Computer Application Technology Major Construction and Development in Higher Vocational Colleges in China -- Taking Jiangxi Vocational College of Applied Technology as An Example

Authors: Yufei Xie, Yue Liu, Fan Zou

Abstract: This study takes as its main research object the development path of computer application technology specialty construction in higher vocational colleges in the context of the plan for building high-level higher vocational schools and specialties with Chinese characteristics (the "double high plan"). Drawing on the practice of computer application technology specialty construction and development reform in recent years, it puts forward the core concept of computer application technology specialty construction and development in higher vocational colleges in China. The main measures and construction objectives provide specific experience and solutions for deepening the reform of the computer application technology specialty in higher vocational colleges.

Submitted 14 December, 2022; originally announced December 2022.

ACM Class: K.3.2

Journal ref: International Journal of Higher Education Teaching Theory, Vol. 1, No. 4, 2020, 243-245

arXiv:2106.14501

doi 10.1016/j.jvcir.2022.103712

R2RNet: Low-light Image Enhancement via Real-low to Real-normal Network

Authors: Jiang Hai, Zhu Xuan, Songchen Han, Ren Yang, Yutong Hao, Fengzhu Zou, Fang Lin

Abstract: Weak illumination can seriously degrade the quality of captured images. Addressing the various degradations of low-light images can effectively improve both the visual quality of images and the performance of high-level visual tasks. In this study, a novel Retinex-based Real-low to Real-normal Network (R2RNet) is proposed for low-light image enhancement, which includes three subnets: a Decom-Net, a Denoise-Net, and a Relight-Net. These three subnets are used for decomposition, denoising, and contrast enhancement with detail preservation, respectively. Our R2RNet not only uses the spatial information of the image to improve the contrast but also uses the frequency information to preserve the details. Therefore, our model achieves more robust results for all degraded images. Unlike most previous methods that were trained on synthetic images, we collected the first Large-Scale Real-World paired low/normal-light images dataset (LSRW dataset) to satisfy the training requirements and give our model better generalization performance in real-world scenes. Extensive experiments on publicly available datasets demonstrated that our method outperforms the existing state-of-the-art methods both quantitatively and visually. In addition, our results showed that the performance of a high-level visual task (i.e., face detection) in low-light conditions can be effectively improved by using the enhanced results obtained by our method. Our code and the LSRW dataset are available at: https://github.com/abcdef2000/R2RNet.
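Retinex theory, on which R2RNet builds, models an image as the element-wise product of reflectance and illumination, I = R ⊙ L, so brightening can act on L while leaving R (the scene's colours) untouched. The sketch below is not R2RNet, whose decomposition is learned by Decom-Net; it substitutes a common hand-crafted prior, estimating illumination as the per-pixel maximum over RGB channels, and then gamma-corrects only that illumination. The function name, `gamma=0.5`, and the epsilon are assumptions.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def retinex_relight(img: List[Pixel], gamma: float = 0.5, eps: float = 1e-6) -> List[Pixel]:
    """Brighten an RGB image (values in [0, 1]) by adjusting only its illumination."""
    out: List[Pixel] = []
    for r, g, b in img:
        # Illumination estimate: brightest channel at this pixel (heuristic prior).
        illum = max(r, g, b, eps)
        # Reflectance R = I / L, so the ratios between channels are preserved.
        refl = (r / illum, g / illum, b / illum)
        # gamma < 1 lifts dark illumination proportionally more than bright illumination.
        new_illum = illum ** gamma
        out.append(tuple(min(1.0, c * new_illum) for c in refl))
    return out
```

A dark pixel (0.04, 0.02, 0.01) becomes roughly (0.2, 0.1, 0.05): five times brighter with the same colour ratios. Plain gamma correction would also amplify noise, which is one motivation for the separate Denoise-Net stage described above.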

Submitted 11 November, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: 12 pages, 9 figures

Journal ref: Journal of Visual Communication and Image Representation, 2022

arXiv:2101.05471

Towards Practical Adam: Non-Convexity, Convergence Theory, and Mini-Batch Acceleration

Authors: Congliang Chen, Li Shen, Fangyu Zou, Wei Liu

Abstract: Adam is one of the most influential adaptive stochastic algorithms for training deep neural networks, yet it has been shown to diverge even in the simple convex setting via a few simple counterexamples. Many attempts, such as decreasing an adaptive learning rate, adopting a big batch size, incorporating a temporal decorrelation technique, seeking an analogous surrogate, etc., have been made to make Adam-type algorithms converge. In contrast with existing approaches, we introduce an alternative easy-to-check sufficient condition, which merely depends on the parameters of the base learning rate and combinations of historical second-order moments, to guarantee the global convergence of generic Adam for solving large-scale non-convex stochastic optimization. This observation, coupled with this sufficient condition, gives much deeper interpretations of the divergence of Adam. On the other hand, in practice, mini-batch Adam and distributed Adam are widely used without any theoretical guarantee. We further analyze how the batch size or the number of nodes in the distributed system affects the convergence of Adam, which theoretically shows that mini-batch and distributed Adam can be linearly accelerated by using a larger mini-batch size or a larger number of nodes. Finally, we apply generic Adam and mini-batch Adam with the sufficient condition to solve the counterexample and to train several neural networks on various real-world datasets. Experimental results are exactly in accord with our theoretical analysis.
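For reference, the standard Adam update applied to a mini-batch gradient is sketched below on a toy one-parameter least-squares problem. It uses fixed textbook hyperparameters and the usual bias correction; it does not implement or check the paper's sufficient condition, and the function name, data, step size, and batch size are made up for illustration.

```python
import math
import random
from typing import List, Tuple


def minibatch_adam(data: List[Tuple[float, float]], batch_size: int, steps: int,
                   lr: float = 0.05, beta1: float = 0.9, beta2: float = 0.999,
                   eps: float = 1e-8, seed: int = 0) -> float:
    """Fit y ≈ w * x by minimising mean squared error with mini-batch Adam."""
    rng = random.Random(seed)
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        batch = rng.sample(data, batch_size)  # draw without replacement
        # Gradient of 0.5 * (w*x - y)^2, averaged over the mini-batch.
        g = sum((w * x - y) * x for x, y in batch) / batch_size
        m = beta1 * m + (1 - beta1) * g      # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g  # second moment
        m_hat = m / (1 - beta1 ** t)         # bias corrections for zero initialisation
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w
```

Averaging over a larger batch reduces the variance of `g`; the paper's analysis concerns how that effect translates into faster convergence, which this toy example does not attempt to measure.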

Submitted 8 August, 2022; v1 submitted 14 January, 2021; originally announced January 2021.

Comments: Accepted to JMLR. arXiv admin note: substantial text overlap with arXiv:1811.09358

arXiv:2005.13117

SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition

Authors: Chengwei Zhang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Yi Niu, Fei Wu, Futai Zou

Abstract: Arbitrary text appearance poses a great challenge in scene text recognition tasks. Existing works mostly handle the problem by considering shape distortion, including perspective distortions, line curvature or other style variations. Therefore, methods based on spatial transformers are extensively studied. However, chromatic difficulties in complex scenes have received little attention. In this work, we introduce a new learnable geometric-unrelated module, the Structure-Preserving Inner Offset Network (SPIN), which allows the color manipulation of source data within the network. This differentiable module can be inserted before any recognition architecture to ease the downstream tasks, giving neural networks the ability to actively transform input intensity rather than only performing spatial rectification. It can also serve as a complementary module to known spatial transformations and work with them both independently and collaboratively. Extensive experiments show that the use of SPIN results in a significant improvement on multiple text recognition benchmarks compared to the state of the art.
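To make "transforming input intensity" concrete, here is a hypothetical, simplified intensity remapping: each pixel value in [0, 1] is mapped through a convex combination of power curves x^γ_k, with mixing weights obtained by a softmax over logits that a network would predict. This is not SPIN's published parameterization; the function names, exponents, and logits are assumptions chosen to show why such a transform can change contrast while keeping intensity order (and hence stroke structure) intact.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def intensity_remap(pixels: Sequence[float], logits: Sequence[float],
                    exponents: Sequence[float]) -> List[float]:
    """Remap intensities in [0, 1] with a convex mix of power curves x ** gamma_k.

    With positive exponents, every curve is increasing and fixes 0 and 1, so the
    mix is too: the ordering of pixel intensities is preserved.
    """
    if len(logits) != len(exponents):
        raise ValueError("one logit per exponent")
    if any(g <= 0 for g in exponents):
        raise ValueError("exponents must be positive")
    weights = softmax(logits)
    return [sum(w * (p ** g) for w, g in zip(weights, exponents)) for p in pixels]
```

Because every step is differentiable in the logits, gradients from a downstream recognizer could in principle train the weight predictor end to end.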

Submitted 25 October, 2021; v1 submitted 26 May, 2020; originally announced May 2020.

Comments: Accepted to AAAI21. Code is available at https://davar-lab.github.io/publication.html or https://github.com/hikopensource/DAVAR-Lab-OCR

arXiv:1908.02422

Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization

Authors: Chengwei Zhang, Yunlu Xu, Zhanzhan Cheng, Yi Niu, Shiliang Pu, Fei Wu, Futai Zou

Abstract: Temporal action localization is an important yet challenging research topic due to its various applications. Since frame-level or segment-level annotations of untrimmed videos require substantial labor, studies on weakly-supervised action detection have been springing up. However, most existing frameworks rely on a Class Activation Sequence (CAS) to localize actions by minimizing the video-level classification loss, which exploits the most discriminative parts of actions but ignores the minor regions. In this paper, we propose a novel weakly-supervised framework based on adversarial learning of two modules to eliminate such demerits. Specifically, the first module is a well-designed Seeded Sequence Growing (SSG) Network for progressively extending seed regions (namely, the highly reliable regions initialized by a CAS-based framework) to their expected boundaries. The second module is a specific classifier for mining trivial or incomplete action regions, which is trained on the shared features after erasing the seeded regions activated by SSG. In this way, a whole network composed of these two modules can be trained in an adversarial manner. The goal of the adversary is to mine features that are difficult for the action classifier. That is, erasure by SSG forces the classifier to discover minor or even new action regions in the input feature sequence, and the classifier in turn drives the seeds to grow, alternately. Finally, we obtain the action locations and categories from the well-trained SSG and the classifier.
Extensive experiments on two public benchmarks, THUMOS'14 and ActivityNet1.3, demonstrate the impressive performance of our proposed method compared with the state of the art.
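The seed-growing and erasure loop described above can be illustrated, outside any learned network, with classic region growing on a 1-D activation sequence: snippets scoring above a high threshold become seeds, each seed expands into neighbours whose scores stay above a lower threshold, and the grown regions are then zeroed so a second classifier must look elsewhere. The thresholds and the functions `grow_seeds` and `erase` are hypothetical; in the paper, SSG learns the expansion and is trained adversarially.

```python
from typing import List, Tuple


def grow_seeds(scores: List[float], seed_thr: float, grow_thr: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) segments grown from high-confidence seeds."""
    if grow_thr > seed_thr:
        raise ValueError("grow_thr should not exceed seed_thr")
    segments: List[Tuple[int, int]] = []
    i, n = 0, len(scores)
    while i < n:
        if scores[i] >= seed_thr:
            start = end = i
            # Expand left and right while neighbours stay above the growing threshold.
            while start > 0 and scores[start - 1] >= grow_thr:
                start -= 1
            while end + 1 < n and scores[end + 1] >= grow_thr:
                end += 1
            segments.append((start, end))
            i = end + 1  # any later seed lies beyond this grown segment
        else:
            i += 1
    return segments


def erase(scores: List[float], segments: List[Tuple[int, int]]) -> List[float]:
    """Zero out grown regions, hiding them from a second classifier."""
    out = list(scores)
    for s, e in segments:
        for k in range(s, e + 1):
            out[k] = 0.0
    return out
```

For scores `[0.1, 0.4, 0.9, 0.5, 0.2, 0.8, 0.3]` with thresholds 0.7 and 0.35, the seed at index 2 grows to cover indices 1 to 3, while the seed at index 5 stays a single snippet.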

Submitted 6 August, 2019; originally announced August 2019.

Comments: To appear in ACM MM 2019

arXiv:1904.09290

FeatherNets: Convolutional Neural Networks as Light as Feather for Face Anti-spoofing

Authors: Peng Zhang, Fuhao Zou, Zhiwen Wu, Nengli Dai, Skarpness Mark, Michael Fu, Juan Zhao, Kai Li

Abstract: Face anti-spoofing has gained increasing attention recently in both academic and industrial fields. With the emergence of various CNN-based solutions, multi-modal (RGB, depth and IR) CNN-based methods have shown better performance than single-modal classifiers. However, there is a need to improve performance and reduce complexity. Therefore, an extremely light network architecture (FeatherNet A/B) is proposed with a streaming module that fixes the weakness of Global Average Pooling and uses fewer parameters. Our single FeatherNet, trained on depth images only, provides a higher baseline with 0.00168 ACER, 0.35M parameters and 83M FLOPS. Furthermore, a novel fusion procedure with an "ensemble + cascade" structure is presented for performance-oriented use cases. Meanwhile, the MMFD dataset is collected to provide more attacks and diversity for better generalization. We used the fusion method in the Face Anti-spoofing Attack Detection Challenge@CVPR2019 and obtained results of 0.0013 (ACER), 0.999 (TPR@FPR=10e-2), 0.998 (TPR@FPR=10e-3) and 0.9814 (TPR@FPR=10e-4).
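ACER, the headline metric above, is the mean of two error rates: APCER (the share of attack presentations accepted as bona fide) and BPCER (the share of bona fide presentations rejected as attacks). A minimal sketch, assuming label 1 = attack, 0 = bona fide, and a hypothetical decision threshold of 0.5 on an "attack" score. For simplicity, all attack types are pooled into one APCER here, whereas ISO/IEC 30107-3 reports APCER per attack instrument species.

```python
from typing import Sequence


def acer(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Average Classification Error Rate; label 1 = attack, 0 = bona fide.

    A sample is predicted as an attack when its score >= threshold.
    """
    attacks = [s for y, s in zip(labels, scores) if y == 1]
    bona_fide = [s for y, s in zip(labels, scores) if y == 0]
    if not attacks or not bona_fide:
        raise ValueError("need both attack and bona fide samples")
    apcer = sum(s < threshold for s in attacks) / len(attacks)       # attacks accepted
    bpcer = sum(s >= threshold for s in bona_fide) / len(bona_fide)  # genuine users rejected
    return (apcer + bpcer) / 2
```

An ACER of 0.00168, as reported for the single depth-only FeatherNet, means that the two error rates average to about 0.17%.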

Submitted 22 April, 2019; originally announced April 2019.

Comments: 10 pages, 6 figures

arXiv:1901.00275

doi 10.1016/j.future.2019.04.033

Vector and Line Quantization for Billion-scale Similarity Search on GPUs

Authors: Wei Chen, **cai Chen, Fuhao Zou, Yuan-Fang Li, ** Lu, Qiang Wang, Wei Zhao

Abstract: Billion-scale high-dimensional approximate nearest neighbour (ANN) search has become an important problem for searching similar objects among the vast amount of images and videos available online. The existing ANN methods are usually characterized by their specific indexing structures, including the inverted index and the inverted multi-index structure. The inverted index structure is amenable to GPU-based implementations, and state-of-the-art systems such as Faiss are able to exploit the massive parallelism offered by GPUs. However, the inverted index requires high memory overhead to index the dataset effectively. The inverted multi-index structure is difficult to implement on GPUs, and is also ineffective in dealing with databases with different data distributions. In this paper we propose a novel hierarchical inverted index structure generated by vector and line quantization methods. Our quantization method improves both search efficiency and accuracy, while maintaining comparable memory consumption. This is achieved by reducing the search space and increasing the number of indexed regions. We introduce a new ANN search system, VLQ-ADC, that is based on the proposed inverted index, and perform extensive evaluation on two public billion-scale benchmark datasets, SIFT1B and DEEP1B. Our evaluation shows that VLQ-ADC significantly outperforms the state-of-the-art GPU- and CPU-based systems in terms of both accuracy and search speed. The source code of VLQ-ADC is available at https://github.com/zjuchenwei/vector-line-quantization.
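A simplified illustration of combining vector and line quantization: a vector is first assigned to its nearest centroid (vector quantization), then approximated more finely by its orthogonal projection onto the line from that centroid towards a neighbouring centroid, keeping whichever line fits best. The function names, the choice of all other centroids as neighbours, and keeping the raw projection coefficient λ instead of quantizing it are assumptions; VLQ-ADC's actual index layout and GPU implementation are more involved.

```python
from typing import List, Optional, Sequence, Tuple

Vec = Sequence[float]


def sqdist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_line_quantize(x: Vec, centroids: List[Vec]) -> Tuple[int, int, float, float]:
    """Return (nearest centroid i, neighbour j, lambda, squared residual).

    x is approximated by c_i + lambda * (c_j - c_i), its projection onto the line
    through centroids i and j; here j ranges over all other centroids.
    """
    if len(centroids) < 2:
        raise ValueError("need at least two centroids")
    i = min(range(len(centroids)), key=lambda k: sqdist(x, centroids[k]))
    ci = centroids[i]
    best: Optional[Tuple[int, int, float, float]] = None
    for j, cj in enumerate(centroids):
        if j == i:
            continue
        d = [b - a for a, b in zip(ci, cj)]  # direction c_j - c_i
        dd = sum(t * t for t in d)
        if dd == 0:
            continue  # duplicate centroid: no line defined
        lam = sum((xv - a) * t for xv, a, t in zip(x, ci, d)) / dd
        approx = [a + lam * t for a, t in zip(ci, d)]
        err = sqdist(x, approx)
        if best is None or err < best[3]:
            best = (i, j, lam, err)
    if best is None:
        raise ValueError("all centroids coincide")
    return best
```

With centroids (0, 0), (4, 0), (0, 4) and x = (1, 0.5), the nearest centroid alone leaves a squared residual of 1.25, while projecting onto the line towards (4, 0) at λ = 0.25 leaves 0.25. This illustrates how lines add finer regions without adding centroids.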

Submitted 18 April, 2019; v1 submitted 2 January, 2019; originally announced January 2019.

Comments: Accepted by Future Generation Computer Systems (FGCS)

arXiv:1811.09358

A Sufficient Condition for Convergences of Adam and RMSProp

Authors: Fangyu Zou, Li Shen, Zequn Jie, Weizhong Zhang, Wei Liu

Abstract: Adam and RMSProp are two of the most influential adaptive stochastic algorithms for training deep neural networks, yet they have been shown to diverge even in the convex setting via a few simple counterexamples. Many attempts, such as decreasing an adaptive learning rate, adopting a big batch size, incorporating a temporal decorrelation technique, seeking an analogous surrogate, etc., have been made to make Adam/RMSProp-type algorithms converge. In contrast with existing approaches, we introduce an alternative easy-to-check sufficient condition, which merely depends on the parameters of the base learning rate and combinations of historical second-order moments, to guarantee the global convergence of generic Adam/RMSProp for solving large-scale non-convex stochastic optimization. Moreover, we show that the convergence of several variants of Adam, such as AdamNC, AdaEMA, etc., can be directly implied by the proposed sufficient condition in the non-convex setting. In addition, we illustrate that Adam is essentially a specifically weighted AdaGrad with exponential moving average momentum, which provides a novel perspective for understanding Adam and RMSProp. This observation, coupled with the sufficient condition, gives much deeper interpretations of their divergence. Finally, we validate the sufficient condition by applying Adam and RMSProp to tackle a certain counterexample and to train deep neural networks. Numerical results are exactly in accord with our theoretical analysis.
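The "weighted AdaGrad" view rests on a simple identity. Unrolling the second-moment recursion v_t = β₂ v_{t-1} + (1 - β₂) g_t² from v_0 = 0 gives v_t = (1 - β₂) Σ_{i=1}^{t} β₂^{t-i} g_i². In other words, Adam and RMSProp accumulate past squared gradients like AdaGrad, but with weights that grow exponentially towards recent steps. The check below confirms this algebra numerically on an arbitrary gradient sequence; the function names are illustrative, and it does not test the paper's convergence condition.

```python
from typing import Sequence


def ema_second_moment(grads: Sequence[float], beta2: float) -> float:
    """Recursion v_t = beta2 * v_{t-1} + (1 - beta2) * g_t^2, starting from v_0 = 0."""
    v = 0.0
    for g in grads:
        v = beta2 * v + (1 - beta2) * g * g
    return v


def weighted_sum_second_moment(grads: Sequence[float], beta2: float) -> float:
    """The same quantity written as an explicit weighted sum of past squared gradients.

    Step i (1-based) gets weight (1 - beta2) * beta2 ** (t - i): the newest gradient
    has the largest weight, and older ones decay geometrically.
    """
    t = len(grads)
    return (1 - beta2) * sum(beta2 ** (t - i) * g * g
                             for i, g in enumerate(grads, start=1))
```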

Submitted 24 June, 2019; v1 submitted 22 November, 2018; originally announced November 2018.

Comments: Accepted by CVPR2019 as an Oral presentation

arXiv:1808.03408

A Unified Analysis of AdaGrad with Weighted Aggregation and Momentum Acceleration

Authors: Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu

Abstract: Integrating adaptive learning rate and momentum techniques into SGD leads to a large class of efficiently accelerated adaptive stochastic algorithms, such as AdaGrad, RMSProp, Adam, AccAdaGrad, etc. Despite their effectiveness in practice, there is still a large gap in their convergence theory, especially in the difficult non-convex stochastic setting. To fill this gap, we propose weighted AdaGrad with unified momentum, dubbed AdaUSM, whose main characteristics are that (1) it incorporates a unified momentum scheme that covers both heavy ball momentum and Nesterov accelerated gradient momentum; and (2) it adopts a novel weighted adaptive learning rate that can unify the learning rates of AdaGrad, AccAdaGrad, Adam, and RMSProp. Moreover, when we take polynomially growing weights in AdaUSM, we obtain its $\mathcal{O}(\log(T)/\sqrt{T})$ convergence rate in the non-convex stochastic setting. We also show that the adaptive learning rates of Adam and RMSProp correspond to taking exponentially growing weights in AdaUSM, thereby providing a new perspective for understanding Adam and RMSProp. Lastly, comparative experiments of AdaUSM against SGD with momentum, AdaGrad, AdaEMA, Adam, and AMSGrad on various deep learning models and datasets are also carried out.
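To see what a weighted adaptive learning rate means concretely, a second-moment estimate can be formed as a normalised weighted average of past squared gradients, v_t = Σ w_i g_i² / Σ w_i. Uniform weights give AdaGrad's accumulated sum divided by t. Polynomially growing weights w_i = i^p shift emphasis towards recent steps. Exponentially growing weights w_i = β₂^(-i) reproduce Adam's bias-corrected estimate exactly, because Σ_{i=1}^{t} β₂^{t-i} = (1 - β₂^t)/(1 - β₂). The normalisation and the helper names are illustrative assumptions, not AdaUSM's exact definition.

```python
from typing import Callable, Sequence


def weighted_second_moment(grads: Sequence[float], weight: Callable[[int], float]) -> float:
    """Normalised weighted average of squared gradients, with weights w_i for i = 1..t."""
    ws = [weight(i) for i in range(1, len(grads) + 1)]
    return sum(w * g * g for w, g in zip(ws, grads)) / sum(ws)


def adam_v_hat(grads: Sequence[float], beta2: float) -> float:
    """Adam's bias-corrected second moment v_t / (1 - beta2 ** t), for comparison."""
    v = 0.0
    for g in grads:
        v = beta2 * v + (1 - beta2) * g * g
    return v / (1 - beta2 ** len(grads))


grads = [0.5, -1.2, 2.0, 0.3, -0.7]
beta2 = 0.9
uniform = weighted_second_moment(grads, lambda i: 1.0)           # AdaGrad sum / t
poly = weighted_second_moment(grads, lambda i: float(i) ** 2)    # w_i = i^p, p = 2
expo = weighted_second_moment(grads, lambda i: beta2 ** (-i))    # matches adam_v_hat
```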

Submitted 15 May, 2023; v1 submitted 10 August, 2018; originally announced August 2018.

Comments: IEEE TNNLS

Showing 1–15 of 15 results for author: Zou, F