Search | arXiv e-print repository

Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy

Authors: Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao

Abstract: Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeS… ▽ More Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning concerning logical reasoning, physical principles, geometric analysis, etc. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains. △ Less

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.14359 [pdf, other]

Learning to Transfer for Evolutionary Multitasking

Authors: Sheng-Hao Wu, Yuxiao Huang, Xingyu Wu, Liang Feng, Zhi-Hui Zhan, Kay Chen Tan

Abstract: Evolutionary multitasking (EMT) is an emerging approach for solving multitask optimization problems (MTOPs) and has garnered considerable research interest. The implicit EMT is a significant research branch that utilizes evolution operators to enable knowledge transfer (KT) between tasks. However, current approaches in implicit EMT face challenges in adaptability, due to the use of a limited numbe… ▽ More Evolutionary multitasking (EMT) is an emerging approach for solving multitask optimization problems (MTOPs) and has garnered considerable research interest. The implicit EMT is a significant research branch that utilizes evolution operators to enable knowledge transfer (KT) between tasks. However, current approaches in implicit EMT face challenges in adaptability, due to the use of a limited number of evolution operators and insufficient utilization of evolutionary states for performing KT. This results in suboptimal exploitation of implicit KT's potential to tackle a variety of MTOPs. To overcome these limitations, we propose a novel Learning to Transfer (L2T) framework to automatically discover efficient KT policies for the MTOPs at hand. Our framework conceptualizes the KT process as a learning agent's sequence of strategic decisions within the EMT process. We propose an action formulation for deciding when and how to transfer, a state representation with informative features of evolution states, a reward formulation concerning convergence and transfer efficiency gain, and the environment for the agent to interact with MTOPs. We employ an actor-critic network structure for the agent and learn it via proximal policy optimization. This learned agent can be integrated with various evolutionary algorithms, enhancing their ability to address a range of new MTOPs. Comprehensive empirical studies on both synthetic and real-world MTOPs, encompassing diverse inter-task relationships, function classes, and task distributions are conducted to validate the proposed L2T framework. The results show a marked improvement in the adaptability and performance of implicit EMT when solving a wide spectrum of unseen MTOPs. △ Less

Submitted 22 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: Under review

arXiv:2405.10620 [pdf, other]

MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains

Authors: Zhaohuan Zhan, Lisha Yu, Sijie Yu, Guang Tan

Abstract: In the Vision-and-Language Navigation (VLN) task, the agent is required to navigate to a destination following a natural language instruction. While learning-based approaches have been a major solution to the task, they suffer from high training costs and lack of interpretability. Recently, Large Language Models (LLMs) have emerged as a promising tool for VLN due to their strong generalization cap… ▽ More In the Vision-and-Language Navigation (VLN) task, the agent is required to navigate to a destination following a natural language instruction. While learning-based approaches have been a major solution to the task, they suffer from high training costs and lack of interpretability. Recently, Large Language Models (LLMs) have emerged as a promising tool for VLN due to their strong generalization capabilities. However, existing LLM-based methods face limitations in memory construction and diversity of navigation strategies. To address these challenges, we propose a suite of techniques. Firstly, we introduce a method to maintain a topological map that stores navigation history, retaining information about viewpoints, objects, and their spatial relationships. This map also serves as a global action space. Additionally, we present a Navigation Chain of Thoughts module, leveraging human navigation examples to enrich navigation strategy diversity. Finally, we establish a pipeline that integrates navigational memory and strategies with perception and action prediction modules. Experimental results on the REVERIE and R2R datasets show that our method effectively enhances the navigation ability of the LLM and improves the interpretability of navigation reasoning. △ Less

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.08151 [pdf, other]

Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness

Authors: Mingchen Li, Zaifu Zhan, Han Yang, Yongkang Xiao, Jiatan Huang, Rui Zhang

Abstract: Large language models (LLM) have demonstrated remarkable capabilities in various biomedical natural language processing (NLP) tasks, leveraging the demonstration within the input context to adapt to new tasks. However, LLM is sensitive to the selection of demonstrations. To address the hallucination issue inherent in LLM, retrieval-augmented LLM (RAL) offers a solution by retrieving pertinent info… ▽ More Large language models (LLM) have demonstrated remarkable capabilities in various biomedical natural language processing (NLP) tasks, leveraging the demonstration within the input context to adapt to new tasks. However, LLM is sensitive to the selection of demonstrations. To address the hallucination issue inherent in LLM, retrieval-augmented LLM (RAL) offers a solution by retrieving pertinent information from an established database. Nonetheless, existing research work lacks rigorous evaluation of the impact of retrieval-augmented large language models on different biomedical NLP tasks. This deficiency makes it challenging to ascertain the capabilities of RAL within the biomedical domain. Moreover, the outputs from RAL are affected by retrieving the unlabeled, counterfactual, or diverse knowledge that is not well studied in the biomedical domain. However, such knowledge is common in the real world. Finally, exploring the self-awareness ability is also crucial for the RAL system. So, in this paper, we systematically investigate the impact of RALs on 5 different biomedical tasks (triple extraction, link prediction, classification, question answering, and natural language inference). We analyze the performance of RALs in four fundamental abilities, including unlabeled robustness, counterfactual robustness, diverse robustness, and negative awareness. To this end, we proposed an evaluation framework to assess the RALs' performance on different biomedical NLP tasks and establish four different testbeds based on the aforementioned fundamental abilities. Then, we evaluate 3 representative LLMs with 3 different retrievers on 5 tasks over 9 datasets. △ Less

Submitted 16 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07530 [pdf, other]

Prompt-based Code Completion via Multi-Retrieval Augmented Generation

Authors: Hanzhuo Tan, Qi Luo, Ling Jiang, Zizheng Zhan, **g Li, Haotian Zhang, Yuqun Zhang

Abstract: Automated code completion, aiming at generating subsequent tokens from unfinished code, has been significantly benefited from recent progress in pre-trained Large Language Models (LLMs). However, these models often suffer from coherence issues and hallucinations when dealing with complex code logic or extrapolating beyond their training data. Existing Retrieval Augmented Generation (RAG) technique… ▽ More Automated code completion, aiming at generating subsequent tokens from unfinished code, has been significantly benefited from recent progress in pre-trained Large Language Models (LLMs). However, these models often suffer from coherence issues and hallucinations when dealing with complex code logic or extrapolating beyond their training data. Existing Retrieval Augmented Generation (RAG) techniques partially address these issues by retrieving relevant code with a separate encoding model where the retrieved snippet serves as contextual reference for code completion. However, their retrieval scope is subject to a singular perspective defined by the encoding model, which largely overlooks the complexity and diversity inherent in code semantics. To address this limitation, we propose ProCC, a code completion framework leveraging prompt engineering and the contextual multi-armed bandits algorithm to flexibly incorporate and adapt to multiple perspectives of code. ProCC first employs a prompt-based multi-retriever system which crafts prompt templates to elicit LLM knowledge to understand code semantics with multiple retrieval perspectives. Then, it adopts the adaptive retrieval selection algorithm to incorporate code similarity into the decision-making process to determine the most suitable retrieval perspective for the LLM to complete the code. Experimental results demonstrate that ProCC outperforms state-of-the-art code completion technique by 8.6% on our collected open-source benchmark suite and 10.1% on the private-domain benchmark suite collected from a billion-user e-commerce company in terms of Exact Match. ProCC also allows augmenting fine-tuned techniques in a plug-and-play manner, yielding 5.6% improvement over our studied fine-tuned model. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2403.05016 [pdf, other]

DiffClass: Diffusion-Based Class Incremental Learning

Authors: Zichong Meng, Jie Zhang, Changdi Yang, Zheng Zhan, Pu Zhao, Yanzhi WAng

Abstract: Class Incremental Learning (CIL) is challenging due to catastrophic forgetting. On top of that, Exemplar-free Class Incremental Learning is even more challenging due to forbidden access to previous task data. Recent exemplar-free CIL methods attempt to mitigate catastrophic forgetting by synthesizing previous task data. However, they fail to overcome the catastrophic forgetting due to the inabilit… ▽ More Class Incremental Learning (CIL) is challenging due to catastrophic forgetting. On top of that, Exemplar-free Class Incremental Learning is even more challenging due to forbidden access to previous task data. Recent exemplar-free CIL methods attempt to mitigate catastrophic forgetting by synthesizing previous task data. However, they fail to overcome the catastrophic forgetting due to the inability to deal with the significant domain gap between real and synthetic data. To overcome these issues, we propose a novel exemplar-free CIL method. Our method adopts multi-distribution matching (MDM) diffusion models to unify quality and bridge domain gaps among all domains of training data. Moreover, our approach integrates selective synthetic image augmentation (SSIA) to expand the distribution of the training data, thereby improving the model's plasticity and reinforcing the performance of our method's ultimate component, multi-domain adaptation (MDA). With the proposed integrations, our method then reformulates exemplar-free CIL into a multi-domain adaptation problem to implicitly address the domain gap problem to enhance model stability during incremental training. Extensive experiments on benchmark class incremental datasets and settings demonstrate that our method excels previous exemplar-free CIL methods and achieves state-of-the-art performance. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: Preprint

arXiv:2402.11423 [pdf, other]

VoltSchemer: Use Voltage Noise to Manipulate Your Wireless Charger

Authors: Zihao Zhan, Yirui Yang, Haoqi Shan, Hanqiu Wang, Yier **, Shuo Wang

Abstract: Wireless charging is becoming an increasingly popular charging solution in portable electronic products for a more convenient and safer charging experience than conventional wired charging. However, our research identified new vulnerabilities in wireless charging systems, making them susceptible to intentional electromagnetic interference. These vulnerabilities facilitate a set of novel attack vec… ▽ More Wireless charging is becoming an increasingly popular charging solution in portable electronic products for a more convenient and safer charging experience than conventional wired charging. However, our research identified new vulnerabilities in wireless charging systems, making them susceptible to intentional electromagnetic interference. These vulnerabilities facilitate a set of novel attack vectors, enabling adversaries to manipulate the charger and perform a series of attacks. In this paper, we propose VoltSchemer, a set of innovative attacks that grant attackers control over commercial-off-the-shelf wireless chargers merely by modulating the voltage from the power supply. These attacks represent the first of its kind, exploiting voltage noises from the power supply to manipulate wireless chargers without necessitating any malicious modifications to the chargers themselves. The significant threats imposed by VoltSchemer are substantiated by three practical attacks, where a charger can be manipulated to: control voice assistants via inaudible voice commands, damage devices being charged through overcharging or overheating, and bypass Qi-standard specified foreign-object-detection mechanism to damage valuable items exposed to intense magnetic fields. We demonstrate the effectiveness and practicality of the VoltSchemer attacks with successful attacks on 9 top-selling COTS wireless chargers. Furthermore, we discuss the security implications of our findings and suggest possible countermeasures to mitigate potential threats. △ Less

Submitted 17 February, 2024; originally announced February 2024.

Comments: This paper has been accepted by the 33rd USENIX Security Symposium

arXiv:2402.02227 [pdf, other]

doi 10.1109/SP46214.2022.9833718

Invisible Finger: Practical Electromagnetic Interference Attack on Touchscreen-based Electronic Devices

Authors: Haoqi Shan, Boyi Zhang, Zihao Zhan, Dean Sullivan, Shuo Wang, Yier **

Abstract: Touchscreen-based electronic devices such as smart phones and smart tablets are widely used in our daily life. While the security of electronic devices have been heavily investigated recently, the resilience of touchscreens against various attacks has yet to be thoroughly investigated. In this paper, for the first time, we show that touchscreen-based electronic devices are vulnerable to intentiona… ▽ More Touchscreen-based electronic devices such as smart phones and smart tablets are widely used in our daily life. While the security of electronic devices have been heavily investigated recently, the resilience of touchscreens against various attacks has yet to be thoroughly investigated. In this paper, for the first time, we show that touchscreen-based electronic devices are vulnerable to intentional electromagnetic interference (IEMI) attacks in a systematic way and how to conduct this attack in a practical way. Our contribution lies in not just demonstrating the attack, but also analyzing and quantifying the underlying mechanism allowing the novel IEMI attack on touchscreens in detail. We show how to calculate both the minimum amount of electric field and signal frequency required to induce touchscreen ghost touches. We further analyze our IEMI attack on real touchscreens with different magnitudes, frequencies, duration, and multitouch patterns. The mechanism of controlling the touchscreen-enabled electronic devices with IEMI signals is also elaborated. We design and evaluate an out-of-sight touchscreen locator and touch injection feedback mechanism to assist a practical IEMI attack. Our attack works directly on the touchscreen circuit regardless of the touchscreen scanning mechanism or operating system. Our attack can inject short-tap, long-press, and omni-directional gestures on touchscreens from a distance larger than the average thickness of common tabletops. Compared with the state-of-the-art touchscreen attack, ours can accurately inject different types of touch events without the need for sensing signal synchronization, which makes our attack more robust and practical. In addition, rather than showing a simple proof-of-concept attack, we present and demonstrate the first ready-to-use IEMI based touchscreen attack vector with end-to-end attack scenarios. △ Less

Submitted 3 February, 2024; originally announced February 2024.

Comments: This paper has been accepted by 2022 IEEE Symposium on Security and Privacy (SP) and won distinguished paper award

arXiv:2401.12193 [pdf, other]

Programmable EM Sensor Array for Golden-Model Free Run-time Trojan Detection and Localization

Authors: Hanqiu Wang, Max Panoff, Zihao Zhan, Shuo Wang, Christophe Bobda, Domenic Forte

Abstract: Side-channel analysis has been proven effective at detecting hardware Trojans in integrated circuits (ICs). However, most detection techniques rely on large external probes and antennas for data collection and require a long measurement time to detect Trojans. Such limitations make these techniques impractical for run-time deployment and ineffective in detecting small Trojans with subtle side-chan… ▽ More Side-channel analysis has been proven effective at detecting hardware Trojans in integrated circuits (ICs). However, most detection techniques rely on large external probes and antennas for data collection and require a long measurement time to detect Trojans. Such limitations make these techniques impractical for run-time deployment and ineffective in detecting small Trojans with subtle side-channel signatures. To overcome these challenges, we propose a Programmable Sensor Array (PSA) for run-time hardware Trojan detection, localization, and identification. PSA is a tampering-resilient integrated on-chip magnetic field sensor array that can be re-programmed to change the sensors' shape, size, and location. Using PSA, EM side-channel measurement results collected from sensors at different locations on an IC can be analyzed to localize and identify the Trojan. The PSA has better performance than conventional external magnetic probes and state-of-the-art on-chip single-coil magnetic field sensors. We fabricated an AES-128 test chip with four AES Hardware Trojans. They were successfully detected, located, and identified with the proposed on-chip PSA within 10 milliseconds using our proposed cross-domain analysis. △ Less

Submitted 22 January, 2024; originally announced January 2024.

Comments: 6 pages, 5 figures, Accepted at DATE2024

arXiv:2401.06127 [pdf, other]

E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

Authors: Yifan Gong, Zheng Zhan, Qing **, Yanyu Li, Yerlan Idelbayev, Xian Liu, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren

Abstract: One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models to generate paired datasets used for training generative adversarial networks (GANs). This approach notably alleviates the stringent requirements typically imposed by high-end commercial GPUs for performing image editing with… ▽ More One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models to generate paired datasets used for training generative adversarial networks (GANs). This approach notably alleviates the stringent requirements typically imposed by high-end commercial GPUs for performing image editing with diffusion models. However, unlike text-to-image diffusion models, each distilled GAN is specialized for a specific image editing task, necessitating costly training efforts to obtain models for various concepts. In this work, we introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient? To achieve this goal, we propose a series of innovative techniques. First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch. Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model. Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time. Extensive experiments show that we can efficiently empower GANs with the ability to perform real-time high-quality image editing on mobile devices with remarkably reduced training and storage costs for each concept. △ Less

Submitted 2 June, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: ICML 2024. Project Page: https://yifanfanfanfan.github.io/e2gan/

arXiv:2312.02141 [pdf, other]

iMatching: Imperative Correspondence Learning

Authors: Zitong Zhan, Dasong Gao, Yun-Jou Lin, Youjie Xia, Chen Wang

Abstract: Learning feature correspondence is a foundational task in computer vision, holding immense importance for downstream applications such as visual odometry and 3D reconstruction. Despite recent progress in data-driven models, feature correspondence learning is still limited by the lack of accurate per-pixel correspondence labels. To overcome this difficulty, we introduce a new self-supervised scheme… ▽ More Learning feature correspondence is a foundational task in computer vision, holding immense importance for downstream applications such as visual odometry and 3D reconstruction. Despite recent progress in data-driven models, feature correspondence learning is still limited by the lack of accurate per-pixel correspondence labels. To overcome this difficulty, we introduce a new self-supervised scheme, imperative learning (IL), for training feature correspondence. It enables correspondence learning on arbitrary uninterrupted videos without any camera pose or depth labels, heralding a new era for self-supervised correspondence learning. Specifically, we formulated the problem of correspondence learning as a bilevel optimization, which takes the reprojection error from bundle adjustment as a supervisory signal for the model. To avoid large memory and computation overhead, we leverage the stationary point to effectively back-propagate the implicit gradients through bundle adjustment. Through extensive experiments, we demonstrate superior performance on tasks including feature matching and pose estimation, in which we obtained an average of 30% accuracy gain over the state-of-the-art matching models. △ Less

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2310.03749 [pdf]

SCVCNet: Sliding cross-vector convolution network for cross-task and inter-individual-set EEG-based cognitive workload recognition

Authors: Qi Wang, Li Chen, Zhiyuan Zhan, Jianhua Zhang, Zhong Yin

Abstract: This paper presents a generic approach for applying the cognitive workload recognizer by exploiting common electroencephalogram (EEG) patterns across different human-machine tasks and individual sets. We propose a neural network called SCVCNet, which eliminates task- and individual-set-related interferences in EEGs by analyzing finer-grained frequency structures in the power spectral densities. Th… ▽ More This paper presents a generic approach for applying the cognitive workload recognizer by exploiting common electroencephalogram (EEG) patterns across different human-machine tasks and individual sets. We propose a neural network called SCVCNet, which eliminates task- and individual-set-related interferences in EEGs by analyzing finer-grained frequency structures in the power spectral densities. The SCVCNet utilizes a sliding cross-vector convolution (SCVC) operation, where paired input layers representing the theta and alpha power are employed. By extracting the weights from a kernel matrix's central row and column, we compute the weighted sum of the two vectors around a specified scalp location. Next, we introduce an inter-frequency-point feature integration module to fuse the SCVC feature maps. Finally, we combined the two modules with the output-channel pooling and classification layers to construct the model. To train the SCVCNet, we employ the regularized least-square method with ridge regression and the extreme learning machine theory. We validate its performance using three databases, each consisting of distinct tasks performed by independent participant groups. The average accuracy (0.6813 and 0.6229) and F1 score (0.6743 and 0.6076) achieved in two different validation paradigms show partially higher performance than the previous works. All features and algorithms are available on website:https://github.com/7ohnKeats/SCVCNet. △ Less

Submitted 21 September, 2023; originally announced October 2023.

Comments: 12 pages

arXiv:2309.13035 [pdf, other]

PyPose v0.6: The Imperative Programming Interface for Robotics

Authors: Zitong Zhan, Xiangfu Li, Qihang Li, Haonan He, Abhinav Pandey, Haitao Xiao, Yangmengfei Xu, Xiangyu Chen, Kuan Xu, Kun Cao, Zhipeng Zhao, Zihan Wang, Huan Xu, Zihang Fang, Yutian Chen, Wentao Wang, Xu Fang, Yi Du, Tianhao Wu, Xiao Lin, Yuheng Qiu, Fan Yang, **gnan Shi, Shaoshu Su, Yiren Lu , et al. (11 additional authors not shown)

Abstract: PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. From its initial launch in early 2022, PyPose has experienced significant enhancements, inco… ▽ More PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. From its initial launch in early 2022, PyPose has experienced significant enhancements, incorporating a wide variety of new features into its platform. To satisfy the growing demand for understanding and utilizing the library and reduce the learning curve of new users, we present the fundamental design principle of the imperative programming interface, and showcase the flexible usage of diverse functionalities and modules using an extremely simple Dubins car example. We also demonstrate that the PyPose can be easily used to navigate a real quadruped robot with a few lines of code. △ Less

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.11883 [pdf]

On-the-Fly SfM: What you capture is What you get

Authors: Zongqian Zhan, Rui Xia, Yifei Yu, Yibo Xu, Xin Wang

Abstract: Over the last decades, ample achievements have been made on Structure from motion (SfM). However, the vast majority of them basically work in an offline manner, i.e., images are firstly captured and then fed together into a SfM pipeline for obtaining poses and sparse point cloud. In this work, on the contrary, we present an on-the-fly SfM: running online SfM while image capturing, the newly taken… ▽ More Over the last decades, ample achievements have been made on Structure from motion (SfM). However, the vast majority of them basically work in an offline manner, i.e., images are firstly captured and then fed together into a SfM pipeline for obtaining poses and sparse point cloud. In this work, on the contrary, we present an on-the-fly SfM: running online SfM while image capturing, the newly taken On-the-Fly image is online estimated with the corresponding pose and points, i.e., what you capture is what you get. Specifically, our approach firstly employs a vocabulary tree that is unsupervised trained using learning-based global features for fast image retrieval of newly fly-in image. Then, a robust feature matching mechanism with least squares (LSM) is presented to improve image registration performance. Finally, via investigating the influence of newly fly-in image's connected neighboring images, an efficient hierarchical weighted local bundle adjustment (BA) is used for optimization. Extensive experimental results demonstrate that on-the-fly SfM can meet the goal of robustly registering the images while capturing in an online way. △ Less

Submitted 13 February, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

arXiv:2308.10619 [pdf, other]

centroIDA: Cross-Domain Class Discrepancy Minimization Based on Accumulative Class-Centroids for Imbalanced Domain Adaptation

Authors: Xiaona Sun, Zhenyu Wu, Yichen Liu, Saier Hu, Zhiqiang Zhan, Yang Ji

Abstract: Unsupervised Domain Adaptation (UDA) approaches address the covariate shift problem by minimizing the distribution discrepancy between the source and target domains, assuming that the label distribution is invariant across domains. However, in the imbalanced domain adaptation (IDA) scenario, covariate and long-tailed label shifts both exist across domains. To tackle the IDA problem, some current r… ▽ More Unsupervised Domain Adaptation (UDA) approaches address the covariate shift problem by minimizing the distribution discrepancy between the source and target domains, assuming that the label distribution is invariant across domains. However, in the imbalanced domain adaptation (IDA) scenario, covariate and long-tailed label shifts both exist across domains. To tackle the IDA problem, some current research focus on minimizing the distribution discrepancies of each corresponding class between source and target domains. Such methods rely much on the reliable pseudo labels' selection and the feature distributions estimation for target domain, and the minority classes with limited numbers makes the estimations more uncertainty, which influences the model's performance. In this paper, we propose a cross-domain class discrepancy minimization method based on accumulative class-centroids for IDA (centroIDA). Firstly, class-based re-sampling strategy is used to obtain an unbiased classifier on source domain. Secondly, the accumulative class-centroids alignment loss is proposed for iterative class-centroids alignment across domains. Finally, class-wise feature alignment loss is used to optimize the feature representation for a robust classification boundary. A series of experiments have proved that our method outperforms other SOTA methods on IDA problem, especially with the increasing degree of label shift. △ Less

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.08366 [pdf, other]

Dual-Branch Temperature Scaling Calibration for Long-Tailed Recognition

Authors: Jialin Guo, Zhenyu Wu, Zhiqiang Zhan, Yang Ji

Abstract: The calibration for deep neural networks is currently receiving widespread attention and research. Miscalibration usually leads to overconfidence of the model. While, under the condition of long-tailed distribution of data, the problem of miscalibration is more prominent due to the different confidence levels of samples in minority and majority categories, and it will result in more serious overco… ▽ More The calibration for deep neural networks is currently receiving widespread attention and research. Miscalibration usually leads to overconfidence of the model. While, under the condition of long-tailed distribution of data, the problem of miscalibration is more prominent due to the different confidence levels of samples in minority and majority categories, and it will result in more serious overconfidence. To address this problem, some current research have designed diverse temperature coefficients for different categories based on temperature scaling (TS) method. However, in the case of rare samples in minority classes, the temperature coefficient is not generalizable, and there is a large difference between the temperature coefficients of the training set and the validation set. To solve this challenge, this paper proposes a dual-branch temperature scaling calibration model (Dual-TS), which considers the diversities in temperature parameters of different categories and the non-generalizability of temperature parameters for rare samples in minority classes simultaneously. Moreover, we noticed that the traditional calibration evaluation metric, Excepted Calibration Error (ECE), gives a higher weight to low-confidence samples in the minority classes, which leads to inaccurate evaluation of model calibration. Therefore, we also propose Equal Sample Bin Excepted Calibration Error (Esbin-ECE) as a new calibration evaluation metric. Through experiments, we demonstrate that our model yields state-of-the-art in both traditional ECE and Esbin-ECE metrics. △ Less

Submitted 16 August, 2023; originally announced August 2023.

arXiv:2306.04366 [pdf, other]

doi 10.1109/TMC.2024.3373469

Enhancing Worker Recruitment in Collaborative Mobile Crowdsourcing: A Graph Neural Network Trust Evaluation Approach

Authors: Zhongwei Zhan, Yingjie Wang, Peiyong Duan, Akshita Maradapu Vera Venkata Sai, Zhaowei Liu, Chaocan Xiang, Xiangrong Tong, Weilong Wang, Zhipeng Cai

Abstract: Collaborative Mobile Crowdsourcing (CMCS) allows platforms to recruit worker teams to collaboratively execute complex sensing tasks. The efficiency of such collaborations could be influenced by trust relationships among workers. To obtain the asymmetric trust values among all workers in the social network, the Trust Reinforcement Evaluation Framework (TREF) based on Graph Convolutional Neural Netw… ▽ More Collaborative Mobile Crowdsourcing (CMCS) allows platforms to recruit worker teams to collaboratively execute complex sensing tasks. The efficiency of such collaborations could be influenced by trust relationships among workers. To obtain the asymmetric trust values among all workers in the social network, the Trust Reinforcement Evaluation Framework (TREF) based on Graph Convolutional Neural Networks (GCNs) is proposed in this paper. The task completion effect is comprehensively calculated by considering the workers' ability benefits, distance benefits, and trust benefits in this paper. The worker recruitment problem is modeled as an Undirected Complete Recruitment Graph (UCRG), for which a specific Tabu Search Recruitment (TSR) algorithm solution is proposed. An optimal execution team is recruited for each task by the TSR algorithm, and the collaboration team for the task is obtained under the constraint of privacy loss. To enhance the efficiency of the recruitment algorithm on a large scale and scope, the Mini-Batch K-Means clustering algorithm and edge computing technology are introduced, enabling distributed worker recruitment. Lastly, extensive experiments conducted on five real datasets validate that the recruitment algorithm proposed in this paper outperforms other baselines. Additionally, TREF proposed herein surpasses the performance of state-of-the-art trust evaluation methods in the literature. △ Less

Submitted 21 March, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: The article has been accepted by IEEE TMC, and its DOI is 10.1109/TMC.2024.3373469

arXiv:2305.00380 [pdf, other]

DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning

Authors: Zifeng Wang, Zheng Zhan, Yifan Gong, Yucai Shao, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy

Abstract: Rehearsal-based approaches are a mainstay of continual learning (CL). They mitigate the catastrophic forgetting problem by maintaining a small fixed-size buffer with a subset of data from past tasks. While most rehearsal-based approaches study how to effectively exploit the knowledge from the buffered past data, little attention is paid to the inter-task relationships with the critical task-specif… ▽ More Rehearsal-based approaches are a mainstay of continual learning (CL). They mitigate the catastrophic forgetting problem by maintaining a small fixed-size buffer with a subset of data from past tasks. While most rehearsal-based approaches study how to effectively exploit the knowledge from the buffered past data, little attention is paid to the inter-task relationships with the critical task-specific and task-invariant knowledge. By appropriately leveraging inter-task relationships, we propose a novel CL method named DualHSIC to boost the performance of existing rehearsal-based methods in a simple yet effective way. DualHSIC consists of two complementary components that stem from the so-called Hilbert Schmidt independence criterion (HSIC): HSIC-Bottleneck for Rehearsal (HBR) lessens the inter-task interference and HSIC Alignment (HA) promotes task-invariant knowledge sharing. Extensive experiments show that DualHSIC can be seamlessly plugged into existing rehearsal-based methods for consistent performance improvements, and also outperforms recent state-of-the-art regularization-enhanced rehearsal methods. Source code will be released. △ Less

Submitted 30 April, 2023; originally announced May 2023.

Comments: Accepted at ICML 2023 as a conference paper

arXiv:2304.12825 [pdf, other]

GraphVF: Controllable Protein-Specific 3D Molecule Generation with Variational Flow

Authors: Fang Sun, Zhihao Zhan, Hongyu Guo, Ming Zhang, Jian Tang

Abstract: Designing molecules that bind to specific target proteins is a fundamental task in drug discovery. Recent models leverage geometric constraints to generate ligand molecules that bind cohesively with specific protein pockets. However, these models cannot effectively generate 3D molecules with 2D skeletal curtailments and property constraints, which are pivotal to drug potency and development. To ta… ▽ More Designing molecules that bind to specific target proteins is a fundamental task in drug discovery. Recent models leverage geometric constraints to generate ligand molecules that bind cohesively with specific protein pockets. However, these models cannot effectively generate 3D molecules with 2D skeletal curtailments and property constraints, which are pivotal to drug potency and development. To tackle this challenge, we propose GraphVF, a variational flow-based framework that combines 2D topology and 3D geometry, for controllable generation of binding 3D molecules. Empirically, our method achieves state-of-the-art binding affinity and realistic sub-structural layouts for protein-specific generation. In particular, GraphVF represents the first controllable geometry-aware, protein-specific molecule generation method, which can generate binding 3D molecules with tailored sub-structures and physio-chemical properties. Our code is available at https://github.com/Franco-Solis/GraphVF-code. △ Less

Submitted 23 February, 2023; originally announced April 2023.

Comments: 15 pages, 8 figures

arXiv:2304.12779 [pdf, ps, other]

An Approximation Algorithm for Covering Vertices by 4^+-Paths

Authors: Mingyang Gong, Zhi-Zhong Chen, Guohui Lin, Zhaohui Zhan

Abstract: This paper deals with the problem of finding a collection of vertex-disjoint paths in a given graph G=(V,E) such that each path has at least four vertices and the total number of vertices in these paths is maximized. The problem is NP-hard and admits an approximation algorithm which achieves a ratio of 2 and runs in O(|V|^8) time. The known algorithm is based on time-consuming local search, and it… ▽ More This paper deals with the problem of finding a collection of vertex-disjoint paths in a given graph G=(V,E) such that each path has at least four vertices and the total number of vertices in these paths is maximized. The problem is NP-hard and admits an approximation algorithm which achieves a ratio of 2 and runs in O(|V|^8) time. The known algorithm is based on time-consuming local search, and its authors ask whether one can design a better approximation algorithm by a completely different approach. In this paper, we answer their question in the affirmative by presenting a new approximation algorithm for the problem. Our algorithm achieves a ratio of 1.874 and runs in O(min{|E|^2|V|^2, |V|^5}) time. Unlike the previously best algorithm, ours starts with a maximum matching M of G and then tries to transform M into a solution by utilizing a maximum-weight path-cycle cover in a suitably constructed graph. △ Less

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2303.03800 [pdf, other]

Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding

Authors: Jiacheng Li, Longhui Wei, ZongYuan Zhan, Xin He, Siliang Tang, Qi Tian, Yueting Zhuang

Abstract: Generative transformers have shown their superiority in synthesizing high-fidelity and high-resolution images, such as good diversity and training stability. However, they suffer from the problem of slow generation since they need to generate a long token sequence autoregressively. To better accelerate the generative transformers while kee** good generation quality, we propose Lformer, a semi-au… ▽ More Generative transformers have shown their superiority in synthesizing high-fidelity and high-resolution images, such as good diversity and training stability. However, they suffer from the problem of slow generation since they need to generate a long token sequence autoregressively. To better accelerate the generative transformers while kee** good generation quality, we propose Lformer, a semi-autoregressive text-to-image generation model. Lformer firstly encodes an image into $h{\times}h$ discrete tokens, then divides these tokens into $h$ mirrored L-shape blocks from the top left to the bottom right and decodes the tokens in a block parallelly in each step. Lformer predicts the area adjacent to the previous context like autoregressive models thus it is more stable while accelerating. By leveraging the 2D structure of image tokens, Lformer achieves faster speed than the existing transformer-based methods while kee** good generation quality. Moreover, the pretrained Lformer can edit images without the requirement for finetuning. We can roll back to the early steps for regeneration or edit the image with a bounding box and a text prompt. △ Less

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2302.03839 [pdf, other]

Futuristic Variations and Analysis in Fundus Images Corresponding to Biological Traits

Authors: Muhammad Hassan, Hao Zhang, Ahmed Fateh Ameen, Home Wu Zeng, Shuye Ma, Wen Liang, Dingqi Shang, Jiaming Ding, Ziheng Zhan, Tsz Kwan Lam, Ming Xu, Qiming Huang, Dongmei Wu, Can Yang Zhang, Zhou You, Awiwu Ain, Pei Wu Qin

Abstract: Fundus image captures rear of an eye, and which has been studied for the diseases identification, classification, segmentation, generation, and biological traits association using handcrafted, conventional, and deep learning methods. In biological traits estimation, most of the studies have been carried out for the age prediction and gender classification with convincing results. However, the curr… ▽ More Fundus image captures rear of an eye, and which has been studied for the diseases identification, classification, segmentation, generation, and biological traits association using handcrafted, conventional, and deep learning methods. In biological traits estimation, most of the studies have been carried out for the age prediction and gender classification with convincing results. However, the current study utilizes the cutting-edge deep learning (DL) algorithms to estimate biological traits in terms of age and gender together with associating traits to retinal visuals. For the traits association, our study embeds aging as the label information into the proposed DL model to learn knowledge about the effected regions with aging. Our proposed DL models, named FAG-Net and FGC-Net, correspondingly estimate biological traits (age and gender) and generates fundus images. FAG-Net can generate multiple variants of an input fundus image given a list of ages as conditions. Our study analyzes fundus images and their corresponding association with biological traits, and predicts of possible spreading of ocular disease on fundus images given age as condition to the generative model. Our proposed models outperform the randomly selected state of-the-art DL models. △ Less

Submitted 7 February, 2023; originally announced February 2023.

Comments: 10 pages, 4 figures, 3 tables

arXiv:2212.05122 [pdf, other]

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

Authors: Yifan Gong, Zheng Zhan, Pu Zhao, Yushu Wu, Chao Wu, Caiwen Ding, Weiwen Jiang, Minghai Qin, Yanzhi Wang

Abstract: During the deployment of deep neural networks (DNNs) on edge devices, many research efforts are devoted to the limited hardware resource. However, little attention is paid to the influence of dynamic power management. As edge devices typically only have a budget of energy with batteries (rather than almost unlimited energy support on servers or workstations), their dynamic power management often c… ▽ More During the deployment of deep neural networks (DNNs) on edge devices, many research efforts are devoted to the limited hardware resource. However, little attention is paid to the influence of dynamic power management. As edge devices typically only have a budget of energy with batteries (rather than almost unlimited energy support on servers or workstations), their dynamic power management often changes the execution frequency as in the widely-used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed performance, especially for computation-intensive DNN models, which can harm user experience and waste hardware resources. We firstly identify this problem and then propose All-in-One, a highly representative pruning framework to work with dynamic power management using DVFS. The framework can use only one set of model weights and soft masks (together with other auxiliary parameters of negligible storage) to represent multiple models of various pruning ratios. By re-configuring the model to the corresponding pruning ratio for a specific execution frequency (and voltage), we are able to achieve stable inference speed, i.e., kee** the difference in speed performance under various execution frequencies as small as possible. Our experiments demonstrate that our method not only achieves high accuracy for multiple models of different pruning ratios, but also reduces their variance of inference latency for various frequencies, with minimal memory consumption of only one model and one soft mask. △ Less

Submitted 9 December, 2022; originally announced December 2022.

arXiv:2211.09108 [pdf, other]

Robust Online Video Instance Segmentation with Track Queries

Authors: Zitong Zhan, Daniel McKee, Svetlana Lazebnik

Abstract: Recently, transformer-based methods have achieved impressive results on Video Instance Segmentation (VIS). However, most of these top-performing methods run in an offline manner by processing the entire video clip at once to predict instance mask volumes. This makes them incapable of handling the long videos that appear in challenging new video instance segmentation datasets like UVO and OVIS. We… ▽ More Recently, transformer-based methods have achieved impressive results on Video Instance Segmentation (VIS). However, most of these top-performing methods run in an offline manner by processing the entire video clip at once to predict instance mask volumes. This makes them incapable of handling the long videos that appear in challenging new video instance segmentation datasets like UVO and OVIS. We propose a fully online transformer-based video instance segmentation model that performs comparably to top offline methods on the YouTube-VIS 2019 benchmark and considerably outperforms them on UVO and OVIS. This method, called Robust Online Video Segmentation (ROVIS), augments the Mask2Former image instance segmentation model with track queries, a lightweight mechanism for carrying track information from frame to frame, originally introduced by the TrackFormer method for multi-object tracking. We show that, when combined with a strong enough image segmentation architecture, track queries can exhibit impressive accuracy while not being constrained to short videos. △ Less

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2209.09476 [pdf, other]

SparCL: Sparse Continual Learning on the Edge

Authors: Zifeng Wang, Zheng Zhan, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian, Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy

Abstract: Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems under resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning… ▽ More Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems under resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning(SparCL), which is the first study that leverages sparsity to enable cost-effective continual learning on edge devices. SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a sparse network throughout the entire CL process, dynamic data removal (DDR) to remove less informative training data, and dynamic gradient masking (DGM) to sparsify the gradient updates. Each of them not only improves efficiency, but also further mitigates catastrophic forgetting. SparCL consistently improves the training efficiency of existing state-of-the-art (SOTA) CL methods by at most 23X less training FLOPs, and, surprisingly, further improves the SOTA accuracy by at most 1.7%. SparCL also outperforms competitive baselines obtained from adapting SOTA sparse training methods to the CL setting in both efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real mobile phone, further indicating the practical potential of our method. △ Less

Submitted 20 September, 2022; originally announced September 2022.

Comments: Published at NeurIPS 2022 as a conference paper

arXiv:2207.12577 [pdf, other]

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution

Authors: Yushu Wu, Yifan Gong, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang

Abstract: Deep learning-based super-resolution (SR) has gained tremendous popularity in recent years because of its high image quality performance and wide application scenarios. However, prior methods typically suffer from large amounts of computations and huge power consumption, causing difficulties for real-time inference, especially on resource-limited platforms such as mobile devices. To mitigate this,… ▽ More Deep learning-based super-resolution (SR) has gained tremendous popularity in recent years because of its high image quality performance and wide application scenarios. However, prior methods typically suffer from large amounts of computations and huge power consumption, causing difficulties for real-time inference, especially on resource-limited platforms such as mobile devices. To mitigate this, we propose a compiler-aware SR neural architecture search (NAS) framework that conducts depth search and per-layer width search with adaptive SR blocks. The inference speed is directly taken into the optimization along with the SR loss to derive SR models with high image quality while satisfying the real-time inference requirement. Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence. With the proposed framework, we achieve real-time SR inference for implementing 720p resolution with competitive SR performance (in terms of PSNR and SSIM) on GPU/DSP of mobile platforms (Samsung Galaxy S21). △ Less

Submitted 25 July, 2022; originally announced July 2022.

arXiv:2206.02424 [pdf]

doi 10.1007/s11554-024-01436-6

Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles

Authors: Hulin Li, Jun Li, Hanbing Wei, Zheng Liu, Zhenfei Zhan, Qiliang Ren

Abstract: Object detection is a significant downstream task in computer vision. For the on-board edge computing platforms, a giant model is difficult to achieve the real-time detection requirement. And, a lightweight model built from a large number of the depth-wise separable convolution layers cannot achieve the sufficient accuracy. We introduce a new lightweight convolution technique, GSConv, to lighten t… ▽ More Object detection is a significant downstream task in computer vision. For the on-board edge computing platforms, a giant model is difficult to achieve the real-time detection requirement. And, a lightweight model built from a large number of the depth-wise separable convolution layers cannot achieve the sufficient accuracy. We introduce a new lightweight convolution technique, GSConv, to lighten the model but maintain the accuracy. The GSConv accomplishes an excellent trade-off between the model's accuracy and speed. And, we provide a design paradigm, slim-neck, to achieve a higher computational cost-effectiveness of the detectors. The effectiveness of our approach was robustly demonstrated in over twenty sets comparative experiments. In particular, the detectors of ameliorated by our approach obtains state-of-the-art results (e.g. 70.9% mAP0.5 for the SODA10M at a speed of ~ 100FPS on a Tesla T4 GPU) compared with the originals. Code is available at https://github.com/alanli1997/slim-neck-by-gsconv △ Less

Submitted 17 August, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: 18 pages, 12 figures

arXiv:2205.15045 [pdf]

Intelligent optoelectronic processor for orbital angular momentum spectrum measurement

Authors: Hao Wang, Ziyu Zhan, Futai Hu, Yuan Meng, Zeqi Liu, Xing Fu, Qiang Liu

Abstract: Orbital angular momentum (OAM) detection underpins almost all aspects of vortex beams' advances such as communication and quantum analogy. Conventional schemes are frustrated by low speed, complicated system, limited detection range. Here, we devise an intelligent processor composed of photonic and electronic neurons for OAM spectrum measurement in a fast, accurate and direct manner. Specifically,… ▽ More Orbital angular momentum (OAM) detection underpins almost all aspects of vortex beams' advances such as communication and quantum analogy. Conventional schemes are frustrated by low speed, complicated system, limited detection range. Here, we devise an intelligent processor composed of photonic and electronic neurons for OAM spectrum measurement in a fast, accurate and direct manner. Specifically, optical layers extract invisible topological charge information from incoming light and a shallow electronic layer predicts the exact spectrum. The integration of optical-computing promises us a compact single-shot system with high speed and energy efficiency, neither necessitating reference wave nor repetitive steps. Importantly, our processor is endowed with salient generalization ability and robustness against diverse structured light and adverse effects. We further raise a universal model interpretation paradigm to reveal the underlying physical mechanisms in the hybrid processor, as distinct from conventional 'black-box' networks. Such interpretation algorithm can improve the detection efficiency. We also complete the theory of optoelectronic network enabling its efficient training. This work not only contributes to the explorations on OAM physics and applications, and also broadly inspires the advanced links between intelligent computing and physical effects. △ Less

Submitted 5 September, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

arXiv:2203.12802 [pdf]

A platform for causal knowledge representation and inference in industrial fault diagnosis based on cubic DUCG

Authors: Bu XuSong, Nie Hao, Zhang Zhan, Zhang Qin

Abstract: The working conditions of large-scale industrial systems are very complex. Once a failure occurs, it will affect industrial production, cause property damage, and even endanger the workers' lives. Therefore, it is important to control the operation of the system to accurately grasp the operation status of the system and find out the failure in time. The occurrence of system failure is a gradual pr… ▽ More The working conditions of large-scale industrial systems are very complex. Once a failure occurs, it will affect industrial production, cause property damage, and even endanger the workers' lives. Therefore, it is important to control the operation of the system to accurately grasp the operation status of the system and find out the failure in time. The occurrence of system failure is a gradual process, and the occurrence of the current system failure may depend on the previous state of the system, which is sequential. The fault diagnosis technology based on time series can monitor the operating status of the system in real-time, detect the abnormal operation of the system within the allowable time interval, diagnose the root cause of the fault and predict the status trend. In order to guide the technical personnel to troubleshoot and solve related faults, in this paper, an industrial fault diagnosis system is implemented based on the cubic DUCG theory. The diagnostic model of the system is constructed based on expert knowledge and experience. At the same time, it can perform real-time fault diagnosis based on time sequence, which solves the problem of fault diagnosis of industrial systems without sample data. △ Less

Submitted 27 March, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

arXiv:2203.05553 [pdf, other]

Transfer of Representations to Video Label Propagation: Implementation Factors Matter

Authors: Daniel McKee, Zitong Zhan, Bing Shuai, Davide Modolo, Joseph Tighe, Svetlana Lazebnik

Abstract: This work studies feature representations for dense label propagation in video, with a focus on recently proposed methods that learn video correspondence using self-supervised signals such as colorization or temporal cycle consistency. In the literature, these methods have been evaluated with an array of inconsistent settings, making it difficult to discern trends or compare performance fairly. St… ▽ More This work studies feature representations for dense label propagation in video, with a focus on recently proposed methods that learn video correspondence using self-supervised signals such as colorization or temporal cycle consistency. In the literature, these methods have been evaluated with an array of inconsistent settings, making it difficult to discern trends or compare performance fairly. Starting with a unified formulation of the label propagation algorithm that encompasses most existing variations, we systematically study the impact of important implementation factors in feature extraction and label propagation. Along the way, we report the accuracies of properly tuned supervised and unsupervised still image baselines, which are higher than those found in previous works. We also demonstrate that augmenting video-based correspondence cues with still-image-based ones can further improve performance. We then attempt a fair comparison of recent video-based methods on the DAVIS benchmark, showing convergence of best methods to performance levels near our strong ImageNet baseline, despite the usage of a variety of specialized video-based losses and training particulars. Additional comparisons on JHMDB and VIP datasets confirm the similar performance of current methods. We hope that this study will help to improve evaluation practices and better inform future research directions in temporal correspondence. △ Less

Submitted 10 March, 2022; originally announced March 2022.

arXiv:2111.11581 [pdf, other]

Automatic Map** of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

Authors: Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang

Abstract: Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction on certain types of DNN layers. In this paper, we propose a general, fine-gr… ▽ More Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction on certain types of DNN layers. In this paper, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme map** methods, one is search-based and the other is rule-based, are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme map** methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48$\times$ and 1.73$\times$ DNN inference acceleration on CIFAR-10 and ImageNet dataset without accuracy loss. △ Less

Submitted 22 November, 2021; originally announced November 2021.

arXiv:2110.14032 [pdf, other]

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

Authors: Geng Yuan, Xiaolong Ma, Wei Niu, Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing **, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin

Abstract: Recently, a new trend of exploring sparsity for accelerating neural network training has emerged, embracing the paradigm of training on the edge. This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting for accurate and fast execution on edge devices. The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S) that ensure s… ▽ More Recently, a new trend of exploring sparsity for accelerating neural network training has emerged, embracing the paradigm of training on the edge. This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting for accurate and fast execution on edge devices. The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S) that ensure superior accuracy at high sparsity ratios. Different from the existing works for sparse training, this current work reveals the importance of sparsity schemes on the performance of sparse training in terms of accuracy as well as training speed on real edge devices. On top of that, the paper proposes to employ data efficiency for further acceleration of sparse training. Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks in the sparse training process, and therefore can be removed for further training speedup on edge devices. Comparing with state-of-the-art (SOTA) works on accuracy, our MEST increases Top-1 accuracy significantly on ImageNet when using the same unstructured sparsity scheme. Systematical evaluation on accuracy, training speed, and memory footprint are conducted, where the proposed MEST framework consistently outperforms representative SOTA works. A reviewer strongly against our work based on his false assumptions and misunderstandings. On top of the previous submission, we employ data efficiency for further acceleration of sparse training. And we explore the impact of model sparsity, sparsity schemes, and sparse training algorithms on the number of removable training examples. Our codes are publicly available at: https://github.com/boone891214/MEST. △ Less

Submitted 26 October, 2021; originally announced October 2021.

Comments: NeurIPS 2021 Spotlight Paper

arXiv:2108.08910 [pdf, other]

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search

Authors: Zheng Zhan, Yifan Gong, Pu Zhao, Geng Yuan, Wei Niu, Yushu Wu, Tianyun Zhang, Malith Jayaweera, David Kaeli, Bin Ren, Xue Lin, Yanzhi Wang

Abstract: Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices. To overcome the challenge and facilitate the real-time deploymen… ▽ More Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices. To overcome the challenge and facilitate the real-time deployment of SISR tasks on mobile, we combine neural architecture search with pruning search and propose an automatic search framework that derives sparse super-resolution (SR) models with high image quality while satisfying the real-time inference requirement. To decrease the search cost, we leverage the weight sharing strategy by introducing a supernet and decouple the search problem into three stages, including supernet construction, compiler-aware architecture and pruning search, and compiler-aware pruning ratio search. With the proposed framework, we are the first to achieve real-time SR inference (with only tens of milliseconds per frame) for implementing 720p resolution with competitive image quality (in terms of PSNR and SSIM) on mobile platforms (Samsung Galaxy S20). △ Less

Submitted 14 February, 2023; v1 submitted 18 August, 2021; originally announced August 2021.

arXiv:2012.15531 [pdf, other]

Colonoscopy Polyp Detection: Domain Adaptation From Medical Report Images to Real-time Videos

Authors: Zhi-Qin Zhan, Huazhu Fu, Yan-Yao Yang, **g**g Chen, Jie Liu, Yu-Gang Jiang

Abstract: Automatic colorectal polyp detection in colonoscopy video is a fundamental task, which has received a lot of attention. Manually annotating polyp region in a large scale video dataset is time-consuming and expensive, which limits the development of deep learning techniques. A compromise is to train the target model by using labeled images and infer on colonoscopy videos. However, there are several… ▽ More Automatic colorectal polyp detection in colonoscopy video is a fundamental task, which has received a lot of attention. Manually annotating polyp region in a large scale video dataset is time-consuming and expensive, which limits the development of deep learning techniques. A compromise is to train the target model by using labeled images and infer on colonoscopy videos. However, there are several issues between the image-based training and video-based inference, including domain differences, lack of positive samples, and temporal smoothness. To address these issues, we propose an Image-video-joint polyp detection network (Ivy-Net) to address the domain gap between colonoscopy images from historical medical reports and real-time videos. In our Ivy-Net, a modified mixup is utilized to generate training data by combining the positive images and negative video frames at the pixel level, which could learn the domain adaptive representations and augment the positive samples. Simultaneously, a temporal coherence regularization (TCR) is proposed to introduce the smooth constraint on feature-level in adjacent frames and improve polyp detection by unlabeled colonoscopy videos. For evaluation, a new large colonoscopy polyp dataset is collected, which contains 3056 images from historical medical reports of 889 positive patients and 7.5-hour videos of 69 patients (28 positive). The experiments on the collected dataset demonstrate that our Ivy-Net achieves the state-of-the-art result on colonoscopy video. △ Less

Submitted 31 December, 2020; originally announced December 2020.

arXiv:2012.00596 [pdf, other]

NPAS: A Compiler-aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration

Authors: Zhengang Li, Geng Yuan, Wei Niu, Pu Zhao, Yanyu Li, Yuxuan Cai, Xuan Shen, Zheng Zhan, Zhenglun Kong, Qing **, Zhiyu Chen, Sijia Liu, Kaiyuan Yang, Bin Ren, Yanzhi Wang, Xue Lin

Abstract: With the increasing demand to efficiently deploy DNNs on mobile edge devices, it becomes much more important to reduce unnecessary computation and increase the execution speed. Prior methods towards this goal, including model compression and network architecture search (NAS), are largely performed independently and do not fully consider compiler-level optimizations which is a must-do for mobile ac… ▽ More With the increasing demand to efficiently deploy DNNs on mobile edge devices, it becomes much more important to reduce unnecessary computation and increase the execution speed. Prior methods towards this goal, including model compression and network architecture search (NAS), are largely performed independently and do not fully consider compiler-level optimizations which is a must-do for mobile acceleration. In this work, we first propose (i) a general category of fine-grained structured pruning applicable to various DNN layers, and (ii) a comprehensive, compiler automatic code generation framework supporting different DNNs and different pruning schemes, which bridge the gap of model compression and NAS. We further propose NPAS, a compiler-aware unified network pruning, and architecture search. To deal with large search space, we propose a meta-modeling procedure based on reinforcement learning with fast evaluation and Bayesian optimization, ensuring the total number of training epochs comparable with representative NAS frameworks. Our framework achieves 6.7ms, 5.9ms, 3.9ms ImageNet inference times with 78.2%, 75% (MobileNet-V3 level), and 71% (MobileNet-V2 level) Top-1 accuracy respectively on an off-the-shelf mobile phone, consistently outperforming prior work. △ Less

Submitted 16 June, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: Accepted as an oral paper in the Conference on Computer Vision and Pattern Recognition (CVPR), 2021

arXiv:2008.01928 [pdf, other]

Component Divide-and-Conquer for Real-World Image Super-Resolution

Authors: Pengxu Wei, Ziwei Xie, Hannan Lu, Zongyuan Zhan, Qixiang Ye, Wangmeng Zuo, Liang Lin

Abstract: In this paper, we present a large-scale Diverse Real-world image Super-Resolution dataset, i.e., DRealSR, as well as a divide-and-conquer Super-Resolution (SR) network, exploring the utility of guiding SR model with low-level image components. DRealSR establishes a new SR benchmark with diverse real-world degradation processes, mitigating the limitations of conventional simulated image degradation… ▽ More In this paper, we present a large-scale Diverse Real-world image Super-Resolution dataset, i.e., DRealSR, as well as a divide-and-conquer Super-Resolution (SR) network, exploring the utility of guiding SR model with low-level image components. DRealSR establishes a new SR benchmark with diverse real-world degradation processes, mitigating the limitations of conventional simulated image degradation. In general, the targets of SR vary with image regions with different low-level image components, e.g., smoothness preserving for flat regions, sharpening for edges, and detail enhancing for textures. Learning an SR model with conventional pixel-wise loss usually is easily dominated by flat regions and edges, and fails to infer realistic details of complex textures. We propose a Component Divide-and-Conquer (CDC) model and a Gradient-Weighted (GW) loss for SR. Our CDC parses an image with three components, employs three Component-Attentive Blocks (CABs) to learn attentive masks and intermediate SR predictions with an intermediate supervision learning strategy, and trains an SR model following a divide-and-conquer learning principle. Our GW loss also provides a feasible way to balance the difficulties of image components for SR. Extensive experiments validate the superior performance of our CDC and the challenging aspects of our DRealSR dataset related to diverse real-world scenarios. Our dataset and codes are publicly available at https://github.com/xiezw5/Component-Divide-and-Conquer-for-Real-World-Image-Super-Resolution △ Less

Submitted 5 August, 2020; originally announced August 2020.

Journal ref: European Conference on Computer Vision (ECCV), 2020

arXiv:2007.14592 [pdf]

A SLAM Map Restoration Algorithm Based on Submaps and an Undirected Connected Graph

Authors: Zongqian Zhan, Wenjie Jian, Yihui Li, Xin Wang, Yang Yue

Abstract: Many visual simultaneous localization and map** (SLAM) systems have been shown to be accurate and robust, and have real-time performance capabilities on both indoor and ground datasets. However, these methods can be problematic when dealing with aerial frames captured by a camera mounted on an unmanned aerial vehicle (UAV) because the flight height of the UAV can be difficult to control and is e… ▽ More Many visual simultaneous localization and map** (SLAM) systems have been shown to be accurate and robust, and have real-time performance capabilities on both indoor and ground datasets. However, these methods can be problematic when dealing with aerial frames captured by a camera mounted on an unmanned aerial vehicle (UAV) because the flight height of the UAV can be difficult to control and is easily affected by the environment.To cope with the case of lost tracking, many visual SLAM systems employ a relocalization strategy. This involves the tracking thread continuing the online working by inspecting the connections between the subsequent new frames and the generated map before the tracking was lost. To solve the missing map problem, which is an issue in many applications , after the tracking is lost, based on monocular visual SLAM, we present a method of reconstructing a complete global map of UAV datasets by sequentially merging the submaps via the corresponding undirected connected graph. Specifically, submaps are repeatedly generated, from the initialization process to the place where the tracking is lost, and a corresponding undirected connected graph is built by considering these submaps as nodes and the common map points within two submaps as edges. The common map points are then determined by the bag-of-words (BoW) method, and the submaps are merged if they are found to be connected with the online map in the undirect connected graph. To demonstrate the performance of the proposed method, we first investigated the performance on a UAV dataset, and the experimental results showed that, in the case of several tracking failures, the integrity of the map** was significantly better than that of the current mainstream SLAM method. △ Less

Submitted 29 July, 2020; originally announced July 2020.

arXiv:2007.03860 [pdf]

Research on multi-dimensional end-to-end phrase recognition algorithm based on background knowledge

Authors: Zheng Li, Gang Tu, Guang Liu, Zhi-Qiang Zhan, Yi-Jian Liu

Abstract: At present, the deep end-to-end method based on supervised learning is used in entity recognition and dependency analysis. There are two problems in this method: firstly, background knowledge cannot be introduced; secondly, multi granularity and nested features of natural language cannot be recognized. In order to solve these problems, the annotation rules based on phrase window are proposed, and… ▽ More At present, the deep end-to-end method based on supervised learning is used in entity recognition and dependency analysis. There are two problems in this method: firstly, background knowledge cannot be introduced; secondly, multi granularity and nested features of natural language cannot be recognized. In order to solve these problems, the annotation rules based on phrase window are proposed, and the corresponding multi-dimensional end-to-end phrase recognition algorithm is designed. This annotation rule divides sentences into seven types of nested phrases, and indicates the dependency between phrases. The algorithm can not only introduce background knowledge, recognize all kinds of nested phrases in sentences, but also recognize the dependency between phrases. The experimental results show that the annotation rule is easy to use and has no ambiguity; the matching algorithm is more consistent with the multi granularity and diversity characteristics of syntax than the traditional end-to-end algorithm. The experiment on CPWD dataset, by introducing background knowledge, the new algorithm improves the accuracy of the end-to-end method by more than one point. The corresponding method was applied to the CCL 2018 competition and won the first place in the task of Chinese humor type recognition. △ Less

Submitted 7 July, 2020; originally announced July 2020.

Comments: in Chinese language

arXiv:2005.01278 [pdf, other]

A New Data Normalization Method to Improve Dialogue Generation by Minimizing Long Tail Effect

Authors: Zhiqiang Zhan, Zifeng Hou, Yang Zhang

Abstract: Recent neural models have shown significant progress in dialogue generation. Most generation models are based on language models. However, due to the Long Tail Phenomenon in linguistics, the trained models tend to generate words that appear frequently in training datasets, leading to a monotonous issue. To address this issue, we analyze a large corpus from Wikipedia and propose three frequency-bas… ▽ More Recent neural models have shown significant progress in dialogue generation. Most generation models are based on language models. However, due to the Long Tail Phenomenon in linguistics, the trained models tend to generate words that appear frequently in training datasets, leading to a monotonous issue. To address this issue, we analyze a large corpus from Wikipedia and propose three frequency-based data normalization methods. We conduct extensive experiments based on transformers and three datasets respectively collected from social media, subtitles, and the industrial application. Experimental results demonstrate significant improvements in diversity and informativeness (defined as the numbers of nouns and verbs) of generated responses. More specifically, the unigram and bigram diversity are increased by 2.6%-12.6% and 2.2%-18.9% on the three datasets, respectively. Moreover, the informativeness, i.e. the numbers of nouns and verbs, are increased by 4.0%-7.0% and 1.4%-12.1%, respectively. Additionally, the simplicity and effectiveness enable our methods to be adapted to different generation models without much extra computational cost. △ Less

Submitted 4 May, 2020; originally announced May 2020.

arXiv:2004.11250 [pdf, other]

Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization

Authors: Wei Niu, Pu Zhao, Zheng Zhan, Xue Lin, Yanzhi Wang, Bin Ren

Abstract: High-end mobile platforms rapidly serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications. However, the constrained computation and storage resources on these devices still pose significant challenges for real-time DNN inference executions. To address this problem, we propose a set of hardware-friendly structured model pruning and compiler optimization techniq… ▽ More High-end mobile platforms rapidly serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications. However, the constrained computation and storage resources on these devices still pose significant challenges for real-time DNN inference executions. To address this problem, we propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN executions on mobile devices. This demo shows that these optimizations can enable real-time mobile execution of multiple DNN applications, including style transfer, DNN coloring and super resolution. △ Less

Submitted 21 April, 2020; originally announced April 2020.

Comments: accepted by the IJCAI-PRICAI 2020 Demonstrations Track

arXiv:2004.07561 [pdf]

AMPSO: Artificial Multi-Swarm Particle Swarm Optimization

Authors: Haohao Zhou, Zhi-Hui Zhan, Zhi-Xin Yang, Xiangzhi Wei

Abstract: In this paper we propose a novel artificial multi-swarm PSO which consists of an exploration swarm, an artificial exploitation swarm and an artificial convergence swarm. The exploration swarm is a set of equal-sized sub-swarms randomly distributed around the particles space, the exploitation swarm is artificially generated from a perturbation of the best particle of exploration swarm for a fixed p… ▽ More In this paper we propose a novel artificial multi-swarm PSO which consists of an exploration swarm, an artificial exploitation swarm and an artificial convergence swarm. The exploration swarm is a set of equal-sized sub-swarms randomly distributed around the particles space, the exploitation swarm is artificially generated from a perturbation of the best particle of exploration swarm for a fixed period of iterations, and the convergence swarm is artificially generated from a Gaussian perturbation of the best particle in the exploitation swarm as it is stagnated. The exploration and exploitation operations are alternatively carried out until the evolution rate of the exploitation is smaller than a threshold or the maximum number of iterations is reached. An adaptive inertia weight strategy is applied to different swarms to guarantee their performances of exploration and exploitation. To guarantee the accuracy of the results, a novel diversity scheme based on the positions and fitness values of the particles is proposed to control the exploration, exploitation and convergence processes of the swarms. To mitigate the inefficiency issue due to the use of diversity, two swarm update techniques are proposed to get rid of lousy particles such that nice results can be achieved within a fixed number of iterations. The effectiveness of AMPSO is validated on all the functions in the CEC2015 test suite, by comparing with a set of comprehensive set of 16 algorithms, including the most recently well-performing PSO variants and some other non-PSO optimization algorithms. △ Less

Submitted 21 June, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

arXiv:2004.05531 [pdf, other]

A Unified DNN Weight Compression Framework Using Reweighted Optimization Methods

Authors: Tianyun Zhang, Xiaolong Ma, Zheng Zhan, Shanglin Zhou, Minghai Qin, Fei Sun, Yen-Kuang Chen, Caiwen Ding, Makan Fardad, Yanzhi Wang

Abstract: To address the large model size and intensive computation requirement of deep neural networks (DNNs), weight pruning techniques have been proposed and generally fall into two categories, i.e., static regularization-based pruning and dynamic regularization-based pruning. However, the former method currently suffers either complex workloads or accuracy degradation, while the latter one takes a long… ▽ More To address the large model size and intensive computation requirement of deep neural networks (DNNs), weight pruning techniques have been proposed and generally fall into two categories, i.e., static regularization-based pruning and dynamic regularization-based pruning. However, the former method currently suffers either complex workloads or accuracy degradation, while the latter one takes a long time to tune the parameters to achieve the desired pruning rate without accuracy loss. In this paper, we propose a unified DNN weight pruning framework with dynamically updated regularization terms bounded by the designated constraint, which can generate both non-structured sparsity and different kinds of structured sparsity. We also extend our method to an integrated framework for the combination of different DNN compression tasks. △ Less

Submitted 11 April, 2020; originally announced April 2020.

arXiv:2003.06745 [pdf, other]

Vision-Dialog Navigation by Exploring Cross-modal Memory

Authors: Yi Zhu, Fengda Zhu, Zhaohuan Zhan, Bingqian Lin, Jianbin Jiao, Xiaojun Chang, Xiaodan Liang

Abstract: Vision-dialog navigation posed as a new holy-grail task in vision-language disciplinary targets at learning an agent endowed with the capability of constant conversation for help with natural language and navigating according to human responses. Besides the common challenges faced in visual language navigation, vision-dialog navigation also requires to handle well with the language intentions of a… ▽ More Vision-dialog navigation posed as a new holy-grail task in vision-language disciplinary targets at learning an agent endowed with the capability of constant conversation for help with natural language and navigating according to human responses. Besides the common challenges faced in visual language navigation, vision-dialog navigation also requires to handle well with the language intentions of a series of questions about the temporal context from dialogue history and co-reasoning both dialogs and visual scenes. In this paper, we propose the Cross-modal Memory Network (CMN) for remembering and understanding the rich information relevant to historical navigation actions. Our CMN consists of two memory modules, the language memory module (L-mem) and the visual memory module (V-mem). Specifically, L-mem learns latent relationships between the current language interaction and a dialog history by employing a multi-head attention mechanism. V-mem learns to associate the current visual views and the cross-modal memory about the previous navigation actions. The cross-modal memory is generated via a vision-to-language attention and a language-to-vision attention. Benefiting from the collaborative learning of the L-mem and the V-mem, our CMN is able to explore the memory about the decision making of historical navigation actions which is for the current step. Experiments on the CVDN dataset show that our CMN outperforms the previous state-of-the-art model by a significant margin on both seen and unseen environments. △ Less

Submitted 14 March, 2020; originally announced March 2020.

Comments: CVPR2020

arXiv:2003.06513 [pdf, other]

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework

Authors: Yifan Gong, Zheng Zhan, Zhengang Li, Wei Niu, Xiaolong Ma, Wenhao Wang, Bin Ren, Caiwen Ding, Xue Lin, Xiaolin Xu, Yanzhi Wang

Abstract: Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices. However, previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data. To mitigate this concern, we propose a privacy-preserving-oriented pruning and mobile acceleration framewor… ▽ More Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices. However, previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data. To mitigate this concern, we propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset. At the algorithm level of the proposed framework, a systematic weight pruning technique based on the alternating direction method of multipliers (ADMM) is designed to iteratively solve the pattern-based pruning problem for each layer with randomly generated synthetic data. In addition, corresponding optimizations at the compiler level are leveraged for inference accelerations on devices. With the proposed framework, users could avoid the time-consuming pruning process for non-experts and directly benefit from compressed models. Experimental results show that the proposed framework outperforms three state-of-art end-to-end DNN frameworks, i.e., TensorFlow-Lite, TVM, and MNN, with speedup up to 4.2X, 2.5X, and 2.0X, respectively, with almost no accuracy loss, while preserving data privacy. △ Less

Submitted 16 September, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

arXiv:2001.08839 [pdf, other]

SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of DNNs with Ultra-High Efficiency

Authors: Zhengang Li, Yifan Gong, Xiaolong Ma, Sijia Liu, Mengshu Sun, Zheng Zhan, Zhenglun Kong, Geng Yuan, Yanzhi Wang

Abstract: Structured weight pruning is a representative model compression technique of DNNs for hardware efficiency and inference accelerations. Previous works in this area leave great space for improvement since sparse structures with combinations of different structured pruning schemes are not exploited fully and efficiently. To mitigate the limitations, we propose SS-Auto, a single-shot, automatic struct… ▽ More Structured weight pruning is a representative model compression technique of DNNs for hardware efficiency and inference accelerations. Previous works in this area leave great space for improvement since sparse structures with combinations of different structured pruning schemes are not exploited fully and efficiently. To mitigate the limitations, we propose SS-Auto, a single-shot, automatic structured pruning framework that can achieve row pruning and column pruning simultaneously. We adopt soft constraint-based formulation to alleviate the strong non-convexity of l0-norm constraints used in state-of-the-art ADMM-based methods for faster convergence and fewer hyperparameters. Instead of solving the problem directly, a Primal-Proximal solution is proposed to avoid the pitfall of penalizing all weights equally, thereby enhancing the accuracy. Extensive experiments on CIFAR-10 and CIFAR-100 datasets demonstrate that the proposed framework can achieve ultra-high pruning rates while maintaining accuracy. Furthermore, significant inference speedup has been observed from the proposed framework through actual measurements on the smartphone. △ Less

Submitted 23 January, 2020; originally announced January 2020.

arXiv:2001.08357 [pdf, other]

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method

Authors: Xiaolong Ma, Zhengang Li, Yifan Gong, Tianyun Zhang, Wei Niu, Zheng Zhan, Pu Zhao, Jian Tang, Xue Lin, Bin Ren, Yanzhi Wang

Abstract: Accelerating DNN execution on various resource-limited computing platforms has been a long-standing problem. Prior works utilize l1-based group lasso or dynamic regularization such as ADMM to perform structured pruning on DNN models to leverage the parallel computing architectures. However, both of the pruning dimensions and pruning methods lack universality, which leads to degraded performance an… ▽ More Accelerating DNN execution on various resource-limited computing platforms has been a long-standing problem. Prior works utilize l1-based group lasso or dynamic regularization such as ADMM to perform structured pruning on DNN models to leverage the parallel computing architectures. However, both of the pruning dimensions and pruning methods lack universality, which leads to degraded performance and limited applicability. To solve the problem, we propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method. Our framework is universal, which can be applied to both CNNs and RNNs, implying complete support for the two major kinds of computation-intensive layers (i.e., CONV and FC layers). To complete all aspects of the pruning-for-acceleration task, we also integrate compiler-based code optimization into our framework that can perform DNN inference in a real-time manner. To the best of our knowledge, it is the first time that the weight pruning framework achieves universal coverage for both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise. △ Less

Submitted 21 February, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

arXiv:1912.00215 [pdf, other]

Probing the State of the Art: A Critical Look at Visual Representation Evaluation

Authors: Cinjon Resnick, Ze** Zhan, Joan Bruna

Abstract: Self-supervised research improved greatly over the past half decade, with much of the growth being driven by objectives that are hard to quantitatively compare. These techniques include colorization, cyclical consistency, and noise-contrastive estimation from image patches. Consequently, the field has settled on a handful of measurements that depend on linear probes to adjudicate which approaches… ▽ More Self-supervised research improved greatly over the past half decade, with much of the growth being driven by objectives that are hard to quantitatively compare. These techniques include colorization, cyclical consistency, and noise-contrastive estimation from image patches. Consequently, the field has settled on a handful of measurements that depend on linear probes to adjudicate which approaches are the best. Our first contribution is to show that this test is insufficient and that models which perform poorly (strongly) on linear classification can perform strongly (weakly) on more involved tasks like temporal activity localization. Our second contribution is to analyze the capabilities of five different representations. And our third contribution is a much needed new dataset for temporal activity localization. △ Less

Submitted 12 August, 2021; v1 submitted 30 November, 2019; originally announced December 2019.

Comments: erd59xH@Rqt!XCMsCmnz

arXiv:1909.07726 [pdf]

Building Change Detection for Remote Sensing Images Using a Dual Task Constrained Deep Siamese Convolutional Network Model

Authors: Yi Liu, Chao Pang, Zongqian Zhan, Xiaomeng Zhang, Xue Yang

Abstract: In recent years, building change detection methods have made great progress by introducing deep learning, but they still suffer from the problem of the extracted features not being discriminative enough, resulting in incomplete regions and irregular boundaries. To tackle this problem, we propose a dual task constrained deep Siamese convolutional network (DTCDSCN) model, which contains three sub-ne… ▽ More In recent years, building change detection methods have made great progress by introducing deep learning, but they still suffer from the problem of the extracted features not being discriminative enough, resulting in incomplete regions and irregular boundaries. To tackle this problem, we propose a dual task constrained deep Siamese convolutional network (DTCDSCN) model, which contains three sub-networks: a change detection network and two semantic segmentation networks. DTCDSCN can accomplish both change detection and semantic segmentation at the same time, which can help to learn more discriminative object-level features and obtain a complete change detection map. Furthermore, we introduce a dual attention module (DAM) to exploit the interdependencies between channels and spatial positions, which improves the feature representation. We also improve the focal loss function to suppress the sample imbalance problem. The experimental results obtained with the WHU building dataset show that the proposed method is effective for building change detection and achieves a state-of-the-art performance in terms of four metrics: precision, recall, F1-score, and intersection over union. △ Less

Submitted 17 September, 2019; originally announced September 2019.

arXiv:1905.08205 [pdf, other]

Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

Authors: Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, Dongmei Zhang

Abstract: We present a neural approach called IRNet for complex and cross-domain Text-to-SQL. IRNet aims to address two challenges: 1) the mismatch between intents expressed in natural language (NL) and the implementation details in SQL; 2) the challenge in predicting columns caused by the large number of out-of-domain words. Instead of end-to-end synthesizing a SQL query, IRNet decomposes the synthesis pro… ▽ More We present a neural approach called IRNet for complex and cross-domain Text-to-SQL. IRNet aims to address two challenges: 1) the mismatch between intents expressed in natural language (NL) and the implementation details in SQL; 2) the challenge in predicting columns caused by the large number of out-of-domain words. Instead of end-to-end synthesizing a SQL query, IRNet decomposes the synthesis process into three phases. In the first phase, IRNet performs a schema linking over a question and a database schema. Then, IRNet adopts a grammar-based neural model to synthesize a SemQL query which is an intermediate representation that we design to bridge NL and SQL. Finally, IRNet deterministically infers a SQL query from the synthesized SemQL query with domain knowledge. On the challenging Text-to-SQL benchmark Spider, IRNet achieves 46.7% accuracy, obtaining 19.5% absolute improvement over previous state-of-the-art approaches. At the time of writing, IRNet achieves the first position on the Spider leaderboard. △ Less

Submitted 28 May, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

Comments: To appear in ACL 2019

arXiv:1812.03125 [pdf, other]

Taking the Scenic Route: Automatic Exploration for Videogames

Authors: Ze** Zhan, Batu Aytemiz, Adam M. Smith

Abstract: Machine playtesting tools and game moment search engines require exposure to the diversity of a game's state space if they are to report on or index the most interesting moments of possible play. Meanwhile, mobile app distribution services would like to quickly determine if a freshly-uploaded game is fit to be published. Having access to a semantic map of reachable states in the game would enable… ▽ More Machine playtesting tools and game moment search engines require exposure to the diversity of a game's state space if they are to report on or index the most interesting moments of possible play. Meanwhile, mobile app distribution services would like to quickly determine if a freshly-uploaded game is fit to be published. Having access to a semantic map of reachable states in the game would enable efficient inference in these applications. However, human gameplay data is expensive to acquire relative to the coverage of a game that it provides. We show that off-the-shelf automatic exploration strategies can explore with an effectiveness comparable to human gameplay on the same timescale. We contribute generic methods for quantifying exploration quality as a function of time and demonstrate our metric on several elementary techniques and human players on a collection of commercial games sampled from multiple game platforms (from Atari 2600 to Nintendo 64). Emphasizing the diversity of states reached and the semantic map extracted, this work makes productive contrast with the focus on finding a behavior policy or optimizing game score used in most automatic game playing research. △ Less

Submitted 7 December, 2018; originally announced December 2018.

Showing 1–50 of 59 results for author: Zhan, Z