Search | arXiv e-print repository

A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing

Authors: Maomao Li, Yu Li, Tianyu Yang, Yunfei Liu, Dongxu Yue, Zhihui Lin, Dong Xu

Abstract: This paper presents a video inversion approach for zero-shot video editing, which models the input video with low-rank representation during the inversion process. The existing video editing methods usually apply the typical 2D DDIM inversion or naive spatial-temporal DDIM inversion before editing, which leverages time-varying representation for each frame to derive noisy latent. Unlike most existing approaches, we propose a Spatial-Temporal Expectation-Maximization (STEM) inversion, which formulates the dense video feature under an expectation-maximization manner and iteratively estimates a more compact basis set to represent the whole video. Each frame applies the fixed and global representation for inversion, which is more friendly for temporal consistency during reconstruction and editing. Extensive qualitative and quantitative experiments demonstrate that our STEM inversion can achieve consistent improvement on two state-of-the-art video editing methods. Project page: https://stem-inv.github.io/page/.

Submitted 18 June, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: 14 pages, Project page: https://stem-inv.github.io/page/

Journal ref: CVPR 2024

arXiv:2310.18021 [pdf, other]

FormalGeo: An Extensible Formalized Framework for Olympiad Geometric Problem Solving

Authors: Xiaokai Zhang, Na Zhu, Yiming He, Jia Zou, Qike Huang, Xiaoxiao **, Yanjun Guo, Chenyang Mao, Yang Li, Zhe Zhu, Dengfeng Yue, Fangzhen Zhu, Yifan Wang, Yiwen Huang, Runan Wang, Cheng Qin, Zhenbing Zeng, Shaorong Xie, Xiangfeng Luo, Tuo Leng

Abstract: This is the first paper in a series of work we have accomplished over the past three years. In this paper, we have constructed a consistent formal plane geometry system. This will serve as a crucial bridge between IMO-level plane geometry challenges and readable AI automated reasoning. Within this formal framework, we have been able to seamlessly integrate modern AI models with our formal system. AI is now capable of providing deductive reasoning solutions to IMO-level plane geometry problems, just like handling other natural languages, and these proofs are readable, traceable, and verifiable. We propose the geometry formalization theory (GFT) to guide the development of the geometry formal system. Based on the GFT, we have established the FormalGeo, which consists of 88 geometric predicates and 196 theorems. It can represent, validate, and solve IMO-level geometry problems. We have also crafted the FGPS (formal geometry problem solver) in Python. It serves as both an interactive assistant for verifying problem-solving processes and an automated problem solver. We've annotated the formalgeo7k and formalgeo-imo datasets. The former contains 6,981 (expanded to 133,818 through data augmentation) geometry problems, while the latter includes 18 (expanded to 2,627 and continuously increasing) IMO-level challenging geometry problems. All annotated problems include detailed formal language descriptions and solutions. Implementation of the formal system and experiments validate the correctness and utility of the GFT. The backward depth-first search method only yields a 2.42% problem-solving failure rate, and we can incorporate deep learning techniques to achieve a lower one. The source code of FGPS and datasets are available at https://github.com/BitSecret/FGPS.

Submitted 14 February, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: 44 pages

arXiv:2305.14742 [pdf, other]

ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation

Authors: Dongxu Yue, Qin Guo, Munan Ning, Jiaxi Cui, Yuesheng Zhu, Li Yuan

Abstract: Editing real facial images is a crucial task in computer vision with significant demand in various real-world applications. While GAN-based methods have shown potential in manipulating images, especially when combined with CLIP, these methods are limited in their ability to reconstruct real images due to challenging GAN inversion capability. Despite the successful image reconstruction achieved by diffusion-based methods, there are still challenges in effectively manipulating fine-grained facial attributes with textual instructions. To address these issues and facilitate convenient manipulation of real facial images, we propose a novel approach that conducts text-driven image editing in the semantic latent space of a diffusion model. By aligning the temporal feature of the diffusion model with the semantic condition at the generative process, we introduce a stable manipulation strategy, which performs precise zero-shot manipulation effectively. Furthermore, we develop an interactive system named ChatFace, which combines the zero-shot reasoning ability of large language models to perform efficient manipulations in diffusion semantic latent space. This system enables users to perform complex multi-attribute manipulations through dialogue, opening up new possibilities for interactive image editing. Extensive experiments confirmed that our approach outperforms previous methods and enables precise editing of real facial images, making it a promising candidate for real-world applications. Project page: https://dongxuyue.github.io/chatface/

Submitted 5 June, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2303.09916 [pdf]

DSDP: A Blind Docking Strategy Accelerated by GPUs

Authors: YuPeng Huang, Hong Zhang, Siyuan Jiang, Dajiong Yue, Xiaohan Lin, Jun Zhang, Yi Qin Gao

Abstract: Virtual screening, including molecular docking, plays an essential role in drug discovery. Many traditional and machine-learning based methods are available to fulfil the docking task. The traditional docking methods are normally extensively time-consuming, and their performance in blind docking remains to be improved. Although the runtime of docking based on machine learning is significantly decreased, their accuracy is still limited. In this study, we take advantage of both traditional and machine-learning based methods, and present a method, Deep Site and Docking Pose (DSDP), to improve the performance of blind docking. For the traditional blind docking, the entire protein is covered by a cube, and the initial positions of ligands are randomly generated in the cube. In contrast, DSDP can predict the binding site of proteins and provide an accurate searching space and initial positions for the further conformational sampling. The docking task of DSDP makes use of the score function and a similar but modified searching strategy of AutoDock Vina, accelerated by implementation in GPUs. We systematically compare its performance with the state-of-the-art methods, including AutoDock Vina, GNINA, QuickVina, SMINA, and DiffDock. DSDP reaches a 29.8% top-1 success rate (RMSD < 2 Å) on an unbiased and challenging test dataset with 1.2 s wall-clock computational time per system. Its performances on DUD-E dataset and the time-split PDBBind dataset used in EquiBind, TankBind, and DiffDock are also effective, presenting a 57.2% and 41.8% top-1 success rate with 0.8 s and 1.0 s per system, respectively.

Submitted 16 March, 2023; originally announced March 2023.

arXiv:2302.10484 [pdf, other]

Lightweight Real-time Semantic Segmentation Network with Efficient Transformer and CNN

Authors: Guoan Xu, Juncheng Li, Guangwei Gao, Huimin Lu, Jian Yang, Dong Yue

Abstract: In the past decade, convolutional neural networks (CNNs) have shown prominence for semantic segmentation. Although CNN models have very impressive performance, the ability to capture global representation is still insufficient, which results in suboptimal results. Recently, Transformer achieved huge success in NLP tasks, demonstrating its advantages in modeling long-range dependency. Recently, Transformer has also attracted tremendous attention from computer vision researchers who reformulate the image processing tasks as a sequence-to-sequence prediction but resulted in deteriorating local feature details. In this work, we propose a lightweight real-time semantic segmentation network called LETNet. LETNet combines a U-shaped CNN with Transformer effectively in a capsule embedding style to compensate for respective deficiencies. Meanwhile, the elaborately designed Lightweight Dilated Bottleneck (LDB) module and Feature Enhancement (FE) module cultivate a positive impact on training from scratch simultaneously. Extensive experiments performed on challenging datasets demonstrate that LETNet achieves superior performances in accuracy and efficiency balance. Specifically, it only contains 0.95M parameters and 13.6G FLOPs but yields 72.8% mIoU at 120 FPS on the Cityscapes test set and 70.5% mIoU at 250 FPS on the CamVid test dataset using a single RTX 3090 GPU. The source code will be available at https://github.com/IVIPLab/LETNet.

Submitted 21 February, 2023; originally announced February 2023.

Comments: IEEE Transactions on Intelligent Transportation Systems, 10 pages

arXiv:2212.12744 [pdf, ps, other]

Energy Efficiency Maximization in IRS-Aided Cell-Free Massive MIMO System

Authors: Si-Nian **, Dian-Wu Yue, Yi-Ling Chen, Qing Hu

Abstract: In this paper, we consider an intelligent reflecting surface (IRS)-aided cell-free massive multiple-input multiple-output system, where the beamforming at access points and the phase shifts at IRSs are jointly optimized to maximize energy efficiency (EE). To solve the EE maximization problem, we propose an iterative optimization algorithm by using quadratic transform and Lagrangian dual transform to find the optimum beamforming and phase shifts. However, the proposed algorithm suffers from high computational complexity, which hinders its application in some practical scenarios. Responding to this, we further propose a deep learning based approach for joint beamforming and phase shifts design. Specifically, a two-stage deep neural network is trained offline in an unsupervised manner, which is then deployed online for the predictions of beamforming and phase shifts. Simulation results show that compared with the iterative optimization algorithm and the genetic algorithm, the unsupervised learning based approach has higher EE performance and lower running time.

Submitted 24 December, 2022; originally announced December 2022.

Comments: 6 pages, 4 figures

arXiv:2112.06593 [pdf, ps, other]

RIS-Aided Cell-Free Massive MIMO Systems: Joint Design of Transmit Beamforming and Phase Shifts

Authors: Si-Nian **, Dian-Wu Yue, Ha H. Nguyen

Abstract: This paper studies RIS-aided cell-free massive MIMO systems, where multiple RISs are deployed to assist the communication between multiple access points (APs) and multiple users, with either continuous or discrete phase shifts at the RISs. We formulate the max-min fairness problem that maximizes the minimum achievable rate among all users by jointly optimizing the transmit beamforming at active APs and the phase shifts at passive RISs, subject to power constraints at the APs. To address such a challenging problem, we first study the special single-user scenario and propose an algorithm that can transform the optimization problem into a semidefinite program (SDP) or an integer linear program (ILP) for the cases of continuous and discrete phase shifts, respectively. By solving the resulting SDP and ILP, we first obtain the optimal phase shifts, and then design the optimal transmit beamforming accordingly. To solve the optimization problem for the multi-user scenario and continuous phase shifts at RISs, we extend the single-user algorithm and propose an alternating optimization algorithm, which can first decompose the max-min fairness problem into two subproblems related to transmit beamforming and phase shifts, and then transform the two subproblems into a second-order-cone program and an SDP, respectively. For the multi-user scenario and discrete phase shifts, the max-min fairness problem is shown to be a mixed-integer non-linear program (MINLP). To tackle it, we design a ZF-based successive refinement algorithm, which can find a suboptimal transmit beamforming and phase shifts by means of alternating optimization. Numerical results show that compared with benchmark schemes of random phase shifts and without using RISs, the proposed algorithms can significantly increase the minimum achievable rate among all users, especially when the number of reflecting elements at each RIS is large.

Submitted 13 December, 2021; originally announced December 2021.

Comments: 13 pages, 10 figures. Submitted to IEEE for possible publication

arXiv:2103.13044 [pdf, other]

MSCFNet: A Lightweight Network With Multi-Scale Context Fusion for Real-Time Semantic Segmentation

Authors: Guangwei Gao, Guoan Xu, Yi Yu, ** Xie, Jian Yang, Dong Yue

Abstract: In recent years, how to strike a good trade-off between accuracy and inference speed has become the core issue for real-time semantic segmentation applications, which plays a vital role in real-world scenarios such as autonomous driving systems and drones. In this study, we devise a novel lightweight network using a multi-scale context fusion (MSCFNet) scheme, which explores an asymmetric encoder-decoder architecture to address this problem. More specifically, the encoder adopts some developed efficient asymmetric residual (EAR) modules, which are composed of factorization depth-wise convolution and dilation convolution. Meanwhile, instead of complicated computation, simple deconvolution is applied in the decoder to further reduce the number of parameters while still maintaining high segmentation accuracy. Also, MSCFNet has branches with efficient attention modules from different stages of the network to capture multi-scale contextual information well. Then we combine them before the final classification to enhance the expression of the features and improve the segmentation efficiency. Comprehensive experiments on challenging datasets have demonstrated that the proposed MSCFNet, which contains only 1.15M parameters, achieves 71.9% Mean IoU on the Cityscapes testing dataset and can run at over 50 FPS on a single Titan XP GPU configuration.

Submitted 16 July, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

Comments: IEEE Transactions on Intelligent Transportation Systems, 11 pages, 7 figures

arXiv:2012.10758 [pdf, other]

doi 10.1109/TMM.2021.3076298

Consolidated Dataset and Metrics for High-Dynamic-Range Image Quality

Authors: Aliaksei Mikhailiuk, Maria Perez-Ortiz, Dingcheng Yue, Wilson Suen, Rafal K. Mantiuk

Abstract: Increasing popularity of high-dynamic-range (HDR) image and video content brings the need for metrics that could predict the severity of image impairments as seen on displays of different brightness levels and dynamic range. Such metrics should be trained and validated on a sufficiently large subjective image quality dataset to ensure robust performance. As the existing HDR quality datasets are limited in size, we created a Unified Photometric Image Quality dataset (UPIQ) with over 4,000 images by realigning and merging existing HDR and standard-dynamic-range (SDR) datasets. The realigned quality scores share the same unified quality scale across all datasets. Such realignment was achieved by collecting additional cross-dataset quality comparisons and re-scaling data with a psychometric scaling method. Images in the proposed dataset are represented in absolute photometric and colorimetric units, corresponding to light emitted from a display. We use the new dataset to retrain existing HDR metrics and show that the dataset is sufficiently large for training deep architectures. We show the utility of the dataset on brightness aware image compression.

Submitted 10 May, 2021; v1 submitted 19 December, 2020; originally announced December 2020.

arXiv:2008.00696 [pdf, other]

doi 10.1109/IEEECONF38699.2020.9389145

Heterogeneous Swarms for Maritime Dynamic Target Search and Tracking

Authors: Hian Lee Kwa, Grgur Tokić, Roland Bouffanais, Dick K. P. Yue

Abstract: Current strategies employed for maritime target search and tracking are primarily based on the use of agents following a predetermined path to perform a systematic sweep of a search area. Recently, dynamic Particle Swarm Optimization (PSO) algorithms have been used together with swarming multi-robot systems (MRS), giving search and tracking solutions the added properties of robustness, scalability, and flexibility. Swarming MRS also give the end-user the opportunity to incrementally upgrade the robotic system, inevitably leading to the use of heterogeneous swarming MRS. However, such systems have not been well studied and incorporating upgraded agents into a swarm may result in degraded mission performances. In this paper, we propose a PSO-based strategy using a topological k-nearest neighbor graph with tunable exploration and exploitation dynamics with an adaptive repulsion parameter. This strategy is implemented within a simulated swarm of 50 agents with varying proportions of fast agents tracking a target represented by a fictitious binary function. Through these simulations, we are able to demonstrate an increase in the swarm's collective response level and target tracking performance by substituting in a proportion of fast buoys.

Submitted 3 August, 2020; originally announced August 2020.

Comments: Accepted for IEEE/MTS OCEANS 2020, Singapore

Journal ref: IEEE/MTS Global Oceans 2020: Singapore - U.S. Gulf Coast, October 5-30, 2020, online, pp. 1-8

arXiv:2006.14156 [pdf, ps, other]

Multi-Agent Deep Reinforcement Learning for HVAC Control in Commercial Buildings

Authors: Liang Yu, Yi Sun, Zhanbo Xu, Chao Shen, Dong Yue, Tao Jiang, Xiaohong Guan

Abstract: In commercial buildings, about 40%-50% of the total electricity consumption is attributed to Heating, Ventilation, and Air Conditioning (HVAC) systems, which places an economic burden on building operators. In this paper, we intend to minimize the energy cost of an HVAC system in a multi-zone commercial building under dynamic pricing with the consideration of random zone occupancy, thermal comfort, and indoor air quality comfort. Due to the existence of unknown thermal dynamics models, parameter uncertainties (e.g., outdoor temperature, electricity price, and number of occupants), spatially and temporally coupled constraints associated with indoor temperature and CO2 concentration, a large discrete solution space, and a non-convex and non-separable objective function, it is very challenging to achieve the above aim. To this end, the above energy cost minimization problem is reformulated as a Markov game. Then, an HVAC control algorithm is proposed to solve the Markov game based on multi-agent deep reinforcement learning with attention mechanism. The proposed algorithm does not require any prior knowledge of uncertain parameters and can operate without knowing building thermal dynamics models. Simulation results based on real-world traces show the effectiveness, robustness and scalability of the proposed algorithm.

Submitted 22 July, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

Comments: 14 pages, 21 figures, accepted by IEEE Transactions on Smart Grid

arXiv:2004.05691 [pdf, other]

Active Sampling for Pairwise Comparisons via Approximate Message Passing and Information Gain Maximization

Authors: Aliaksei Mikhailiuk, Clifford Wilmot, Maria Perez-Ortiz, Dingcheng Yue, Rafal Mantiuk

Abstract: Pairwise comparison data arise in many domains with subjective assessment experiments, for example in image and video quality assessment. In these experiments observers are asked to express a preference between two conditions. However, many pairwise comparison protocols require a large number of comparisons to infer accurate scores, which may be unfeasible when each comparison is time-consuming (e.g. videos) or expensive (e.g. medical imaging). This motivates the use of an active sampling algorithm that chooses only the most informative pairs for comparison. In this paper we propose ASAP, an active sampling algorithm based on approximate message passing and expected information gain maximization. Unlike most existing methods, which rely on partial updates of the posterior distribution, we are able to perform full updates and therefore much improve the accuracy of the inferred scores. The algorithm relies on three techniques for reducing computational cost: inference based on approximate message passing, selective evaluations of the information gain, and selecting pairs in a batch that forms a minimum spanning tree of the inverse of information gain. We demonstrate, with real and synthetic data, that ASAP offers the highest accuracy of inferred scores compared to the existing methods. We also provide an open-source GPU implementation of ASAP for large-scale experiments.

Submitted 12 April, 2020; originally announced April 2020.

arXiv:1809.03327 [pdf, other]

YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark

Authors: Ning Xu, Linjie Yang, Yuchen Fan, Dingcheng Yue, Yuchen Liang, Jianchao Yang, Thomas Huang

Abstract: Learning long-term spatial-temporal features is critical for many video analysis tasks. However, existing video segmentation methods predominantly rely on static image segmentation techniques, and methods capturing temporal dependency for segmentation have to depend on pretrained optical flow models, leading to suboptimal solutions for the problem. End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset only contains 90 short video clips. To solve this problem, we build a new large-scale video object segmentation dataset called YouTube Video Object Segmentation dataset (YouTube-VOS). Our dataset contains 4,453 YouTube video clips and 94 object categories. This is by far the largest video object segmentation dataset to our knowledge and has been released at http://youtube-vos.org. We further evaluate several existing state-of-the-art video object segmentation algorithms on this dataset, which aims to establish baselines for the development of new algorithms in the future.

Submitted 6 September, 2018; originally announced September 2018.

Comments: Dataset Report. arXiv admin note: substantial text overlap with arXiv:1809.00461

arXiv:1809.00461 [pdf, other]

YouTube-VOS: Sequence-to-Sequence Video Object Segmentation

Authors: Ning Xu, Linjie Yang, Yuchen Fan, Jianchao Yang, Dingcheng Yue, Yuchen Liang, Brian Price, Scott Cohen, Thomas Huang

Abstract: Learning long-term spatial-temporal features is critical for many video analysis tasks. However, existing video segmentation methods predominantly rely on static image segmentation techniques, and methods capturing temporal dependency for segmentation have to depend on pretrained optical flow models, leading to suboptimal solutions for the problem. End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset only contains 90 short video clips. To solve this problem, we build a new large-scale video object segmentation dataset called YouTube Video Object Segmentation dataset (YouTube-VOS). Our dataset contains 3,252 YouTube video clips and 78 categories including common objects and human activities. This is by far the largest video object segmentation dataset to our knowledge and we have released it at https://youtube-vos.org. Based on this dataset, we propose a novel sequence-to-sequence network to fully exploit long-term spatial-temporal information in videos for segmentation. We demonstrate that our method is able to achieve the best results on our YouTube-VOS test set and comparable results on DAVIS 2016 compared to the current state-of-the-art methods. Experiments show that the large scale dataset is indeed a key factor to the success of our model.

Submitted 3 September, 2018; originally announced September 2018.

Comments: ECCV 2018 accepted paper

arXiv:1808.10617 [pdf, other]

doi 10.1109/OCEANS.2018.8604642

Gradual Collective Upgrade of a Swarm of Autonomous Buoys for Dynamic Ocean Monitoring

Authors: Francesco Vallegra, David Mateo, Grgur Tokić, Roland Bouffanais, Dick K. P. Yue

Abstract: Swarms of autonomous surface vehicles equipped with environmental sensors and decentralized communications bring a new wave of attractive possibilities for the monitoring of dynamic features in oceans and other waterbodies. However, a key challenge in swarm robotics design is the efficient collective operation of heterogeneous systems. We present both theoretical analysis and field experiments on the responsiveness in dynamic area coverage of a collective of 22 autonomous buoys, where 4 units are upgraded to a new design that allows them to move 80% faster than the rest. This system is able to react on timescales of the minute to changes in areas on the order of a few thousand square meters. We have observed that this partial upgrade of the system significantly increases its average responsiveness, without necessarily improving the spatial uniformity of the deployment. These experiments show that the autonomous buoy designs and the cooperative control rule described in this work provide an efficient, flexible, and scalable solution for the pervasive and persistent monitoring of water environments.

Submitted 31 August, 2018; originally announced August 2018.

Comments: Proceedings of the OCEANS 2018 conference

Journal ref: OCEANS 2018 MTS/IEEE Charleston, Charleston, S.C., 2018, p. 1-7

arXiv:1801.02987 [pdf, ps, other]

Multiplexing Analysis of Millimeter-Wave Massive MIMO Systems

Authors: Dian-Wu Yue, Ha H. Nguyen, Shuai Xu

Abstract: This paper is concerned with spatial multiplexing analysis for millimeter-wave (mmWave) massive MIMO systems. For a single-user mmWave system employing distributed antenna subarray architecture in which the transmitter and receiver consist of Kt and Kr subarrays, respectively, an asymptotic multiplexing gain formula is firstly derived when the numbers of antennas at subarrays go to infinity. Specifically, assuming that all subchannels have the same average number of propagation paths L, the formula implies that by employing such a distributed antenna-subarray architecture, an exact average maximum multiplexing gain of KrKtL can be achieved. This result means that compared to the co-located antenna architecture, using the distributed antenna-subarray architecture can scale up the maximum multiplexing gain proportionally to KrKt. In order to further reveal the relation between diversity gain and multiplexing gain, a simple characterization of the diversity-multiplexing tradeoff is also given. The multiplexing gain analysis is then extended to the multiuser scenario. Moreover, simulation results obtained with the hybrid analog/digital processing corroborate the analysis results.

Submitted 29 July, 2018; v1 submitted 7 January, 2018; originally announced January 2018.

Comments: 10 pages, 8 figures. arXiv admin note: substantial text overlap with arXiv:1801.00387

arXiv:1801.00387 [pdf, ps, other]

Diversity Analysis of Millimeter-Wave Massive MIMO Systems

Authors: Dian-Wu Yue, Shuai Xu, Ha H. Nguyen

Abstract: This paper is concerned with asymptotic diversity analysis for millimeter-wave (mmWave) massive MIMO systems. First, for a single-user mmWave system employing distributed antenna subarray architecture in which the transmitter and receiver consist of Kt and Kr subarrays, respectively, a diversity gain theorem is established when the numbers of antennas at subarrays go to infinity. Specifically, assuming that all subchannels have the same number of propagation paths L, the theorem states that by employing such a distributed antenna-subarray architecture, a diversity gain of KrKtL-Ns+1 can be achieved, where Ns is the number of data streams. This result means that compared to the co-located antenna architecture, using the distributed antenna-subarray architecture can scale up the diversity gain or multiplexing gain proportionally to KrKt. The diversity gain analysis is then extended to the multiuser scenario as well as the scenario with conventional partially-connected RF structure in the literature. Simulation results obtained with the hybrid analog/digital processing corroborate the analysis results and show that the distributed subarray architecture indeed yields significantly better diversity performance than the co-located antenna architectures.

Submitted 31 December, 2017; originally announced January 2018.

Comments: 10 pages, 10 figures

arXiv:1705.04010 [pdf, other]

doi 10.3389/frobt.2017.00012

Swarm-Enabling Technology for Multi-Robot Systems

Authors: Mohammadreza Chamanbaz, David Mateo, Brandon M. Zoss, Grgur Tokić, Erik Wilhelm, Roland Bouffanais, Dick K. P. Yue

Abstract: Swarm robotics has experienced a rapid expansion in recent years, primarily fueled by specialized multi-robot systems developed to achieve dedicated collective actions. These specialized platforms are in general designed with swarming considerations at the front and center. Key hardware and software elements required for swarming are often deeply embedded and integrated with the particular system. However, given the noticeable increase in the number of low-cost mobile robots readily available, practitioners and hobbyists may start considering to assemble full-fledged swarms by minimally retrofitting such mobile platforms with a swarm-enabling technology. Here, we report one possible embodiment of such a technology designed to enable the assembly and the study of swarming in a range of general-purpose robotic systems. This is achieved by combining a modular and transferable software toolbox with a hardware suite composed of a collection of low-cost and off-the-shelf components. The developed technology can be ported to a relatively vast range of robotic platforms with minimal changes and high levels of scalability. This swarm-enabling technology has successfully been implemented on two distinct distributed multi-robot systems, a swarm of mobile marine buoys and a team of commercial terrestrial robots. We have tested the effectiveness of both of these distributed robotic systems in performing collective exploration and search scenarios, as well as other classical cooperative behaviors.
Experimental results on different swarm behaviors are reported for the two platforms in uncontrolled environments and without any supporting infrastructure. The design of the associated software library allows for a seamless switch to other cooperative behaviors, and also offers the possibility to simulate newly designed collective behaviors prior to their implementation onto the platforms.

Submitted 11 May, 2017; originally announced May 2017.

Journal ref: Frontiers in Robotics and AI 4 (2017) 12

arXiv:1702.05729 [pdf, other]

Person Search with Natural Language Description

Authors: Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, Xiaogang Wang

Abstract: Searching persons in large-scale image databases with the query of natural language description has important applications in video surveillance. Existing methods mainly focused on searching persons with image-based or attribute-based queries, which have major limitations for a practical usage. In this paper, we study the problem of person search with natural language description. Given the textual description of a person, the algorithm of the person search is required to rank all the samples in the person database then retrieve the most relevant sample corresponding to the queried description. Since there is no person dataset or benchmark with textual description available, we collect a large-scale person description dataset with detailed natural language annotations and person samples from various sources, termed as CUHK Person Description Dataset (CUHK-PEDES). A wide range of possible models and baselines have been evaluated and compared on the person search benchmark. An Recurrent Neural Network with Gated Neural Attention mechanism (GNA-RNN) is proposed to establish the state-of-the art performance on person search.

Submitted 30 March, 2017; v1 submitted 19 February, 2017; originally announced February 2017.

arXiv:1404.1654 [pdf, ps, other]

LOS-based Conjugate Beamforming and Power-Scaling Law in Massive-MIMO Systems

Authors: Dian-Wu Yue, Geoffrey Ye Li

Abstract: This paper is concerned with massive-MIMO systems over Rician flat fading channels. In order to reduce the overhead to obtain full channel state information and to avoid the pilot contamination problem, by treating the scattered component as interference, we investigate a transmit and receive conjugate beamforming (BF) transmission scheme only based on the line-of-sight (LOS) component. Under Rank-1 model, we first consider a single-user system with N transmit and M receive antennas, and focus on the problem of power-scaling law when the transmit power is scaled down proportionally to 1/MN. It can be shown that as MN grows large, the scattered interference vanishes, and the ergodic achievable rate is higher than that of the corresponding BF scheme based fast fading and minimum mean-square error (MMSE) channel estimation. Then we further consider uplink and downlink single-cell scenarios where the base station (BS) has M antennas and each of K users has N antennas. When the transmit power for each user is scaled down proportionally to 1/MN, it can be shown for finite users that as M grows without bound, each user obtains finally the same rate performance as in the single-user case. Even when N grows without bound, however, there still remains inter-user LOS interference that can not be cancelled.
Regarding infinite users, there exists such a power scaling law that when K and the b-th power of M go to infinity with a fixed and finite ratio for a given b in (0, 1), not only inter-user LOS interference but also fast fading effect can be cancelled, while fast fading effect can not be cancelled if b=1. Extension to multi-cells and frequency-selective channels are also discussed shortly. Moreover, numerical results indicate that spacial antenna correlation does not have serious influence on the rate performance, and the BS antennas may be allowed to be placed compactly when M is very large.

Submitted 8 December, 2014; v1 submitted 7 April, 2014; originally announced April 2014.

Comments: 32 pages, 11 figures

arXiv:1403.6561 [pdf, ps, other]

Transmit Power Minimization for MIMO Systems of Exponential Average BER with Fixed Outage Probability

Authors: Dian-Wu Yue, Yichuang Sun

Abstract: This paper is concerned with a wireless system operating in MIMO fading channels with channel state information being known at both transmitter and receiver. By spatiotemporal subchannel selection and power control, it aims to minimize the average transmit power (ATP) of the MIMO system while achieving an exponential type of average bit error rate (BER) for each data stream. Under the constraints of a given fixed individual outage probability (OP) and average BER for each subchannel, based on a traditional upper bound and a dynamic upper bound of Q function, two closed-form ATP expressions are derived, respectively, and they correspond to two different power allocation schemes. Numerical results are provided to validate the theoretical analysis, and show that the power allocation scheme with the dynamic upper bound can achieve more power savings than the one with the traditional upper bound.

Submitted 25 March, 2014; originally announced March 2014.

Comments: 20 pages, 4 figures

Showing 1–21 of 21 results for author: Yue, D