Showing 1–37 of 37 results for author: Gao, P

Searching in archive eess.
  1. arXiv:2406.18018  [pdf, other]

    eess.IV

    A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

    Authors: Muwei Jian, Haoran Zhang, Mingju Shao, Hongyu Chen, Huihui Huang, Yanjie Zhong, Changlei Zhang, Bin Wang, Penghui Gao

    Abstract: Recently, intelligent analysis of lung nodules with the assistant of computer aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from one single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in im…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2405.07648  [pdf, other]

    cs.CV eess.IV

    CDFormer:When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution

    Authors: Qingguo Liu, Chenyi Zhuang, Pan Gao, Jie Qin

    Abstract: Existing Blind image Super-Resolution (BSR) methods focus on estimating either kernel or degradation information, but have long overlooked the essential content details. In this paper, we propose a novel BSR approach, Content-aware Degradation-driven Transformer (CDFormer), to capture both degradation and content representations. However, low-resolution images cannot provide enough content details…

    Submitted 30 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  3. arXiv:2404.13550  [pdf, other]

    cs.CV eess.IV

    Pointsoup: High-Performance and Extremely Low-Decoding-Latency Learned Geometry Codec for Large-Scale Point Cloud Scenes

    Authors: Kang You, Kai Liu, Li Yu, Pan Gao, Dandan Ding

    Abstract: Despite considerable progress being achieved in point cloud geometry compression, there still remains a challenge in effectively compressing large-scale scenes with sparse surfaces. Another key challenge lies in reducing decoding latency, a crucial requirement in real-world application. In this paper, we propose Pointsoup, an efficient learning-based geometry codec that attains high-performance an…

    Submitted 21 April, 2024; originally announced April 2024.

  4. arXiv:2404.07989  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.SD eess.AS

    Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

    Authors: Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li

    Abstract: Large foundation models have recently emerged as a prominent focus of interest, attaining superior performance in widespread scenarios. Due to the scarcity of 3D data, many efforts have been made to adapt pre-trained transformers from vision to 3D domains. However, such 2D-to-3D approaches are still limited, due to the potential loss of spatial geometries and high computation cost. More importantl…

    Submitted 30 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Code and models are released at https://github.com/Ivan-Tang-3D/Any2Point

  5. arXiv:2403.17770  [pdf, other]

    eess.IV cs.CV

    CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

    Authors: Yongrui Yu, Hanyu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, Xiaofan Zhang

    Abstract: Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle in the computer-aided diagnosis of abdominal lymph nodes due to the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node…

    Submitted 26 March, 2024; originally announced March 2024.

  6. arXiv:2312.15653  [pdf, other]

    cs.IT eess.SP

    Index Modulation for Fluid Antenna-Assisted MIMO Communications: System Design and Performance Analysis

    Authors: Jing Zhu, Gaojie Chen, Pengyu Gao, Pei Xiao, Zihuai Lin, Atta Quddus

    Abstract: In this paper, we propose a transmission mechanism for fluid antennas (FAs) enabled multiple-input multiple-output (MIMO) communication systems based on index modulation (IM), named FA-IM, which incorporates the principle of IM into FAs-assisted MIMO system to improve the spectral efficiency (SE) without increasing the hardware complexity. In FA-IM, the information bits are mapped not only to the…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 12 pages, 9 figures, publish to TWC

  7. arXiv:2312.06995  [pdf, other]

    cs.CV eess.IV

    Transformer-based No-Reference Image Quality Assessment via Supervised Contrastive Learning

    Authors: Jinsong Shi, Pan Gao, Jie Qin

    Abstract: Image Quality Assessment (IQA) has long been a research hotspot in the field of image processing, especially No-Reference Image Quality Assessment (NR-IQA). Due to the powerful feature extraction ability, existing Convolution Neural Network (CNN) and Transformers based NR-IQA methods have achieved considerable progress. However, they still exhibit limited capability when facing unknown authentic d…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI24

  8. arXiv:2312.06462  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

    Authors: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

    Abstract: Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral…

    Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Highlight. 13 pages, 10 figures

  9. arXiv:2311.16572   

    eess.SY physics.ao-ph physics.soc-ph

    Adapting to climate change: Long-term impact of wind resource changes on China's power system resilience

    Authors: Jiaqi Ruan, Xiangrui Meng, Yifan Zhu, Gaoqi Liang, Xianzhuo Sun, Huayi Wu, Huijuan Xiao, Mengqian Lu, Pin Gao, Jiapeng Li, Wai-Kin Wong, Zhao Xu, Junhua Zhao

    Abstract: Modern society's reliance on power systems is at risk from the escalating effects of wind-related climate change. Yet, failure to identify the intricate relationship between wind-related climate risks and power systems could lead to serious short- and long-term issues, including partial or complete blackouts. Here, we develop a comprehensive framework to assess China's power system resilience acro…

    Submitted 24 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Not suitable for publication

  10. arXiv:2309.03905  [pdf, other]

    cs.MM cs.CL cs.CV cs.LG cs.SD eess.AS

    ImageBind-LLM: Multi-modality Instruction Tuning

    Authors: Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao

    Abstract: We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic by only image-text alignment training. During training…

    Submitted 11 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Code is available at https://github.com/OpenGVLab/LLaMA-Adapter

  11. arXiv:2306.15942  [pdf, other]

    cs.SD cs.AI eess.AS

    Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

    Authors: Aoqi Guo, Junnan Wu, Peng Gao, Wenbo Zhu, Qinwen Guo, Dazhi Gao, Yujun Wang

    Abstract: Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer. To achieve this, we first use the UNet-TCN structure to model input fea…

    Submitted 28 June, 2023; originally announced June 2023.

  12. arXiv:2305.09353  [pdf, other]

    cs.CV eess.IV

    Blind Image Quality Assessment via Transformer Predicted Error Map and Perceptual Quality Token

    Authors: Jinsong Shi, Pan Gao, Aljosa Smolic

    Abstract: Image quality assessment is a fundamental problem in the field of image processing, and due to the lack of reference images in most practical scenarios, no-reference image quality assessment (NR-IQA), has gained increasing attention recently. With the development of deep learning technology, many deep neural network-based NR-IQA methods have been developed, which try to learn the image quality bas…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: Submitted to TMM

  13. arXiv:2303.10912  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Exploring Representation Learning for Small-Footprint Keyword Spotting

    Authors: Fan Cui, Liyong Guo, Quandong Wang, Peng Gao, Yujun Wang

    Abstract: In this paper, we investigate representation learning for low-resource keyword spotting (KWS). The main challenges of KWS are limited labeled data and limited available device resources. To address those challenges, we explore representation learning for KWS by self-supervised contrastive learning and self-training with pretrained model. First, local-global contrastive siamese networks (LGCSiam) a…

    Submitted 20 March, 2023; originally announced March 2023.

  14. arXiv:2303.08525  [pdf, other]

    cs.CV eess.IV

    MRGAN360: Multi-stage Recurrent Generative Adversarial Network for 360 Degree Image Saliency Prediction

    Authors: Pan Gao, Xinlang Chen, Rong Quan, Wei Xiang

    Abstract: Thanks to the ability of providing an immersive and interactive experience, the uptake of 360 degree image content has been rapidly growing in consumer and industrial applications. Compared to planar 2D images, saliency prediction for 360 degree images is more challenging due to their high resolutions and spherical viewing ranges. Currently, most high-performance saliency prediction models for omn…

    Submitted 15 March, 2023; originally announced March 2023.

  15. arXiv:2211.10646  [pdf, other]

    cs.MM cs.IT eess.IV

    Rate-Distortion Modeling for Bit Rate Constrained Point Cloud Compression

    Authors: Pan Gao, Shengzhou Luo, Manoranjan Paul

    Abstract: As being one of the main representation formats of 3D real world and well-suited for virtual reality and augmented reality applications, point clouds have gained a lot of popularity. In order to reduce the huge amount of data, a considerable amount of research on point cloud compression has been done. However, given a target bit rate, how to properly choose the color and geometry quantization para…

    Submitted 19 November, 2022; originally announced November 2022.

    Comments: Accepted to IEEE Transactions on Circuits and Systems for Video Technology

  16. arXiv:2208.02519  [pdf]

    cs.CV cs.IT cs.MM eess.IV

    IPDAE: Improved Patch-Based Deep Autoencoder for Lossy Point Cloud Geometry Compression

    Authors: Kang You, Pan Gao, Qing Li

    Abstract: Point cloud is a crucial representation of 3D contents, which has been widely used in many areas such as virtual reality, mixed reality, autonomous driving, etc. With the boost of the number of points in the data, how to efficiently compress point cloud becomes a challenging problem. In this paper, we propose a set of significant improvements to patch-based point cloud compression, i.e., a learnab…

    Submitted 4 August, 2022; originally announced August 2022.

    Comments: 12 pages

  17. arXiv:2205.11607  [pdf, ps, other]

    cs.IT eess.SP

    Low-Complexity Block Coordinate Descend Based Multiuser Detection for Uplink Grant-Free NOMA

    Authors: Pengyu Gao, Zilong Liu, Pei Xiao, Chuan Heng Foh, Jing Zhang

    Abstract: Grant-free non-orthogonal multiple access (NOMA) scheme is considered as a promising candidate for the enabling of massive connectivity and reduced signalling overhead for Internet of Things (IoT) applications in massive machine-type communication (mMTC) networks. Exploiting the inherent nature of sporadic transmissions in the grant-free NOMA systems, compressed sensing based multiuser detection (…

    Submitted 23 May, 2022; originally announced May 2022.

  18. arXiv:2203.16772  [pdf, other]

    cs.SD cs.AI eess.AS

    Learning Decoupling Features Through Orthogonality Regularization

    Authors: Li Wang, Rongzhi Gu, Weiji Zhuang, Peng Gao, Yujun Wang, Yuexian Zou

    Abstract: Keyword spotting (KWS) and speaker verification (SV) are two important tasks in speech applications. Research shows that the state-of-art KWS and SV models are trained independently using different datasets since they expect to learn distinctive acoustic features. However, humans can distinguish language content and the speaker identity simultaneously. Motivated by this, we believe it is important…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted at ICASSP 2022

  19. arXiv:2203.13310  [pdf, other]

    cs.CV cs.AI eess.IV

    MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection

    Authors: Renrui Zhang, Han Qiu, Tai Wang, Ziyu Guo, Xuanzhuo Xu, Ziteng Cui, Yu Qiao, Peng Gao, Hongsheng Li

    Abstract: Monocular 3D object detection has long been a challenging task in autonomous driving. Most existing methods follow conventional 2D detectors to first localize object centers, and then predict 3D attributes by neighboring features. However, only using local visual features is insufficient to understand the scene-level 3D spatial structures and ignores the long-range inter-object depth relations. In…

    Submitted 24 August, 2023; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted by ICCV 2023. Code is available at https://github.com/ZrrSkywalker/MonoDETR

  20. arXiv:2203.10281  [pdf, ps, other]

    eess.SP

    Min-Max Latency Optimization Based on Sensed Position State Information in Internet of Vehicles

    Authors: Pengzun Gao, Long Zhao, Kan Zheng, Jinzhi Fan

    Abstract: The dual-function radar communication (DFRC) is an essential technology in Internet of Vehicles (IoV). Consider that the road-side unit (RSU) employs the DFRC signals to sense the vehicles' position state information (PSI), and communicates with the vehicles based on PSI. The objective of this paper is to minimize the maximum communication delay among all vehicles by considering the estimation acc…

    Submitted 19 March, 2022; originally announced March 2022.

  21. arXiv:2202.05514  [pdf, other]

    eess.IV cs.CV cs.MM

    Dilated convolutional neural network-based deep reference picture generation for video compression

    Authors: Haoyue Tian, Pan Gao, Ran Wei, Manoranjan Paul

    Abstract: Motion estimation and motion compensation are indispensable parts of inter prediction in video coding. Since the motion vector of objects is mostly in fractional pixel units, original reference pictures may not accurately provide a suitable reference for motion compensation. In this paper, we propose a deep reference picture generator which can create a picture that is more relevant to the current…

    Submitted 11 February, 2022; originally announced February 2022.

  22. arXiv:2201.02314  [pdf, other]

    eess.IV cs.CV

    RestoreDet: Degradation Equivariant Representation for Object Detection in Low Resolution Images

    Authors: Ziteng Cui, Yingying Zhu, Lin Gu, Guo-Jun Qi, Xiaoxiao Li, Peng Gao, Zenghui Zhang, Tatsuya Harada

    Abstract: Image restoration algorithms such as super resolution (SR) are indispensable pre-processing modules for object detection in degraded images. However, most of these algorithms assume the degradation is fixed and known a priori. When the real degradation is unknown or differs from assumption, both the pre-processing module and the consequent high-level task such as object detection would fail. Here,…

    Submitted 6 January, 2022; originally announced January 2022.

    Comments: 11 pages, 3 figures

  23. arXiv:2112.04744  [pdf, other]

    cs.CV eess.IV

    Superpixel-Based Building Damage Detection from Post-earthquake Very High Resolution Imagery Using Deep Neural Networks

    Authors: Jun Wang, Zhoujing Li, Yixuan Qiao, Qiming Qin, Peng Gao, Guotong Xie

    Abstract: Building damage detection after natural disasters like earthquakes is crucial for initiating effective emergency response actions. Remotely sensed very high spatial resolution (VHR) imagery can provide vital information due to their ability to map the affected buildings with high geometric precision. Many approaches have been developed to detect damaged buildings due to earthquakes. However, littl…

    Submitted 30 September, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

  24. arXiv:2110.09109  [pdf, other]

    cs.CV cs.MM eess.IV

    Patch-Based Deep Autoencoder for Point Cloud Geometry Compression

    Authors: Kang You, Pan Gao

    Abstract: The ever-increasing 3D application makes the point cloud compression unprecedentedly important and needed. In this paper, we propose a patch-based compression process using deep learning, focusing on the lossy point cloud geometry compression. Unlike existing point cloud compression networks, which apply feature extraction and reconstruction on the entire point cloud, we divide the point cloud int…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: Accepted to ACM Multimedia Asia (MMAsia '21)

  25. arXiv:2107.11222  [pdf]

    cs.SD eess.AS eess.SP

    Multi-channel Speech Enhancement with 2-D Convolutional Time-frequency Domain Features and a Pre-trained Acoustic Model

    Authors: Quandong Wang, Junnan Wu, Zhao Yan, Sichong Qian, Liyong Guo, Lichun Fan, Weiji Zhuang, Peng Gao, Yujun Wang

    Abstract: We propose a multi-channel speech enhancement approach with a novel two-stage feature fusion method and a pre-trained acoustic model in a multi-task learning paradigm. In the first fusion stage, the time-domain and frequency-domain features are extracted separately. In the time domain, the multi-channel convolution sum (MCS) and the inter-channel convolution differences (ICDs) features are compute…

    Submitted 24 September, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

    Comments: 7 pages, 3 figures, accepted to APSIPA 2021, revised

  26. arXiv:2106.13394  [pdf, other]

    cs.CV eess.IV

    Countering Adversarial Examples: Combining Input Transformation and Noisy Training

    Authors: Cheng Zhang, Pan Gao

    Abstract: Recent studies have shown that neural network (NN) based image classifiers are highly vulnerable to adversarial examples, which poses a threat to security-sensitive image recognition task. Prior work has shown that JPEG compression can combat the drop in classification accuracy on adversarial examples to some extent. But, as the compression ratio increases, traditional JPEG compression is insuffic…

    Submitted 24 June, 2021; originally announced June 2021.

  27. arXiv:2011.09081  [pdf, other]

    cs.SD eess.AS

    Multi-Channel Automatic Speech Recognition Using Deep Complex Unet

    Authors: Yuxiang Kong, Jian Wu, Quandong Wang, Peng Gao, Weiji Zhuang, Yujun Wang, Lei Xie

    Abstract: The front-end module in multi-channel automatic speech recognition (ASR) systems mainly use microphone array techniques to produce enhanced signals in noisy conditions with reverberation and echos. Recently, neural network (NN) based front-end has shown promising improvement over the conventional signal processing methods. In this paper, we propose to adopt the architecture of deep complex Unet (D…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: 7 pages, 4 figures, IEEE SLT 2021 Technical Committee

  28. arXiv:2007.13135  [pdf, other]

    cs.CV eess.IV

    Contrastive Visual-Linguistic Pretraining

    Authors: Lei Shi, Kai Shuang, Shijie Geng, Peng Su, Zhengkai Jiang, Peng Gao, Zuohui Fu, Gerard de Melo, Sen Su

    Abstract: Several multi-modality representation learning approaches such as LXMERT and ViLBERT have been proposed recently. Such approaches can achieve superior performance due to the high-level semantic information captured during large-scale multimodal pretraining. However, as ViLBERT and LXMERT adopt visual region regression and classification loss, they often suffer from domain gap and noisy label probl…

    Submitted 26 July, 2020; originally announced July 2020.

  29. arXiv:2005.08646  [pdf, other]

    cs.CV eess.IV

    Character Matters: Video Story Understanding with Character-Aware Relations

    Authors: Shijie Geng, Ji Zhang, Zuohui Fu, Peng Gao, Hang Zhang, Gerard de Melo

    Abstract: Different from short videos and GIFs, video stories contain clear plots and lists of principal characters. Without identifying the connection between appearing people and character names, a model is not able to obtain a genuine understanding of the plots. Video Story Question Answering (VSQA) offers an effective way to benchmark higher-level comprehension abilities of a model. However, current VSQ…

    Submitted 9 May, 2020; originally announced May 2020.

  30. arXiv:2005.08001  [pdf, other]

    eess.IV cs.CV

    Extreme Low-Light Imaging with Multi-granulation Cooperative Networks

    Authors: Keqi Wang, Peng Gao, Steven Hoi, Qian Guo, Yuhua Qian

    Abstract: Low-light imaging is challenging since images may appear to be dark and noised due to low signal-to-noise ratio, complex image content, and the variety in shooting scenes in extreme low-light condition. Many methods have been proposed to enhance the imaging quality under extreme low-light conditions, but it remains difficult to obtain satisfactory results, especially when they attempt to retain hi…

    Submitted 16 May, 2020; originally announced May 2020.

  31. arXiv:2001.05840  [pdf, other]

    cs.CV eess.IV

    Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering

    Authors: Lei Shi, Shijie Geng, Kai Shuang, Chiori Hori, Songxiang Liu, Peng Gao, Sen Su

    Abstract: Multi-modality fusion technologies have greatly improved the performance of neural network-based Video Description/Caption, Visual Question Answering (VQA) and Audio Visual Scene-aware Dialog (AVSD) over the recent years. Most previous approaches only explore the last layers of multiple layer feature fusion while omitting the importance of intermediate layers. To solve the issue for the intermedia…

    Submitted 16 February, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

  32. arXiv:1908.10009  [pdf, other]

    cs.CV cs.LG eess.IV

    Learning Reinforced Attentional Representation for End-to-End Visual Tracking

    Authors: Peng Gao, Qiquan Zhang, Fei Wang, Liyi Xiao, Hamido Fujita, Yan Zhang

    Abstract: Although numerous recent tracking approaches have made tremendous advances in the last decade, achieving high-performance visual tracking remains a challenge. In this paper, we propose an end-to-end network model to learn reinforced attentional representation for accurate target object discrimination and localization. We utilize a novel hierarchical attentional module with long short-term memory a…

    Submitted 1 January, 2020; v1 submitted 26 August, 2019; originally announced August 2019.

    Comments: Accepted by Information Sciences

  33. arXiv:1908.04289  [pdf, other]

    cs.CV cs.SD eess.AS eess.IV

    Multi-modality Latent Interaction Network for Visual Question Answering

    Authors: Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li

    Abstract: Exploiting relationships between visual regions and question words have achieved great success in learning multi-modality features for Visual Question Answering (VQA). However, we argue that existing methods mostly model relations between individual visual regions and words, which are not enough to correctly answer the question. From humans' perspective, answering a visual question requires unders…

    Submitted 10 August, 2019; originally announced August 2019.

  34. arXiv:1906.10886  [pdf, other]

    cs.CV cs.GR eess.IV

    Joint Multi-frame Detection and Segmentation for Multi-cell Tracking

    Authors: Zibin Zhou, Fei Wang, Wenjuan Xi, Huaying Chen, Peng Gao, Chengkang He

    Abstract: Tracking living cells in video sequence is difficult, because of cell morphology and high similarities between cells. Tracking-by-detection methods are widely used in multi-cell tracking. We perform multi-cell tracking based on the cell centroid detection, and the performance of the detector has high impact on tracking performance. In this paper, UNet is utilized to extract inter-frame and intra-f…

    Submitted 26 June, 2019; originally announced June 2019.

    Comments: Accepted by International Conference on Image and Graphics (ICIG 2019)

  35. arXiv:1904.01509  [pdf, other]

    cs.LG cs.CV cs.GR eess.IV stat.ML

    FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation

    Authors: Yanfu Yan, Ke Lu, Jian Xue, Pengcheng Gao, Jiayi Lyu

    Abstract: Facial expression analysis based on machine learning requires large number of well-annotated data to reflect different changes in facial motion. Publicly available datasets truly help to accelerate research in this area by providing a benchmark resource, but all of these datasets, to the best of our knowledge, are limited to rough annotations for action units, including only their absence, presenc…

    Submitted 2 April, 2019; originally announced April 2019.

    Comments: 9 pages, 7 figures

    Journal ref: 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)

  36. arXiv:1810.05367  [pdf]

    cs.CV eess.IV

    FPGA-based Acceleration System for Visual Tracking

    Authors: Ke Song, Chun Yuan, Peng Gao, Yunxu Sun

    Abstract: Visual tracking is one of the most important application areas of computer vision. At present, most algorithms are mainly implemented on PCs, and it is difficult to ensure real-time performance when applied in the real scenario. In order to improve the tracking speed and reduce the overall power consumption of visual tracking, this paper proposes a real-time visual tracking algorithm based on DSST…

    Submitted 14 October, 2018; v1 submitted 12 October, 2018; originally announced October 2018.

    Comments: Accepted by IEEE 14th International Conference on Solid-State and Integrated Circuit Technology (ICSICT)

  37. Identification of Successive "Unobservable" Cyber Data Attacks in Power Systems Through Matrix Decomposition

    Authors: Pengzhi Gao, Meng Wang, Joe H. Chow, Scott G. Ghiocel, Bruce Fardanesh, George Stefopoulos, Michael P. Razanousky

    Abstract: This paper presents a new framework of identifying a series of cyber data attacks on power system synchrophasor measurements. We focus on detecting "unobservable" cyber data attacks that cannot be detected by any existing method that purely relies on measurements received at one time instant. Leveraging the approximate low-rank property of phasor measurement unit (PMU) data, we formulate the ident…

    Submitted 16 July, 2016; originally announced July 2016.

    Comments: 13 pages, accepted to IEEE Trans. Signal Processing