Search | arXiv e-print repository

A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation

Authors: Muwei Jian, Hongyu Chen, Zaiyong Zhang, Nan Yang, Haorang Zhang, Lifu Ma, Wenjing Xu, Huixiang Zhi

Abstract: Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accurately predicting multiple cancer types. This limitation can be attributed to the scarcity of publicly available datasets annotated with expert-level cancer type information. This research aims to bridge this gap by providing publicly accessible datasets and reliable tools for medical diagnosis, facilitating a finer categorization of different types of lung diseases so as to offer precise treatment recommendations. To achieve this objective, we curated a diverse dataset of lung Computed Tomography (CT) images, comprising 330 annotated nodules (labeled as bounding boxes) from 95 distinct patients. The quality of the dataset was evaluated using a variety of classical classification and detection models, and the promising results demonstrate that the dataset has feasible applications and can further facilitate intelligent auxiliary diagnosis.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2310.01742 [pdf, other]

Intelligent Reflecting Surface Aided MIMO Networks: Distributed or Centralized Architecture?

Authors: Guangji Chen, Qingqing Wu, Wen Chen, Yanzhao Hou, Mengnan Jian, Shunqing Zhang, Jun Li, Lajos Hanzo

Abstract: We investigate the capacity of a broadcast channel with a multi-antenna base station (BS) sending independent messages to multiple users, aided by intelligent reflecting surfaces (IRSs) with N elements. In particular, both the distributed and centralized IRS deployment architectures are considered. Regarding the distributed IRS, the N IRS elements form multiple IRSs and each of them is installed near a user cluster; while for the centralized IRS, all IRS elements are located in the vicinity of the BS. To draw essential insights, we first derive the maximum capacity achieved by the distributed IRS and centralized IRS, respectively, under the assumption of line-of-sight propagation and homogeneous channel setups. By capturing the fundamental tradeoff between the spatial multiplexing gain and passive beamforming gain, we rigorously prove that the capacity of the distributed IRS is higher than that of the centralized IRS provided that the total number of IRS elements is above a threshold. Motivated by the superiority of the distributed IRS, we then focus on the transmission and element allocation design under the distributed IRS. By exploiting the user channel correlation of intra-clusters and inter-clusters, an efficient hybrid multiple access scheme relying on both spatial and time domains is proposed to fully exploit both the passive beamforming gain and spatial DoF. Moreover, the IRS element allocation problem is investigated for the objectives of sum-rate maximization and minimum user rate maximization, respectively. Finally, extensive numerical results are provided to validate our theoretical findings and also to unveil the effectiveness of the distributed IRS for improving the system capacity under various system setups.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2305.13924 [pdf, other]

doi 10.23919/ICN.2023.0021

Integrated Sensing and Communication based Outdoor Multi-Target Detection, Tracking and Localization in Practical 5G Networks

Authors: Ruiqi Liu, Mengnan Jian, Dawei Chen, Xu Lin, Yichao Cheng, Wei Cheng, Shijun Chen

Abstract: The 6th generation (6G) wireless networks will likely support a variety of capabilities beyond communication, such as sensing and localization, through the use of communication networks empowered by advanced technologies. Integrated sensing and communication (ISAC) has been recognized as a critical technology as well as a usage scenario for 6G, as widely agreed by leading global standardization bodies. ISAC utilizes communication infrastructure and devices to provide the capability of sensing the environment with high resolution, as well as tracking and localizing moving objects nearby. Meeting the requirements for both communication and sensing simultaneously, ISAC-based approaches offer higher spectral and energy efficiency than two separate systems serving two purposes, along with potentially lower costs and easier deployment. A key step towards the standardization and commercialization of ISAC is to carry out comprehensive field trials in practical networks, such as the 5th generation (5G) network, to demonstrate its true capabilities in practical scenarios. In this paper, an ISAC-based outdoor multi-target detection, tracking and localization approach is proposed and validated in 5G networks. The proposed system comprises 5G base stations (BSs) which serve nearby mobile users normally, while accomplishing the task of detecting, tracking and localizing drones, vehicles and pedestrians simultaneously. Comprehensive trial results demonstrate the relatively high accuracy of the proposed method in practical outdoor environments when tracking and localizing single targets and multiple targets.

Submitted 14 August, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Accepted by an open access journal (appearing on IEEEXplore soon)

Journal ref: Intelligent and Converged Networks (Volume: 4, Issue: 3, September 2023)

arXiv:2304.11639 [pdf, other]

Static IRS Meets Distributed MIMO: A New Architecture for Dynamic Beamforming

Authors: Guangji Chen, Qingqing Wu, Celimuge Wu, Mengnan Jian, Yijian Chen, Wen Chen

Abstract: Intelligent reflecting surface (IRS) has been considered as a revolutionary technology to enhance wireless communication performance. To cater for multiple mobile users, adjusting IRS beamforming patterns over time, i.e., dynamic IRS beamforming (DIBF), is generally needed for achieving satisfactory performance, which results in high controlling power consumption and overhead. To avoid such cost, we propose a new architecture based on the static regulated IRS for wireless coverage enhancement, where the principle of distributed multiple-input multiple-output (D-MIMO) is integrated into the system to exploit the diversity of spatial directions provided by multiple access points (APs). For this new D-MIMO empowered static IRS architecture, the total target area is partitioned into several subareas and each subarea is served by an assigned AP. We aim to maximize the worst-case received power over all locations in the target area by jointly optimizing a single set of IRS beamforming patterns and the AP-subarea association. Then, a two-step algorithm is proposed to obtain a high-quality solution. Theoretical analysis unveils that the fundamental squared power gain can still be achieved over all locations in the target area. The performance gap relative to the DIBF scheme is also analytically quantified. Numerical results validate our theoretical findings and demonstrate the effectiveness of our proposed design over benchmark schemes.

Submitted 28 April, 2023; v1 submitted 23 April, 2023; originally announced April 2023.

Comments: Submitted to IEEE WCL for possible publication

arXiv:2212.07847 [pdf, ps, other]

Hierarchical Codebook Design for Near-Field MmWave MIMO Communications Systems

Authors: Jiawei Chen, Feifei Gao, Mengnan Jian, Wanmai Yuan

Abstract: Communications systems with analog or hybrid analog/digital architectures usually rely on a pre-defined codebook to perform beamforming. With the increase in the size of the antenna array, the characteristics of the spherical wavefront in the near-field situation are not negligible. Therefore, it is necessary to design a codebook that is adaptive to near-field scenarios. In this letter, we investigate the hierarchical codebook design method in the near-field situation. We develop a steering beam gain calculation method and design the lower-layer codebook to satisfy the coverage of the Fresnel region. For the upper-layer codebook, we propose beam rotation and beam relocation methods to place an arbitrary beam pattern at target locations. The simulation results show the superiority of the proposed near-field hierarchical codebook design.

Submitted 15 December, 2022; originally announced December 2022.

Comments: 5 pages, 5 figures, letter

arXiv:2206.13390 [pdf, other]

A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!

Authors: Chenglizhao Chen, Mengke Song, Wenfeng Song, Li Guo, Muwei Jian

Abstract: Video saliency detection (VSD) aims to quickly locate the most attractive objects/things/patterns in a given video clip. Existing VSD-related works have mainly relied on the visual system but paid less attention to the audio aspect, even though our audio system is the most vital complementary part to our visual system. Also, audio-visual saliency detection (AVSD), one of the most representative research topics for mimicking human perceptual mechanisms, is currently in its infancy, and none of the existing survey papers have touched on it, especially from the perspective of saliency detection. Thus, the ultimate goal of this paper is to provide an extensive review to bridge the gap between audio-visual fusion and saliency detection. In addition, as another highlight of this review, we provide deep insight into key factors which could directly determine the performance of AVSD deep models, and we claim that the audio-visual consistency degree (AVC), a long-overlooked issue, can directly influence the effectiveness of using audio to benefit its visual counterpart when performing saliency detection. Moreover, in order to make the AVC issue more practical and valuable for future followers, we have newly equipped almost all existing publicly available AVSD datasets with additional frame-wise AVC labels. Based on these upgraded datasets, we have conducted extensive quantitative evaluations to ground our claim on the importance of AVC in the AVSD task. In short, both our ideas and new sets serve as a convenient platform with preliminaries and guidelines, all of which have great potential to facilitate future works in further promoting state-of-the-art (SOTA) performance.

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2203.03176 [pdf, other]

Reconfigurable Intelligent Surfaces for Wireless Communications: Overview of Hardware Designs, Channel Models, and Estimation Techniques

Authors: Mengnan Jian, George C. Alexandropoulos, Ertugrul Basar, Chongwen Huang, Ruiqi Liu, Yuanwei Liu, Chau Yuen

Abstract: The demanding objectives for the future sixth generation (6G) of wireless communication networks have spurred recent research efforts on novel materials and radio-frequency front-end architectures for wireless connectivity, as well as revolutionary communication and computing paradigms. Among the pioneering candidate technologies for 6G belong the reconfigurable intelligent surfaces (RISs), which are artificial planar structures with integrated electronic circuits that can be programmed to manipulate the incoming electromagnetic field in a wide variety of functionalities. Incorporating RISs in wireless networks has been recently advocated as a revolutionary means to transform any wireless signal propagation environment to a dynamically programmable one, intended for various networking objectives, such as coverage extension and capacity boosting, spatiotemporal focusing with benefits in energy efficiency and secrecy, and low electromagnetic field exposure. Motivated by the recent increasing interests in the field of RISs and the consequent pioneering concept of the RIS-enabled smart wireless environments, in this paper, we overview and taxonomize the latest advances in RIS hardware architectures as well as the most recent developments in the modeling of RIS unit elements and RIS-empowered wireless signal propagation. We also present a thorough overview of the channel estimation approaches for RIS-empowered communications systems, which constitute a prerequisite step for the optimized incorporation of RISs in future wireless networks. Finally, we discuss the relevance of the RIS technology in the latest wireless communication standards, and highlight the current and future standardization activities for the RIS technology and the consequent RIS-empowered wireless networking approaches.

Submitted 7 March, 2022; originally announced March 2022.

Comments: 19 pages, 7 figures, to appear in an ITU journal

arXiv:2201.10963 [pdf, other]

Learning to Compose Diversified Prompts for Image Emotion Classification

Authors: Sinuo Deng, Lifang Wu, Ge Shi, Lehao Xing, Meng Jian, Ye Xiang

Abstract: Contrastive Language-Image Pre-training (CLIP) represents the latest incarnation of pre-trained vision-language models. Although CLIP has recently shown its superior power on a wide range of downstream vision-language tasks like Visual Question Answering, it is still underexplored for Image Emotion Classification (IEC). Adapting CLIP to the IEC task poses three significant challenges: a tremendous training objective gap between pretraining and IEC, and shared, suboptimal, and invariant prompts for all instances. In this paper, we propose a general framework that shows how CLIP can be effectively applied to IEC. We first introduce a prompt tuning method that mimics the pretraining objective of CLIP and thus can leverage the rich image and text semantics entailed in CLIP. Then we automatically compose instance-specific prompts by conditioning them on the categories and image contents of instances, diversifying prompts and avoiding suboptimal problems. Evaluations on six widely-used affective datasets demonstrate that our proposed method outperforms the state-of-the-art methods by a large margin (up to 9.29% accuracy gain on the EmotionROI dataset) on IEC tasks, with only a few parameters trained. Our code will be publicly available for research purposes.

Submitted 30 May, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: 10 pages, 5 figures

arXiv:2108.13164 [pdf]

doi 10.3969/j.issn.1003-3114.2021.06.002

Applications and challenges of Reconfigurable Intelligent Surface for 6G networks

Authors: Yajun Zhao, Mengnan Jian

Abstract: Reconfigurable intelligent surface (RIS) has attracted the attention of academia and industry since it first appeared because it can flexibly manipulate the electromagnetic characteristics of the wireless channel. Especially in the past one or two years, RIS has been developing rapidly in academic research and industry promotion and is one of the key candidate technologies for 5G-Advanced and 6G networks. RIS can build a smart radio environment through its ability to regulate radio wave transmission in a flexible way. The introduction of RIS may create a new network paradigm, which brings new possibilities to the future network, but also leads to many new challenges in technological and engineering applications. This paper first introduces the main aspects of the RIS-enabled wireless communication network from a new perspective, and then focuses on the key challenges faced by the introduction of RIS. This paper briefly summarizes the main engineering application challenges faced by RIS networks, further analyzes and discusses several key technical challenges among them in depth, such as channel degradation, network coexistence, and network deployment, and proposes possible solutions.

Submitted 17 January, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: 23 pages. Originally published in Chinese

Journal ref: Radio Communications Technology, 1-16 [2021-08-10]. http://kns.cnki.net/kcms/detail/13.1099.TN.20210805.1107.002.html

arXiv:2107.07192 [pdf, other]

doi 10.1109/TIM.2021.3096282

Incorporating Lambertian Priors into Surface Normals Measurement

Authors: Yakun Ju, Muwei Jian, Shaoxiang Guo, Yingyu Wang, Huiyu Zhou, Junyu Dong

Abstract: The goal of photometric stereo is to measure the precise surface normal of a 3D object from observations with various shading cues. However, non-Lambertian surfaces influence the measurement accuracy due to irregular shading cues. Although deep neural networks have been employed to simulate the performance of non-Lambertian surfaces, the error in specularities, shadows, and crinkle regions is hard to reduce. In order to address this challenge, we here propose a photometric stereo network that incorporates Lambertian priors to better measure the surface normal. In this paper, we use the initial normal under the Lambertian assumption as the prior information to refine the normal measurement, instead of solely applying the observed shading cues to derive the surface normal. Our method utilizes the Lambertian information to reparameterize the network weights and the powerful fitting ability of deep neural networks to correct the errors caused by general reflectance properties. Our explorations include: the Lambertian priors (1) reduce the learning hypothesis space, making our method learn the mapping in the same surface normal space and improving the accuracy of learning, and (2) provide differential feature learning, improving the reconstruction of surface details. Extensive experiments verify the effectiveness of the proposed Lambertian prior photometric stereo network in accurate surface normal measurement on the challenging benchmark dataset.

Submitted 15 July, 2021; originally announced July 2021.

arXiv:2007.06288 [pdf, ps, other]

Fusing Motion Patterns and Key Visual Information for Semantic Event Recognition in Basketball Videos

Authors: Lifang Wu, Zhou Yang, Qi Wang, Meng Jian, Boxuan Zhao, Junchi Yan, Chang Wen Chen

Abstract: Many semantic events in team sport activities, e.g., basketball, often involve both group activities and the outcome (score or not). Motion patterns can be an effective means to identify different activities. Global and local motions have their respective emphasis on different activities, which are difficult to capture from the optical flow due to the mixture of global and local motions. Hence, a more effective way to separate the global and local motions is called for. In the specific case of basketball game analysis, the successful score for each round can be reliably detected by the appearance variation around the basket. Based on these observations, we propose a scheme to fuse global and local motion patterns (MPs) and key visual information (KVI) for semantic event recognition in basketball videos. Firstly, an algorithm is proposed to estimate the global motions from the mixed motions based on the intrinsic property of camera adjustments. The local motions can then be obtained from the mixed and global motions. Secondly, a two-stream 3D CNN framework is utilized for group activity recognition over the separated global and local motion patterns. Thirdly, the basket is detected and its appearance features are extracted through a CNN structure. The features are utilized to predict the success or failure. Finally, the group activity recognition and success/failure prediction results are integrated using the Kronecker product for event recognition. Experiments on the NCAA dataset demonstrate that the proposed method obtains state-of-the-art performance.

Submitted 13 July, 2020; originally announced July 2020.

arXiv:1908.03360 [pdf, ps, other]

Deep Learning based Downlink Channel Prediction for FDD Massive MIMO System

Authors: Yuwen Yang, Feifei Gao, Geoffrey Ye Li, Mengnan Jian

Abstract: In a frequency division duplexing (FDD) massive multiple-input multiple-output (MIMO) system, the acquisition of downlink channel state information (CSI) at the base station (BS) is a very challenging task due to the overwhelming overheads required for downlink training and uplink feedback. In this paper, we reveal a deterministic uplink-to-downlink mapping function when the position-to-channel mapping is bijective. Motivated by the universal approximation theorem, we then propose a sparse complex-valued neural network (SCNet) to approximate the uplink-to-downlink mapping function. Different from general deep networks that operate in the real domain, the SCNet is constructed in the complex domain and is able to learn the complex-valued mapping function by off-line training. After training, the SCNet is used to directly predict the downlink CSI based on the estimated uplink CSI without the need for either downlink training or uplink feedback. Numerical results show that the SCNet achieves better performance than general deep networks in terms of prediction accuracy and exhibits remarkable robustness over complicated wireless channels, demonstrating its great potential for practical deployments.

Submitted 25 August, 2019; v1 submitted 9 August, 2019; originally announced August 2019.

arXiv:1904.10633 [pdf, other]

LFFD: A Light and Fast Face Detector for Edge Devices

Authors: Yonghao He, Dezhong Xu, Lifang Wu, Meng Jian, Shiming Xiang, Chunhong Pan

Abstract: Face detection, as a fundamental technology for various applications, is always deployed on edge devices which have limited memory storage and low computing power. This paper introduces a Light and Fast Face Detector (LFFD) for edge devices. The proposed method is anchor-free and belongs to the one-stage category. Specifically, we rethink the importance of receptive field (RF) and effective receptive field (ERF) in the context of face detection. Essentially, the RFs of neurons in a certain layer are distributed regularly in the input image and these RFs are natural "anchors". Combining RF "anchors" and appropriate RF strides, the proposed method can detect a large range of continuous face scales with 100% coverage in theory. The insightful understanding of relations between ERF and face scales motivates an efficient backbone for one-stage detection. The backbone is characterized by eight detection branches and common layers, resulting in efficient computation. Comprehensive and extensive experiments are conducted on two popular benchmarks, WIDER FACE and FDDB. A new evaluation schema is proposed for application-oriented scenarios. Under the new schema, the proposed method can achieve superior accuracy (WIDER FACE Val/Test -- Easy: 0.910/0.896, Medium: 0.881/0.865, Hard: 0.780/0.770; FDDB -- discontinuous: 0.973, continuous: 0.724). Multiple hardware platforms are introduced to evaluate the running efficiency. The proposed method can obtain fast inference speed (NVIDIA TITAN Xp: 131.45 FPS at 640x480; NVIDIA TX2: 136.99 FPS at 160x120; Raspberry Pi 3 Model B+: 8.44 FPS at 160x120) with a model size of 9 MB.

Submitted 12 August, 2019; v1 submitted 23 April, 2019; originally announced April 2019.

Comments: 10 pages, 4 figures

arXiv:1903.06879 [pdf, ps, other]

Ontology Based Global and Collective Motion Patterns for Event Classification in Basketball Videos

Authors: Lifang Wu, Zhou Yang, Jiaoyu He, Meng Jian, Yaowen Xu, Dezhong Xu, Chang Wen Chen

Abstract: In multi-person videos, especially team sport videos, a semantic event is usually represented as a confrontation between two teams of players, which can be represented as collective motion. In broadcast basketball videos, specific camera motions are used to present specific events. Therefore, a semantic event in broadcast basketball videos is closely related to both the global motion (camera motion) and the collective motion. A semantic event in basketball videos can be generally divided into three stages: pre-event, event occurrence (event-occ), and post-event. In this paper, we propose an ontology-based global and collective motion pattern (On_GCMP) algorithm for basketball event classification. First, a two-stage GCMP based event classification scheme is proposed. The GCMP is extracted using optical flow. The two-stage scheme progressively combines a five-class event classification algorithm on event-occs and a two-class event classification algorithm on pre-events. Both algorithms utilize sequential convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to extract the spatial and temporal features of GCMP for event classification. Second, we utilize post-event segments to predict success/failure using deep features of images in the video frames (RGB_DF_VF) based algorithms. Finally, the event classification results and success/failure classification results are integrated to obtain the final results. To evaluate the proposed scheme, we collected a new dataset called NCAA+, which is automatically obtained from the NCAA dataset by extending the fixed length of video clips forward and backward of the corresponding semantic events. The experimental results demonstrate that the proposed scheme achieves a mean average precision of 58.10% on NCAA+. This is 6.50% higher than the state of the art on NCAA.

Submitted 19 March, 2019; v1 submitted 16 March, 2019; originally announced March 2019.

arXiv:1903.01340 [pdf, other]

doi 10.1109/TSP.2019.2949502

Beam Squint and Channel Estimation for Wideband mmWave Massive MIMO-OFDM Systems

Authors: Bolei Wang, Mengnan Jian, Feifei Gao, Geoffrey Ye Li, Shi Jin, Hai Lin

Abstract: With the increasing scale of antenna arrays in wideband millimeter-wave (mmWave) communications, the physical propagation delays of electromagnetic waves traveling across the whole array will become large and comparable to the time-domain sample period, which is known as the spatial-wideband effect. In this case, different subcarriers in an orthogonal frequency division multiplexing (OFDM) system will "see" distinct angles of arrival (AoAs) for the same path. This effect is known as beam squint, resulting from the spatial-wideband effect, and makes the approaches based on the conventional multiple-input multiple-output (MIMO) model, such as channel estimation and precoding, inapplicable. After discussing the relationship between beam squint and the spatial-wideband effect, we propose a channel estimation scheme for frequency-division duplex (FDD) mmWave massive MIMO-OFDM systems with hybrid analog/digital precoding, which takes the beam squint effect into consideration. A super-resolution compressed sensing approach is developed to extract the frequency-insensitive parameters of each uplink channel path, i.e., the AoA and the time delay, and the frequency-sensitive parameter, i.e., the complex channel gain. With the help of the reciprocity of these frequency-insensitive parameters in FDD systems, the downlink channel estimation can be greatly simplified, where only limited pilots are needed to obtain downlink complex gains and reconstruct downlink channels. Furthermore, the uplink and downlink channel covariance matrices can be constructed from these frequency-insensitive channel parameters rather than through a long-term average, which enables the minimum mean-squared error (MMSE) channel estimation to further enhance performance. Numerical results demonstrate the superiority of the proposed scheme over the conventional methods in mmWave communications.

Submitted 4 March, 2019; originally announced March 2019.

arXiv:1702.08692

Cascade one-vs-rest detection network for fine-grained recognition without part annotations

Authors: Long Chen, Junyu Dong, ShengKe Wang, Kin-Man Lam, Muwei Jian, Hua Zhang, XiaoChun Cao

Abstract: Fine-grained recognition is a challenging task due to the small intra-category variances. Most top-performing fine-grained recognition methods leverage parts of objects for better performance. Therefore, part annotations, which are extremely computationally expensive, are required. In this paper, we propose a novel cascaded deep CNN detection framework for fine-grained recognition which is trained to detect the whole object without considering parts. Nevertheless, most current top-performing detection networks use the N+1 class (N object categories plus background) softmax loss, and the background category, with many more training samples, dominates the feature learning process so that the features are not good for object categories with fewer samples. To bridge this gap, we introduce a cascaded structure to eliminate background and exploit a one-vs-rest loss to capture more minute variances among different subordinate categories. Experiments show that our proposed recognition framework achieves comparable performance with state-of-the-art, part-free, fine-grained recognition methods on the CUB-200-2011 Bird dataset. Moreover, our method even outperforms most part-based methods, while it does not need part annotations at the training stage and is free from any annotations at the test stage.

Submitted 23 August, 2017; v1 submitted 28 February, 2017; originally announced February 2017.

Comments: Part of authors has changed

Showing 1–16 of 16 results for author: Jian, M