Search | arXiv e-print repository

Inaccurate Label Distribution Learning with Dependency Noise

Authors: Zhiqiang Kou, **g Wang, Yuheng Jia, Xin Geng

Abstract: In this paper, we introduce the Dependent Noise-based Inaccurate Label Distribution Learning (DN-ILDL) framework to tackle the challenges posed by noise in label distribution learning, which arise from dependencies on instances and labels. We start by modeling the inaccurate label distribution matrix as a combination of the true label distribution and a noise matrix influenced by specific instances and labels. To address this, we develop a linear mapping from instances to their true label distributions, incorporating label correlations, and decompose the noise matrix using feature and label representations, applying group sparsity constraints to accurately capture the noise. Furthermore, we employ graph regularization to align the topological structures of the input and output spaces, ensuring accurate reconstruction of the true label distribution matrix. We use the Alternating Direction Method of Multipliers (ADMM) for efficient optimization, validate our method's capability to recover true labels accurately, and establish a generalization error bound. Extensive experiments demonstrate that DN-ILDL effectively addresses the ILDL problem and outperforms existing LDL methods.
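The abstract names two building blocks, group sparsity on the noise matrix and graph regularization of the label distributions. The minimal sketch below shows one plausible form of each. It assumes the groups are the rows of the noise matrix and that the graph is given as a symmetric weight matrix `W`. Both are illustrative assumptions, not details confirmed by the abstract, and the function names are hypothetical. `prox_l21` is the proximal operator of the l2,1 norm, the kind of step an ADMM solver would call. `laplacian_penalty` evaluates tr(DᵀLD) with L = diag(W1) − W.

```python
import math

def prox_l21(E, tau):
    """Proximal operator of tau * ||E||_{2,1}: shrink each row toward zero.

    Groups are assumed to be rows; a row whose Euclidean norm is <= tau
    is zeroed entirely, which is what produces group sparsity.
    """
    out = []
    for row in E:
        norm = math.sqrt(sum(v * v for v in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * v for v in row])
    return out

def laplacian_penalty(D, W):
    """Graph regularizer tr(D^T L D) with L = diag(W 1) - W.

    For symmetric W this equals 0.5 * sum_ij W[i][j] * ||D[i] - D[j]||^2, so it is
    small when strongly connected instances have similar label distributions.
    """
    n, c = len(D), len(D[0])
    deg = [sum(W[i]) for i in range(n)]
    total = 0.0
    for k in range(c):  # one label column at a time: d_k^T L d_k
        for i in range(n):
            Ld_i = deg[i] * D[i][k] - sum(W[i][j] * D[j][k] for j in range(n))
            total += D[i][k] * Ld_i
    return total
```

For example, `prox_l21([[3.0, 4.0], [0.3, 0.4]], 1.0)` scales the first row (norm 5) by 0.8 and zeroes the second row (norm 0.5).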

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2402.08023 [pdf, other]

UGMAE: A Unified Framework for Graph Masked Autoencoders

Authors: Yijun Tian, Chuxu Zhang, Ziyi Kou, Zheyuan Liu, Xiangliang Zhang, Nitesh V. Chawla

Abstract: Generative self-supervised learning on graphs, particularly graph masked autoencoders, has emerged as a popular learning paradigm and has demonstrated its efficacy in handling non-Euclidean data. However, several remaining issues limit the capability of existing methods: 1) the disregard of uneven node significance in masking, 2) the underutilization of holistic graph information, 3) the neglect of semantic knowledge in the representation space due to the exclusive use of reconstruction loss in the output space, and 4) the unstable reconstructions caused by the large volume of masked content. In light of this, we propose UGMAE, a unified framework for graph masked autoencoders that addresses these issues from the perspectives of adaptivity, integrity, complementarity, and consistency. Specifically, we first develop an adaptive feature mask generator to account for the unique significance of nodes and sample informative masks (adaptivity). We then design a ranking-based structure reconstruction objective, jointly with feature reconstruction, to capture holistic graph information and emphasize the topological proximity between neighbors (integrity). After that, we present a bootstrapping-based similarity module to encode the high-level semantic knowledge in the representation space, complementary to the low-level reconstruction in the output space (complementarity). Finally, we build a consistency assurance module to provide reconstruction objectives with extra stabilized consistency targets (consistency).
Extensive experiments demonstrate that UGMAE outperforms both contrastive and generative state-of-the-art baselines on several tasks across multiple datasets.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2312.05743 [pdf, other]

Building Variable-sized Models via Learngene Pool

Authors: Boyu Shi, Shiyu Xia, Xu Yang, Haokun Chen, Zhiqiang Kou, Xin Geng

Abstract: Stitchable Neural Networks (SN-Net) were recently proposed to stitch pre-trained networks together, quickly building numerous networks with different complexity and performance trade-offs. This alleviates the burden of designing or training variable-sized networks for application scenarios with diverse resource constraints. However, SN-Net still faces several challenges: 1) stitching from multiple independently pre-trained anchors consumes considerable storage; 2) SN-Net struggles to build smaller models for low resource constraints; and 3) SN-Net uses an unlearned initialization method for stitch layers, limiting the final performance. To overcome these challenges, motivated by the recently proposed Learngene framework, we propose a novel method called Learngene Pool. Briefly, Learngene distills the critical knowledge from a large pre-trained model into a small part (termed the learngene) and then expands this small part into a few variable-sized models. In our proposed method, we distill one large pre-trained model into multiple small models whose network blocks are used as learngene instances to construct the learngene pool. Since only one large model is used, we do not need to store multiple large models as SN-Net does, and after distillation, smaller learngene instances can be created to build small models that satisfy low resource constraints. We also insert learnable transformation matrices between the instances to stitch them into variable-sized models, improving the performance of these models.
Extensive experiments validate the effectiveness of the proposed Learngene Pool compared with SN-Net.
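The abstract contrasts SN-Net's unlearned stitch-layer initialization with learnable transformation matrices inserted between blocks. The minimal sketch below fits a linear stitch matrix between two blocks of different widths by least squares on paired activations. A learnable layer could start from this matrix and then be fine-tuned. The widths, the synthetic activations, and `fit_stitch_matrix` are hypothetical, and this is not presented as the Learngene Pool procedure.

```python
import numpy as np

def fit_stitch_matrix(acts_a, acts_b):
    """Least-squares stitch: find M minimizing ||acts_a @ M - acts_b||_F.

    acts_a: (num_tokens, d_a) outputs of the earlier block.
    acts_b: (num_tokens, d_b) inputs expected by the later block.
    Returns M with shape (d_a, d_b), usable as the initial weight of a linear stitch layer.
    """
    M, *_ = np.linalg.lstsq(acts_a, acts_b, rcond=None)
    return M

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(256, 8))   # hypothetical activations of width 8
true_M = rng.normal(size=(8, 12))    # hypothetical map to width 12
acts_b = acts_a @ true_M
M = fit_stitch_matrix(acts_a, acts_b)
print(M.shape)  # (8, 12)
```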

Submitted 11 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

arXiv:2310.06448 [pdf, other]

Asynchronous Federated Learning with Incentive Mechanism Based on Contract Theory

Authors: Danni Yang, Yun Ji, Zhoubin Kou, Xiaoxiong Zhong, Sheng Zhang

Abstract: To address the challenges posed by the heterogeneity inherent in federated learning (FL) and to attract high-quality clients, various incentive mechanisms have been employed. However, existing incentive mechanisms are typically used with conventional synchronous aggregation, resulting in significant straggler issues. In this study, we propose a novel asynchronous FL framework that integrates an incentive mechanism based on contract theory. Within the incentive mechanism, we strive to maximize the utility of the task publisher by adaptively adjusting clients' local model training epochs, taking into account factors such as time delay and test accuracy. In the asynchronous scheme, considering client quality, we devise aggregation weights and an access control algorithm to facilitate asynchronous aggregation. In simulations on the MNIST dataset without attacks, our framework achieves test accuracy 3.12% and 5.84% higher than FedAvg and FedProx, respectively. Under attacks, the framework exhibits a 1.35% accuracy improvement over the ideal Local SGD. Furthermore, to reach the same target accuracy, our framework requires notably less computation time than both FedAvg and FedProx.
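The abstract does not specify its quality-aware aggregation weights. The sketch below substitutes a generic staleness-discounted rule in the spirit of FedAsync's polynomial staleness function, so a late-arriving client model moves the global model less. The names and the hyperparameters `alpha` and `a` are hypothetical, and this is not the paper's weighting.

```python
def staleness_weight(staleness, alpha=0.6, a=0.5):
    """Mixing weight that decays polynomially with staleness (rounds since the client pulled the model)."""
    return alpha * (staleness + 1) ** (-a)

def async_update(global_params, client_params, staleness, alpha=0.6, a=0.5):
    """Blend one asynchronously arriving client model into the global model."""
    w = staleness_weight(staleness, alpha, a)
    return [(1 - w) * g + w * c for g, c in zip(global_params, client_params)]

fresh = async_update([0.0, 0.0], [1.0, 2.0], staleness=0)  # weight 0.6
stale = async_update([0.0, 0.0], [1.0, 2.0], staleness=3)  # weight 0.6 / 2 = 0.3
```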

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.06162 [pdf]

Empirical Evaluation of the Segment Anything Model (SAM) for Brain Tumor Segmentation

Authors: Mohammad Peivandi, Jason Zhang, Michael Lu, Dongxiao Zhu, Zhifeng Kou

Abstract: Brain tumor segmentation presents a formidable challenge in the field of medical image segmentation. While deep-learning models have been useful, human expert segmentation remains the most accurate method. The recently released Segment Anything Model (SAM) has opened up the opportunity to apply foundation models to this difficult task. However, SAM was primarily trained on diverse natural images, which makes applying it to biomedical segmentation, such as brain tumors with less defined boundaries, challenging. In this paper, we enhanced SAM's mask decoder using transfer learning with the Decathlon brain tumor dataset. We developed three methods to encapsulate the four-dimensional data into three dimensions for SAM. An on-the-fly data augmentation approach combining rotations and elastic deformations was used to increase the size of the training dataset. Two key metrics, the Dice Similarity Coefficient (DSC) and the Hausdorff Distance 95th Percentile (HD95), were used to assess the performance of our segmentation models. These metrics provided valuable insights into the quality of the segmentation results. In our evaluation, we compared this improved model to two benchmarks: the pretrained SAM and the widely used nnUNetv2. We find that the improved SAM shows considerable improvement over the pretrained SAM, while nnUNetv2 outperformed the improved SAM in terms of overall segmentation accuracy.
Nevertheless, the improved SAM demonstrated slightly more consistent results than nnUNetv2, especially on challenging cases that can lead to larger Hausdorff distances. In the future, more advanced techniques could be applied to further improve the performance of SAM on brain tumor segmentation.
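The two metrics named in the abstract can be sketched on flattened binary masks and on point lists (for example, boundary voxel coordinates). HD95 conventions vary. This sketch pools the nearest-neighbour distances in both directions and takes a nearest-rank 95th percentile. That is an assumption, since the abstract does not say which variant was used.

```python
import math

def dice(pred, truth):
    """Dice similarity coefficient for two flat binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * inter / total

def hd95(points_a, points_b):
    """95th-percentile Hausdorff distance between two point sets.

    Pools directed nearest-neighbour distances from A to B and from B to A, then
    takes the nearest-rank 95th percentile; other conventions exist.
    """
    def directed(src, dst):
        return [min(math.dist(p, q) for q in dst) for p in src]
    d = sorted(directed(points_a, points_b) + directed(points_b, points_a))
    k = max(0, math.ceil(0.95 * len(d)) - 1)  # nearest-rank index
    return d[k]
```

For instance, `dice([1, 1, 0, 0], [1, 0, 0, 0])` is 2·1/(2+1) ≈ 0.667.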

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2308.16319 [pdf, other]

A Radiological Clip Design Using Ultrasound Identification to Improve Localization

Authors: Jenna Cario, Zhengchang Kou, Rita J. Miller, April Dickenson, Christine U. Lee, Michael L. Oelze

Abstract: Objective: We demonstrate the use of ultrasound to receive an acoustic signal transmitted from a radiological clip built around a custom circuit. This signal encodes an identification number and is localized and identified wirelessly by the ultrasound imaging system. Methods: We designed and constructed the test platform around a Teensy 4.0 microcontroller core to detect ultrasonic imaging pulses received by a transducer embedded in a phantom, which acted as the radiological clip. In response, the platform generated and transmitted ultrasound identification (USID) signals. The phantom and clip were imaged using an ultrasonic array (Philips L7-4) connected to a Verasonics Vantage 128 system operating in pulse inversion (PI) mode. Cross-correlations were performed to localize and identify the code sequences in the PI images. Results: USID signals were detected and visualized on B-mode images of the phantoms with up to sub-millimeter localization accuracy. The average detection rate across 4,800 frames of ultrasound data was 93.0%. Tested ID values exhibited differences in detection rates. Conclusion: The USID clip produced identifiable, distinguishable, and localizable signals when imaged. Significance: Radiological clips are implanted in or near treated lesions to mark breast cancers being treated with neoadjuvant chemotherapy (NAC). As NAC progresses, available marking clips can lose visibility in ultrasound, the imaging modality of choice for monitoring NAC-treated lesions.
By transmitting an active signal, the clips could be localized by ultrasound more accurately and reliably, and multiple clips with different ID values could be imaged in the same field of view.
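The abstract localizes and identifies code sequences by cross-correlation. The sketch below reduces this to one dimension. Each known ID code slides over a received trace, and the code and lag with the highest correlation peak win. The ±1 codebook, the noise-free trace, and the function names are hypothetical simplifications of the image-domain processing described above.

```python
def xcorr_valid(signal, code):
    """Sliding dot product of code over signal (no padding): one score per lag."""
    n = len(code)
    return [sum(signal[i + j] * code[j] for j in range(n)) for i in range(len(signal) - n + 1)]

def identify(signal, codebook):
    """Return (code_id, lag, score) for the codebook entry with the highest correlation peak."""
    best = None
    for code_id, code in codebook.items():
        scores = xcorr_valid(signal, code)
        lag = max(range(len(scores)), key=scores.__getitem__)
        if best is None or scores[lag] > best[2]:
            best = (code_id, lag, scores[lag])
    return best

codebook = {  # hypothetical ±1 ID codes
    1: [1, 1, 1, -1, -1, 1, -1],
    2: [1, -1, -1, 1, -1, 1, 1],
}
received = [0.0] * 60
for j, chip in enumerate(codebook[2]):  # embed ID 2 starting at sample 20
    received[20 + j] = float(chip)
print(identify(received, codebook))  # (2, 20, 7.0)
```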

Submitted 1 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: 8 pages, 6 figures; for associated .gif files, see https://drive.google.com/drive/folders/1yhRTtPJQ6mDHKmcxeqGnqy1oCQVsSDwC?usp=drive_link, submitted to IEEE Transactions on Biomedical Engineering (TBME). Revised 2/1/24: two figures converted to tables, introduction revised, results and discussion revised for n = 3 trials, in vivo experiment data added, Rita J. Miller added as author.

arXiv:2308.04947 [pdf, other]

Methods for Acquiring and Incorporating Knowledge into Stock Price Prediction: A Survey

Authors: Li** Wang, Jiawei Li, Lifan Zhao, Zhizhuo Kou, Xiaohan Wang, Xinyi Zhu, Hao Wang, Yanyan Shen, Lei Chen

Abstract: Predicting stock prices presents a challenging research problem due to the inherent volatility and non-linear nature of the stock market. In recent years, knowledge-enhanced stock price prediction methods have shown groundbreaking results by utilizing external knowledge to understand the stock market. Despite the importance of these methods, there is a scarcity of scholarly works that systematically synthesize previous studies from the perspective of external knowledge types. Specifically, the external knowledge can be modeled in different data structures, which we group into non-graph-based formats and graph-based formats: 1) non-graph-based knowledge captures contextual information and multimedia descriptions specifically associated with an individual stock; 2) graph-based knowledge captures interconnected and interdependent information in the stock market. This survey paper aims to provide a systematic and comprehensive description of methods for acquiring external knowledge from various unstructured data sources and then incorporating it into stock price prediction models. We also explore fusion methods for combining external knowledge with historical price features. Moreover, this paper includes a compilation of relevant datasets and delves into potential future research directions in this domain.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2305.04066 [pdf, other]

Semi-Asynchronous Federated Edge Learning Mechanism via Over-the-air Computation

Authors: Zhoubin Kou, Yun Ji, Xiaoxiong Zhong, Sheng Zhang

Abstract: Over-the-air Computation (AirComp) has been demonstrated to be an effective transmission scheme for boosting the efficiency of federated edge learning (FEEL). However, existing FEEL systems with the AirComp scheme often employ traditional synchronous aggregation mechanisms for local model aggregation in each global round, which suffer from the straggler issue. In this paper, we propose a semi-asynchronous aggregation FEEL mechanism with the AirComp scheme (PAOTA) to improve the training efficiency of the FEEL system in the presence of significant data and device heterogeneity. Taking the staleness and divergence of model updates from edge devices into consideration, we minimize the convergence upper bound of the FEEL global model by adjusting the uplink transmit power of edge devices at each aggregation period. The simulation results demonstrate that our proposed algorithm achieves convergence performance close to that of the ideal Local SGD. Furthermore, to reach the same target accuracy, PAOTA requires less training time than both the ideal Local SGD and the synchronous FEEL algorithm via AirComp.
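AirComp exploits the superposition of simultaneous analog transmissions, so the receiver obtains a sum of device updates in one channel use. The sketch below assumes scalar updates, known real channel gains, and simple channel-inversion precoding. It ignores the transmit-power limits that PAOTA optimizes, so it shows only the aggregation idea, not the paper's power control. All names and parameters are hypothetical.

```python
import random

def aircomp_average(updates, gains, noise_std=0.0, eta=1.0, seed=0):
    """Simulate over-the-air averaging of scalar updates with channel-inversion precoding.

    Device k pre-scales its update by eta / h_k, so its channel gain h_k cancels and the
    receiver observes eta * sum(updates) plus receiver noise in a single transmission.
    """
    rng = random.Random(seed)
    received = sum(h * (eta / h) * x for h, x in zip(gains, updates))
    received += rng.gauss(0.0, noise_std)
    return received / (eta * len(updates))  # noise is attenuated by eta * K
```

Raising `eta` shrinks the effective noise, which is why, in practice, the per-device power budget becomes the binding constraint.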

Submitted 29 May, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

arXiv:2303.11698 [pdf, other]

Data Augmentation For Label Enhancement

Authors: Zhiqiang Kou, Yuheng Jia, **g Wang, Boyu Shi, Xin Geng

Abstract: Label distribution (LD) uses the description degree to describe instances, which provides more fine-grained supervision information when learning with label ambiguity. Nevertheless, LD is unavailable in many real-world applications. To obtain LD, label enhancement (LE) has emerged to recover LD from logical labels. Existing LE approaches have the following problems: (i) they use logical labels to train mappings to LD, but the supervision information is too loose, which can lead to inaccurate model predictions; (ii) they ignore feature redundancy and use the collected features directly. To solve (i), we use the topology of the feature space to generate more accurate label-confidence. To solve (ii), we propose a novel supervised LE dimensionality reduction approach, which projects the original data into a lower-dimensional feature space. Combining the above two, we obtain the augmented data for LE. Further, we propose a novel nonlinear LE model based on the label-confidence and reduced features. Extensive experiments on 12 real-world datasets show that our method consistently outperforms the five compared approaches.
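One common way to turn feature-space topology into label confidence is label propagation over a k-nearest-neighbour graph. Logical labels are spread to neighbours, and each row is then normalized into a distribution. The sketch below substitutes this generic technique for illustration. It is not the paper's method, and `k`, `sigma`, `alpha`, and the function names are hypothetical.

```python
import math

def knn_graph(X, k=2, sigma=1.0):
    """Symmetric Gaussian-weighted k-nearest-neighbour graph over feature vectors."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)[:k]
        for d, j in nbrs:
            w = math.exp(-d * d / (2 * sigma * sigma))
            W[i][j] = W[j][i] = w
    return W

def propagate(W, Y, alpha=0.5, iters=50):
    """Iterate F <- alpha * D^-1 W F + (1 - alpha) * Y, then normalize rows to distributions."""
    n, c = len(Y), len(Y[0])
    deg = [sum(row) or 1.0 for row in W]
    F = [list(map(float, y)) for y in Y]
    for _ in range(iters):
        F = [[alpha * sum(W[i][j] / deg[i] * F[j][l] for j in range(n)) + (1 - alpha) * Y[i][l]
              for l in range(c)] for i in range(n)]
    return [[v / (sum(row) or 1.0) for v in row] for row in F]
```

An instance labeled with both labels whose only neighbour carries just label 0 ends up with more mass on label 0. In this way, neighbourhood structure refines otherwise tied logical labels.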

Submitted 21 March, 2023; originally announced March 2023.

arXiv:2302.13000 [pdf, other]

Inaccurate Label Distribution Learning

Authors: Zhiqiang Kou, Yuheng Jia, **g Wang, Xin Geng

Abstract: Label distribution learning (LDL) trains a model to predict the relevance of a set of labels (called a label distribution (LD)) to an instance. Previous LDL methods all assumed that the LDs of the training instances are accurate. However, annotating highly accurate LDs for training instances is time-consuming and very expensive, and in reality the collected LDs are usually inaccurate and disturbed by annotation errors. For the first time, this paper investigates the problem of inaccurate LDL, i.e., developing an LDL model with noisy LDs. We assume that the noisy LD matrix is a linear combination of an ideal LD matrix and a sparse noise matrix. Consequently, the problem of inaccurate LDL becomes an inverse problem, where the objective is to recover the ideal LD and noise matrices from the noisy LDs. We hypothesize that the ideal LD matrix is low-rank due to the correlation of labels and utilize the local geometric structure of instances captured by a graph to assist in recovering the ideal LD. This is based on the premise that similar instances are likely to share the same LD. The proposed model is finally formulated as a graph-regularized low-rank and sparse decomposition problem and numerically solved by the alternating direction method of multipliers. Furthermore, a specialized objective function is utilized to induce an LD predictive model in LDL, taking into account the recovered label distributions. Extensive experiments conducted on multiple datasets from various real-world tasks demonstrate the efficacy of the proposed approach.
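The core of the formulation above is a low-rank plus sparse decomposition solved with ADMM. The minimal sketch below drops the graph-regularization term and solves plain robust PCA, min ‖L‖_* + λ‖S‖₁ subject to L + S = M, with a standard inexact augmented Lagrangian (ADMM-style) scheme. It is a swapped-in textbook baseline, not the paper's solver. λ = 1/√max(m, n) is the usual default, and the other parameters are assumptions.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def soft(A, tau):
    """Entrywise soft thresholding: proximal operator of tau * l1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def low_rank_sparse(M, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Split M into low-rank L plus sparse S (robust PCA) by inexact ALM / ADMM."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(M.shape))
    norm2 = np.linalg.norm(M, 2)                  # largest singular value
    Y = M / max(norm2, np.abs(M).max() / lam)     # dual variable initialization
    mu = 1.25 / norm2                             # penalty parameter, increased each step
    mu_max = mu * 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        S = soft(M - L + Y / mu, lam / mu)        # sparse (noise) update
        L = svt(M - S + Y / mu, 1.0 / mu)         # low-rank (clean) update
        R = M - L - S                             # constraint residual
        Y = Y + mu * R                            # dual ascent step
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S
```

In the inaccurate-LDL setting, M would be the noisy LD matrix and L the recovered ideal one. The graph term would add a Laplacian penalty to the L-update.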

Submitted 26 August, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

arXiv:2301.03719 [pdf]

doi 10.1109/TMI.2024.3383768

High-resolution Power Doppler Using Null Subtraction Imaging

Authors: Zhengchang Kou, Matthew Lowerison, Qi You, Yike Wang, Pengfei Song, Michael L. Oelze

Abstract: To improve the spatial resolution of power Doppler (PD) imaging, we explored null subtraction imaging (NSI) as an alternative beamforming technique to delay-and-sum (DAS). NSI is a nonlinear beamforming approach that uses three different apodizations on receive and incoherently sums the beamformed envelopes. NSI uses a null in the beam pattern to improve the lateral resolution, which we apply here to improve PD spatial resolution both with and without contrast microbubbles. In this study, we used NSI with three types of singular value decomposition (SVD)-based clutter filters and noise equalization to generate high-resolution PD images. An element sensitivity correction scheme was also proposed as a crucial component of NSI-based PD imaging. First, a microbubble trace experiment was performed to evaluate the resolution improvement of NSI-based PD over traditional DAS-based PD. Then, both contrast-enhanced and contrast-free ultrasound PD images were generated from a scan of a rat brain. The cross-sectional profiles of the microbubble traces and microvessels were plotted, and the full width at half maximum (FWHM) was estimated to provide a quantitative metric. Furthermore, iso-frequency curves were calculated to provide a resolution evaluation metric over the global field of view. Compared to microvessel images generated by traditional DAS-based beamforming, the NSI-based PD microvessel images showed up to a six-fold resolution improvement according to the FWHM estimate and a four-fold improvement according to the iso-frequency curve.
A resolvability of 39 µm was measured from the NSI-based PD microvessel image. The computational cost of NSI-based PD was only 40% higher than that of DAS-based PD.
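FWHM, the quantitative metric above, can be estimated from a sampled cross-sectional profile. The sketch below interpolates linearly at the two half-maximum crossings. It assumes a single positive peak that falls below half maximum on both sides. Whether and how the authors interpolated is not stated in the abstract.

```python
import math

def fwhm(xs, ys):
    """Full width at half maximum of a single-peaked profile, with linear
    interpolation at the two half-maximum crossings."""
    peak = max(range(len(ys)), key=ys.__getitem__)
    half = ys[peak] / 2.0

    def crossing(i0, i1):
        # x where the straight segment between samples i0 and i1 equals `half`
        x0, x1, y0, y1 = xs[i0], xs[i1], ys[i0], ys[i1]
        return x0 + (half - y0) * (x1 - x0) / (y1 - y0)

    left = peak
    while left > 0 and ys[left] > half:
        left -= 1
    right = peak
    while right < len(ys) - 1 and ys[right] > half:
        right += 1
    return crossing(right - 1, right) - crossing(left, left + 1)
```

For a Gaussian profile with σ = 1, the result approaches 2√(2 ln 2) ≈ 2.355, the usual FWHM of a Gaussian.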

Submitted 2 April, 2024; v1 submitted 9 January, 2023; originally announced January 2023.

arXiv:2211.02935 [pdf, other]

Efficient Cavity Searching for Gene Network of Influenza A Virus

Authors: Junjie Li, Jietong Zhao, Yanqing Su, Jiahao Shen, Yaohua Liu, Xinyue Fan, Zheng Kou

Abstract: Higher-order structures (cavities and cliques) in the gene network of influenza A virus reveal tight associations among viruses during evolution and are key signals of viral cross-species infection, which can cause pandemics. As indicators for sensing dynamic changes in viral genes, these higher-order structures have been a focus of attention in virology. However, viral gene networks are usually huge, and searching for these structures in them introduces unacceptable delay. To mitigate this issue, we propose a simple yet effective deep-learning model named HyperSearch to search for cavities in a computable complex network for influenza virus genetics. Extensive experiments conducted on a public influenza virus dataset demonstrate the effectiveness of HyperSearch over other advanced deep-learning methods without any elaborate model crafting. Moreover, HyperSearch can finish the search in minutes, while 0-1 programming takes days. Since the proposed method is simple and easy to transfer to other complex networks, HyperSearch has the potential to facilitate the monitoring of dynamic changes in viral genes and help humans keep up with the pace of virus mutations.

Submitted 5 November, 2022; originally announced November 2022.

Comments: work in progress

arXiv:2210.13546 [pdf]

doi 10.1109/TUFFC.2022.3217993

Grating lobe reduction in plane wave imaging with angular compounding using subtraction of coherent signals

Authors: Zhengchang Kou, Rita J. Miller, Michael L. Oelze

Abstract: Plane wave imaging (PWI) with angular compounding has gained popularity in recent years because it provides high frame rates and good image properties. However, most linear arrays used in clinical practice have a pitch equal to the wavelength of ultrasound. Hence, grating lobes are a concern for PWI using multiple transmit angles. Grating lobes produce clutter in images and reduce the ability to observe tissue contrast. Techniques that reduce or eliminate grating lobes in PWI using multiple angles would therefore improve image quality. Null subtraction imaging (NSI) is a nonlinear beamforming technique that has been explored for improving the lateral resolution of ultrasonic imaging. The apodization scheme used in NSI also eliminates or greatly reduces grating lobes. Imaging tasks using NSI were evaluated in simulations and physical experiments involving tissue-mimicking phantoms and rat tumors in vivo. Images created with NSI were compared with images created using traditional delay-and-sum (DAS) with Hann apodization and images created using a generalized coherence factor (GCF). NSI was observed to greatly reduce grating lobes in ultrasonic images, compared to DAS with Hann apodization and GCF, while maintaining spatial resolution and contrast. Therefore, NSI can provide a novel means of creating images using PWI with multiple steering angles on clinically available linear arrays while reducing the adverse effects associated with grating lobes.
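Why pitch relative to wavelength matters for steered transmissions can be seen from the textbook narrowband far-field relation for a uniformly sampled linear array, sin θ_g = sin θ_s + mλ/p for nonzero integers m. The sketch below lists the grating-lobe directions that fall in visible space. It is a simplification that ignores the combined transmit/receive geometry of compounded PWI and is not taken from the paper.

```python
import math

def grating_lobe_angles(pitch, wavelength, steer_deg):
    """Visible grating-lobe directions (degrees) of a uniformly sampled linear array.

    Narrowband far-field approximation: sin(theta_g) = sin(theta_s) + m * wavelength / pitch
    for nonzero integers m, keeping only |sin(theta_g)| <= 1.
    """
    s = math.sin(math.radians(steer_deg))
    ratio = wavelength / pitch
    m_max = int(2 / ratio) + 1  # enough orders to cover sin values in [-1, 1]
    angles = []
    for m in range(-m_max, m_max + 1):
        if m == 0:
            continue
        g = s + m * ratio
        if abs(g) <= 1.0:
            angles.append(math.degrees(math.asin(g)))
    return sorted(angles)
```

With a half-wavelength pitch and 10° steering, no grating lobe is visible. With a one-wavelength pitch, one appears near −55.7°.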

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.09489 [pdf]

Through Tissue Ultra-high-definition Video Transmission Using an Ultrasound Communication Channel

Authors: Zhengchang Kou, Andrew C. Singer, Michael L. Oelze

Abstract: Wireless capsule endoscopy (WCE) has been widely adopted as a complement to traditional wired gastroendoscopy, especially for small bowel diseases that are beyond the latter's reach. However, both the video resolution and the frame rate of current WCE solutions are limited by the wireless data rate. This is because the electromagnetic (EM), radio frequency (RF) based communication scheme used by WCE has strict limits on usable bandwidth and power, and EM waves suffer high attenuation in the human body compared to air. Ultrasound communication could be a potential alternative, as it has access to much higher bandwidth and transmitted power with much lower attenuation. In this paper, we propose an ultrasound communication scheme specially designed for high-data-rate through-tissue transmission and validate it by successfully transmitting ultra-high-definition (UHD) video (3840×2160 pixels at 60 FPS) through 5 cm of pork belly. An error-free payload data rate of over 8.3 Mbps was achieved with the proposed communication scheme and our custom-built field-programmable gate array (FPGA)-based test platform.
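A quick back-of-the-envelope check puts the two numbers in the abstract side by side. It assumes uncompressed 8-bit RGB (24 bits per pixel), which the abstract does not state. Under that assumption, raw UHD at 60 FPS needs roughly 1,400 times the reported link rate, which implies the transmitted video was compressed.

```python
width, height, fps = 3840, 2160, 60   # UHD format from the abstract
bits_per_pixel = 24                    # assumption: 8-bit RGB, uncompressed
raw_bps = width * height * fps * bits_per_pixel
link_bps = 8.3e6                       # reported error-free payload rate
print(raw_bps)                         # 11943936000, about 11.9 Gb/s
print(round(raw_bps / link_bps))       # about 1439
```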

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2210.03250 [pdf, other]

Unsupervised Domain Adaptation for COVID-19 Information Service with Contrastive Adversarial Domain Mixup

Authors: Huimin Zeng, Zhenrui Yue, Ziyi Kou, Lanyu Shang, Yang Zhang, Dong Wang

Abstract: In real-world applications of COVID-19 misinformation detection, a fundamental challenge is the lack of labeled COVID data to enable supervised end-to-end training of the models, especially at the early stage of the pandemic. To address this challenge, we propose an unsupervised domain adaptation framework that uses contrastive learning and adversarial domain mixup to transfer knowledge from an existing source data domain to the target COVID-19 data domain. In particular, to bridge the gap between the source domain and the target domain, our method reduces a radial basis function (RBF) based discrepancy between these two domains. Moreover, we leverage domain adversarial examples to establish an intermediate domain mixup, where the latent representations of the input text from both domains can be mixed during training. Extensive experiments on multiple real-world datasets suggest that our method can effectively adapt misinformation detection systems to the unseen COVID-19 target domain with significant improvements over state-of-the-art baselines.
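Two ingredients named in the abstract can be sketched in isolation. The first is an RBF-kernel discrepancy between source and target representations. Here it is the biased squared maximum mean discrepancy (MMD), a common RBF-based choice; that this matches the paper's exact discrepancy is an assumption. The second is a linear mixup of latent vectors from the two domains. `gamma`, `lam`, and the function names are hypothetical, and the adversarial-example generation is not shown.

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(source, target, gamma=0.5):
    """Biased estimate of squared maximum mean discrepancy with an RBF kernel."""
    def mean_k(A, B):
        return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))
    return mean_k(source, source) + mean_k(target, target) - 2 * mean_k(source, target)

def mixup(z_src, z_tgt, lam):
    """Interpolate two latent vectors: lam * z_src + (1 - lam) * z_tgt."""
    return [lam * a + (1 - lam) * b for a, b in zip(z_src, z_tgt)]
```

Minimizing `mmd2` over encoder parameters pulls the two feature distributions together. Mixed vectors give the model training points between the domains.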

Submitted 6 October, 2022; originally announced October 2022.
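The abstract does not specify the form of the RBF-based discrepancy. A common choice is the maximum mean discrepancy (MMD) with a Gaussian RBF kernel, sketched below in plain Python as an assumption rather than as the authors' exact loss; `rbf_kernel`, `mmd2`, and the bandwidth parameter `gamma` are illustrative names.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, gamma: float = 1.0) -> float:
    """Gaussian RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mmd2(source: Sequence[Vector], target: Sequence[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two sets of feature vectors.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)], with each expectation
    replaced by an average over all pairs in the given samples.
    """
    def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector]) -> float:
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))
```

Identical feature sets give an estimate of zero, and the estimate grows as the target features drift away from the source features, which is what makes it usable as a domain-alignment penalty.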

arXiv:2210.02191 [pdf, other]

On Attacking Out-Domain Uncertainty Estimation in Deep Neural Networks

Authors: Huimin Zeng, Zhenrui Yue, Yang Zhang, Ziyi Kou, Lanyu Shang, Dong Wang

Abstract: In many applications with real-world consequences, it is crucial to develop reliable uncertainty estimation for the predictions made by AI decision systems. Toward this goal, various deep neural network (DNN) based uncertainty estimation algorithms have been proposed. However, the robustness of the uncertainty returned by these algorithms has not been systematically explored. In this work, to raise the awareness of the research community on robust uncertainty estimation, we show that state-of-the-art uncertainty estimation algorithms can fail catastrophically under our proposed adversarial attack despite their impressive performance on uncertainty estimation. In particular, we aim at attacking out-domain uncertainty estimation: under our attack, the uncertainty model is fooled into making high-confidence predictions for out-domain data that it would originally have rejected. Extensive experimental results on various benchmark image datasets show that the uncertainty estimated by state-of-the-art methods can be easily corrupted by our attack.

Submitted 12 October, 2022; v1 submitted 3 October, 2022; originally announced October 2022.

arXiv:2209.04998 [pdf, other]

Domain Adaptation for Question Answering via Question Classification

Authors: Zhenrui Yue, Huimin Zeng, Ziyi Kou, Lanyu Shang, Dong Wang

Abstract: Question answering (QA) has demonstrated impressive progress in answering questions from customized domains. Nevertheless, domain adaptation remains one of the most elusive challenges for QA systems, especially when QA systems are trained in a source domain but deployed in a different target domain. In this work, we investigate the potential benefits of question classification for QA domain adaptation. We propose a novel framework: Question Classification for Question Answering (QC4QA). Specifically, a question classifier is adopted to assign question classes to both the source and target data. Then, we perform joint training in a self-supervised fashion via pseudo-labeling. For optimization, the inter-domain discrepancy between the source and target domains is reduced via the maximum mean discrepancy (MMD) distance. We additionally minimize the intra-class discrepancy among QA samples of the same question class for fine-grained adaptation performance. To the best of our knowledge, this is the first work in QA domain adaptation to leverage question classification with self-supervised adaptation. We demonstrate the effectiveness of the proposed QC4QA with consistent improvements against the state-of-the-art baselines on multiple datasets.

Submitted 2 October, 2022; v1 submitted 11 September, 2022; originally announced September 2022.

Comments: Accepted to COLING 2022

arXiv:2209.00845 [pdf]

Brownian motion of nonlinear oscillator in van der Waals trap

Authors: Xiaofei Liu, Fangyuan Chen, Zepu Kou, Wanlin Guo

Abstract: The van der Waals trap, a quantum fluctuation-induced potential characterized by short-range repulsive and long-range attractive forces, is intrinsically nonlinear. This work unveils the nonlinear effects on Brownian oscillators in the van der Waals trap using Langevin dynamics simulations and quasiharmonic approximations. While neither the size dependence nor the temperature dependence of the effective natural frequency is important for suspended plates of large areas, smaller plates with broader probability distributions are significantly softened, and even a temperature-induced softening is observed. Despite the nonlinearity, the stiffness and the coefficient of friction are tunable by changing the thickness of the coating and by modifying the size and the perforation condition of the suspended plates, respectively, endowing the quantum trap with the flexibility to build microscopic mechanical systems and to probe near-boundary hydrodynamics.

Submitted 2 September, 2022; originally announced September 2022.

Comments: 15 pages, 8 figures

arXiv:2208.09578 [pdf, other]

Contrastive Domain Adaptation for Early Misinformation Detection: A Case Study on COVID-19

Authors: Zhenrui Yue, Huimin Zeng, Ziyi Kou, Lanyu Shang, Dong Wang

Abstract: Despite recent progress in improving the performance of misinformation detection systems, classifying misinformation in an unseen domain remains an elusive challenge. To address this issue, a common approach is to introduce a domain critic and encourage domain-invariant input features. However, early misinformation often demonstrates both conditional and label shifts against existing misinformation data (e.g., class imbalance in COVID-19 datasets), rendering such methods less effective for detecting early misinformation. In this paper, we propose the contrastive adaptation network for early misinformation detection (CANMD). Specifically, we leverage pseudo labeling to generate high-confidence target examples for joint training with source data. We additionally design a label correction component to estimate and correct the label shifts (i.e., class priors) between the source and target domains. Moreover, a contrastive adaptation loss is integrated into the objective function to reduce the intra-class discrepancy and enlarge the inter-class discrepancy. As such, the adapted model learns corrected class priors and an invariant conditional distribution across both domains for improved estimation of the target data distribution. To demonstrate the effectiveness of the proposed CANMD, we study the case of COVID-19 early misinformation detection and perform extensive experiments using multiple real-world datasets. The results suggest that CANMD can effectively adapt misinformation detection systems to the unseen COVID-19 target domain with significant improvements compared to the state-of-the-art baselines.

Submitted 2 October, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

Comments: Accepted to CIKM 2022

arXiv:2208.03429 [pdf]

doi 10.1109/TBCAS.2023.3267614

High-level synthesis design of scalable ultrafast ultrasound beamformer with single FPGA

Authors: Zhengchang Kou, Qi You, Jihun Kim, Zhijie Dong, Matthew R. Lowerison, Nathiya V. Chandra Sekaran, Daniel A. Llano, Pengfei Song, Michael L. Oelze

Abstract: Ultrafast ultrasound imaging is essential for advanced ultrasound imaging techniques such as ultrasound localization microscopy (ULM) and functional ultrasound (fUS). Current ultrafast ultrasound imaging is challenged by the ultrahigh data bandwidth associated with the radio frequency (RF) signal and by the latency of the computationally expensive beamforming process. As such, continuous ultrafast data acquisition and beamforming remain elusive with existing software beamformers based on CPUs or GPUs. To address these challenges, the proposed work introduces a novel method of implementing an ultrafast ultrasound beamformer specifically for ultrafast plane wave imaging (PWI) on a field programmable gate array (FPGA) by using high-level synthesis. A parallelized implementation of the beamformer on a single FPGA was proposed by 1) utilizing a delay compression technique to reduce the delay profile size, which enables both run-time loading of pre-calculated delay profiles from external memory and delay reuse; 2) vectorizing channel data fetching, which is enabled by delay reuse; and 3) using fixed summing networks to reduce consumption of logic resources. Our proposed method presents two unique advantages over current FPGA beamformers: 1) high scalability that allows fast adaptation to different FPGA resources and beamforming speed demands by using Xilinx High-Level Synthesis as the development tool, and 2) a compact form factor enabled by using a single FPGA, instead of multiple FPGAs, to complete the beamforming. With the proposed method, a sustainable average beamforming rate of 4.83 G samples/second in terms of input raw RF samples was achieved. The resulting image quality of the proposed beamformer was compared with that of the software beamformer on the Verasonics Vantage system for both phantom imaging and in vivo imaging of a mouse brain.

Submitted 13 April, 2023; v1 submitted 5 August, 2022; originally announced August 2022.
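For readers unfamiliar with plane wave beamforming, the sketch below shows a conventional delay-and-sum computation for a single pixel and a single 0° plane-wave transmit. It is not the authors' HLS design and omits their delay compression, vectorized fetching, and summing networks; the function name, the nearest-sample interpolation, and the default speed of sound (1540 m/s) and sampling rate (40 MHz) are assumptions.

```python
import math
from typing import Sequence


def das_pixel(rf: Sequence[Sequence[float]], element_x: Sequence[float],
              x: float, z: float, c: float = 1540.0, fs: float = 40e6) -> float:
    """Delay-and-sum value of one pixel (x, z) for a 0-degree plane-wave transmit.

    rf[i][n] is sample n of receive channel i, and element_x[i] is the lateral
    position (in metres) of element i. The two-way delay is z / c for the
    plane wave to reach depth z plus the echo's path length back to the element
    divided by c. Nearest-sample interpolation is used for simplicity.
    """
    total = 0.0
    for channel, xi in zip(rf, element_x):
        t = (z + math.hypot(x - xi, z)) / c   # transmit + receive time of flight
        n = int(round(t * fs))                # convert delay to a sample index
        if 0 <= n < len(channel):
            total += channel[n]
    return total
```

The delay for each pixel and element depends only on geometry, so it can be precomputed as a delay profile; shrinking such profiles is what the delay compression technique in the abstract is described as doing.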

arXiv:2207.11237 [pdf, other]

Defending Substitution-Based Profile Pollution Attacks on Sequential Recommenders

Authors: Zhenrui Yue, Huimin Zeng, Ziyi Kou, Lanyu Shang, Dong Wang

Abstract: While sequential recommender systems achieve significant improvements in capturing user dynamics, we argue that sequential recommenders are vulnerable to substitution-based profile pollution attacks. To demonstrate our hypothesis, we propose a substitution-based adversarial attack algorithm, which modifies the input sequence by selecting certain vulnerable elements and substituting them with adversarial items. In both untargeted and targeted attack scenarios, we observe significant performance deterioration using the proposed profile pollution algorithm. Motivated by such observations, we design an efficient adversarial defense method called Dirichlet neighborhood sampling. Specifically, we sample item embeddings from a convex hull constructed by multi-hop neighbors to replace the original items in input sequences. During sampling, a Dirichlet distribution is used to approximate the probability distribution in the neighborhood such that the recommender learns to combat local perturbations. Additionally, we design an adversarial training method tailored for sequential recommender systems. In particular, we represent selected items with one-hot encodings and perform gradient ascent on the encodings to search for the worst-case linear combination of item embeddings in training. As such, the embedding function learns robust item representations and the trained recommender is resistant to test-time adversarial examples. Extensive experiments show the effectiveness of both our attack and defense methods, which consistently outperform baselines by a significant margin across model architectures and datasets.

Submitted 18 July, 2022; originally announced July 2022.

Comments: Accepted to RecSys 2022
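A minimal sketch of the Dirichlet neighborhood sampling step described above, assuming that an item's neighbors are already available as a list of embedding vectors and that a symmetric Dirichlet with concentration `alpha` is used. The helper names are hypothetical, and how the authors build multi-hop neighborhoods or set the Dirichlet parameters is not specified in the abstract.

```python
import random
from typing import List, Optional, Sequence


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_neighborhood_embedding(item_emb: Sequence[float],
                                     neighbor_embs: Sequence[Sequence[float]],
                                     alpha: float = 1.0,
                                     rng: Optional[random.Random] = None) -> List[float]:
    """Replace an item embedding with a random point in the convex hull spanned
    by the item and its (hypothetically precomputed) multi-hop neighbors."""
    rng = rng or random.Random()
    points = [list(item_emb)] + [list(e) for e in neighbor_embs]
    weights = sample_dirichlet([alpha] * len(points), rng)
    dim = len(item_emb)
    # Convex combination: non-negative weights that sum to one.
    return [sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)]
```

Because the Dirichlet weights are non-negative and sum to one, every sampled embedding is a convex combination of the item and its neighbors and therefore lies inside their convex hull.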

arXiv:2205.04709 [pdf, other]

Client Selection and Bandwidth Allocation for Federated Learning: An Online Optimization Perspective

Authors: Yun Ji, Zhoubin Kou, Xiaoxiong Zhong, Sheng Zhang, Hangfan Li, Fan Yang

Abstract: Federated learning (FL) trains a global model from clients' local datasets, making full use of the clients' computing resources and enabling more extensive and efficient machine learning on clients while protecting user information. Many existing works have focused on optimizing FL accuracy under the resource constraints of each individual round; however, few works comprehensively consider the optimization of latency, accuracy, and energy consumption over all rounds in wireless federated learning. Inspired by this, in this paper, we investigate FL in wireless networks, where client selection and bandwidth allocation are two crucial factors that significantly affect the latency, accuracy, and energy consumption of clients. We formulate the optimization problem as a mixed-integer problem that minimizes the cost of time and accuracy under a long-term energy constraint over all rounds. To solve this problem, we propose the Perround Energy Drift Plus Cost (PEDPC) algorithm from an online perspective, and the performance of the PEDPC algorithm is verified in simulations in terms of latency, accuracy, and energy consumption under IID and non-IID data distributions.

Submitted 10 May, 2022; originally announced May 2022.

Comments: submitted to a conference
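The name PEDPC suggests a Lyapunov drift-plus-penalty design. The sketch below shows only that generic technique under stated assumptions: each round offers a hypothetical list of candidate decisions with known cost and energy, a virtual queue `q` accumulates energy spent above the per-round budget, and the decision minimizing `v * cost + q * energy` is taken. It is not the PEDPC algorithm itself, whose cost model and mixed-integer subproblem are not given in the abstract.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Action:
    """A hypothetical per-round decision (e.g. a client subset and bandwidth split)."""
    name: str
    cost: float    # time/accuracy cost incurred this round
    energy: float  # energy consumed this round


def drift_plus_penalty(rounds: Sequence[Sequence[Action]], energy_budget: float,
                       v: float) -> Tuple[List[str], float]:
    """Per-round choice minimizing v * cost + q * energy, where q is a virtual
    queue that enforces a long-term average energy budget."""
    q = 0.0
    chosen: List[str] = []
    for candidates in rounds:
        best = min(candidates, key=lambda a: v * a.cost + q * a.energy)
        chosen.append(best.name)
        # Virtual queue grows when this round overspends the budget, shrinks otherwise.
        q = max(q + best.energy - energy_budget, 0.0)
    return chosen, q
```

A larger `v` weights the cost more heavily at the price of a larger energy backlog, while a smaller `v` tracks the energy budget more tightly.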

arXiv:2203.16537 [pdf, other]

Efficient Localness Transformer for Smart Sensor-Based Energy Disaggregation

Authors: Zhenrui Yue, Huimin Zeng, Ziyi Kou, Lanyu Shang, Dong Wang

Abstract: Modern smart sensor-based energy management systems leverage non-intrusive load monitoring (NILM) to predict and optimize appliance load distribution in real time. NILM, or energy disaggregation, refers to the decomposition of electricity usage conditioned on the aggregated power signals (i.e., the smart sensor on the main channel). Based on real-time appliance power prediction using sensory technology, energy disaggregation has great potential to increase electricity efficiency and reduce energy expenditure. With the introduction of transformer models, NILM has achieved significant improvements in predicting device power readings. Nevertheless, transformers are less efficient due to their O(l^2) complexity w.r.t. sequence length l. Moreover, transformers can fail to capture local signal patterns in sequence-to-point settings due to the lack of inductive bias in local context. In this work, we propose an efficient localness transformer for non-intrusive load monitoring (ELTransformer). Specifically, we leverage normalization functions and switch the order of matrix multiplication to approximate self-attention and reduce computational complexity. Additionally, we introduce localness modeling with sparse local attention heads and relative position encodings to enhance the model's capacity to extract short-term local patterns. To the best of our knowledge, ELTransformer is the first NILM model that addresses both computational complexity and localness modeling. With extensive experiments and quantitative analyses, we demonstrate the efficiency and effectiveness of the proposed ELTransformer with considerable improvements compared to state-of-the-art baselines.

Submitted 29 March, 2022; originally announced March 2022.

Comments: Accepted to DCOSS 2022
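The abstract does not give ELTransformer's exact normalization functions. As an assumed stand-in, the sketch below uses the positive feature map elu(x) + 1 from kernelized linear attention to show why switching the order of matrix multiplication helps: computing `(phi(Q) phi(K)^T) V` builds an l × l similarity matrix, whereas `phi(Q) (phi(K)^T V)` only builds a d × d_v summary, and both orders produce the same output.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def phi(row: Sequence[float]) -> List[float]:
    """Positive feature map elu(x) + 1 (an assumed choice, not the paper's)."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in row]


def quadratic_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """O(l^2) order: similarities phi(q_i) . phi(k_j) for every pair (i, j)."""
    fq, fk = [phi(r) for r in q], [phi(r) for r in k]
    out = []
    for qi in fq:
        sims = [sum(x * y for x, y in zip(qi, kj)) for kj in fk]
        z = sum(sims)  # row normalizer
        out.append([sum(s * vj[c] for s, vj in zip(sims, v)) / z
                    for c in range(len(v[0]))])
    return out


def linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """O(l) order: compute phi(K)^T V and sum_j phi(k_j) once, then reuse per query."""
    fq, fk = [phi(r) for r in q], [phi(r) for r in k]
    d, dv, l = len(fk[0]), len(v[0]), len(v)
    kv = [[sum(fk[j][i] * v[j][c] for j in range(l)) for c in range(dv)] for i in range(d)]
    k_sum = [sum(fk[j][i] for j in range(l)) for i in range(d)]
    out = []
    for qi in fq:
        z = sum(x * y for x, y in zip(qi, k_sum))
        out.append([sum(qi[i] * kv[i][c] for i in range(d)) / z for c in range(dv)])
    return out
```

For a fixed feature dimension d, the second ordering costs time linear in the sequence length l.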

arXiv:2111.12212 [pdf, other]

Long-Term CSI-based Design for RIS-Aided Multiuser MISO Systems Exploiting Deep Reinforcement Learning

Authors: Hong Ren, Cunhua Pan, Liang Wang, Zhoubing Kou, Kezhi Wang

Abstract: In this paper, we study the transmission design for reconfigurable intelligent surface (RIS)-aided multiuser communication networks. Different from most existing contributions, we consider long-term CSI-based transmission design, where both the beamforming vectors at the base station (BS) and the phase shifts at the RIS are designed based on long-term CSI, which can significantly reduce the channel estimation overhead. Due to the lack of an explicit expression for the ergodic data rate, we propose a novel deep deterministic policy gradient (DDPG) based algorithm to solve the optimization problem, which is trained using channel vectors generated offline. Simulation results demonstrate that the achievable net throughput is higher than that achieved by the conventional instantaneous-CSI based scheme when the channel estimation overhead is taken into account.

Submitted 23 November, 2021; originally announced November 2021.

Comments: Under revision in IEEE journal. Keywords: Reconfigurable intelligent surface (RIS), intelligent reflecting surface (IRS)

arXiv:2106.15434 [pdf, other]

Zoo-Tuning: Adaptive Transfer from a Zoo of Models

Authors: Yang Shu, Zhi Kou, Zhangjie Cao, Jianmin Wang, Mingsheng Long

Abstract: With the development of deep networks on various large-scale datasets, a large zoo of pretrained models is available. When transferring from a model zoo, applying classic single-model-based transfer learning methods to each source model suffers from a high computational burden and cannot fully utilize the rich knowledge in the zoo. We propose Zoo-Tuning to address these challenges, which learns to adaptively transfer the parameters of pretrained models to the target task. With the learnable channel alignment layer and adaptive aggregation layer, Zoo-Tuning adaptively aggregates channel-aligned pretrained parameters to derive the target model, which promotes knowledge transfer by simultaneously adapting multiple source models to downstream tasks. The adaptive aggregation substantially reduces the computation cost at both training and inference. We further propose lite Zoo-Tuning with the temporal ensemble of batch-average gating values to reduce the storage cost at inference time. We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection. Experimental results demonstrate that the proposed adaptive transfer learning approach can transfer knowledge from a zoo of models more effectively and efficiently.

Submitted 29 June, 2021; originally announced June 2021.

Comments: Accepted by ICML 2021

arXiv:2011.06182 [pdf, other]

Bi-tuning of Pre-trained Representations

Authors: **cheng Zhong, Ximei Wang, Zhi Kou, Jianmin Wang, Mingsheng Long

Abstract: It is common within the deep learning community to first pre-train a deep neural network on a large-scale dataset and then fine-tune the pre-trained model on a specific downstream task. Recently, both supervised and unsupervised pre-training approaches to learning representations have achieved remarkable advances, exploiting the discriminative knowledge of labels and the intrinsic structure of data, respectively. It follows natural intuition that both the discriminative knowledge and the intrinsic structure of the downstream task can be useful for fine-tuning; however, existing fine-tuning methods mainly leverage the former and discard the latter. A question arises: how can the intrinsic structure of data be fully explored to boost fine-tuning? In this paper, we propose Bi-tuning, a general learning framework for fine-tuning both supervised and unsupervised pre-trained representations on downstream tasks. Bi-tuning generalizes vanilla fine-tuning by integrating two heads upon the backbone of pre-trained representations: a classifier head with an improved contrastive cross-entropy loss to better leverage the label information in an instance-contrast way, and a projector head with a newly designed categorical contrastive learning loss to fully exploit the intrinsic structure of data in a category-consistent way. Comprehensive experiments confirm that Bi-tuning achieves state-of-the-art results for fine-tuning tasks of both supervised and unsupervised pre-trained models by large margins (e.g., a 10.7% absolute rise in accuracy on CUB in the low-data regime).

Submitted 11 November, 2020; originally announced November 2020.

arXiv:2009.13683 [pdf]

doi 10.1109/tbme.2021.3070477

Real-time video streaming in vivo using ultrasound as the communication channel

Authors: Zhengchang Kou, Rita J. Miller, Andrew C. Singer, Michael L. Oelze

Abstract: The emergence of capsule endoscopy has provided a means of capturing video of the small intestines without having to resort to an invasive procedure involving intubation. However, real-time video streaming to a receiver outside the body remains challenging for capsule endoscopy. Traditional electromagnetic-based solutions are limited in their data rates and available power. Recently, ultrasound was investigated as a communication channel for through-tissue data transmission. To achieve real-time video streaming through tissue, ultrasound data rates need to exceed 1 Mbps. In a previous study, we demonstrated ultrasound communications with data rates greater than 30 Mbps with two focused ultrasound transducers using a large-footprint laboratory system through slabs of lossy tissues [1]. However, the form factor of the transmitter is also crucial for capsule endoscopy, and a large, focused transducer cannot fit within the size of a capsule. Other challenges for achieving high-speed ultrasonic communication through tissue include attenuation and strong reflections, which lead to multipath effects. In this work, we demonstrate ultrasonic video communications using a mm-scale microcrystal transmitter with video streaming supplied by a camera connected to a Field Programmable Gate Array (FPGA). The signals were transmitted through a tissue-mimicking phantom and through the abdomen of a rabbit in vivo. The ultrasound signal was recorded by an array probe connected to a Verasonics Vantage system and decoded back to video. To improve the received signal quality, we combined the signals from multiple channels of the array probe. Orthogonal frequency division multiplexing (OFDM) modulation was used to reduce the receiver complexity in a strong multipath environment.

Submitted 28 September, 2020; originally announced September 2020.
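A minimal sketch of why OFDM keeps the receiver simple under multipath, with assumed toy parameters (8 subcarriers, a 2-sample cyclic prefix, a two-tap channel, and a naive DFT instead of an FFT); it does not reproduce the authors' system. Because the cyclic prefix is at least as long as the channel memory, the channel acts as a circular convolution on each block, so each subcarrier can be equalized with a single complex division.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X: Sequence[complex]) -> List[complex]:
    """Naive inverse DFT."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def ofdm_modulate(symbols: Sequence[complex], cp_len: int) -> List[complex]:
    """One OFDM block: IDFT of the subcarrier symbols with a cyclic prefix prepended."""
    time = idft(symbols)
    return time[len(time) - cp_len:] + time


def channel(signal: Sequence[complex], h: Sequence[float]) -> List[complex]:
    """Linear convolution with a multipath impulse response h (output truncated)."""
    return [sum(h[m] * signal[t - m] for m in range(len(h)) if t - m >= 0)
            for t in range(len(signal))]


def ofdm_demodulate(received: Sequence[complex], n: int, cp_len: int,
                    h: Sequence[float]) -> List[complex]:
    """Drop the cyclic prefix, take the DFT, and apply a one-tap equalizer per subcarrier."""
    body = list(received[cp_len:cp_len + n])
    Y = dft(body)
    H = dft(list(h) + [0.0] * (n - len(h)))  # channel frequency response
    return [y / hk for y, hk in zip(Y, H)]
```

Here the channel response `h` is assumed known at the receiver; a practical link would estimate it, for example from pilot symbols.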

arXiv:2007.08547 [pdf, other]

Talking-head Generation with Rhythmic Head Motion

Authors: Lele Chen, Guofeng Cui, Celong Liu, Zhong Li, Ziyi Kou, Yi Xu, Chenliang Xu

Abstract: When people deliver a speech, they naturally move their heads, and this rhythmic head motion conveys prosodic information. However, generating a lip-synced video with natural head movement is challenging. While remarkably successful, existing works either generate still talking-face videos or rely on landmark/video frames as sparse/dense mapping guidance to generate head movements, which leads to unrealistic or uncontrollable video synthesis. To overcome these limitations, we propose a 3D-aware generative network along with a hybrid embedding module and a non-linear composition module. By explicitly modeling head motion and facial expressions, carefully manipulating 3D animation, and dynamically embedding reference images, our approach achieves controllable, photo-realistic, and temporally coherent talking-head videos with natural head movements. Thorough experiments on several standard benchmarks demonstrate that our method achieves significantly better results than the state-of-the-art methods in both quantitative and qualitative comparisons. The code is available at https://github.com/lelechen63/Talking-head-Generation-with-Rhythmic-Head-Motion.

Submitted 16 July, 2020; originally announced July 2020.

arXiv:2006.05905 [pdf, other]

Spatial-Temporal Dynamic Graph Attention Networks for Ride-hailing Demand Prediction

Authors: Weiguo Pian, Yingbo Wu, Xiangmou Qu, Junpeng Cai, Ziyi Kou

Abstract: Ride-hailing demand prediction is an essential task in spatial-temporal data mining. Accurate ride-hailing demand prediction can help to pre-allocate resources and improve vehicle utilization and user experience. Graph Convolutional Networks (GCNs) are commonly used to model the complicated irregular non-Euclidean spatial correlations. However, existing GCN-based ride-hailing demand prediction methods assign the same importance to different neighbor regions and maintain a fixed graph structure with static spatial relationships throughout the timeline when extracting the irregular non-Euclidean spatial correlations. In this paper, we propose the Spatial-Temporal Dynamic Graph Attention Network (STDGAT), a novel ride-hailing demand prediction method. Based on the attention mechanism of GAT, STDGAT extracts different pair-wise correlations to achieve adaptive importance allocation for different neighbor regions. Moreover, in STDGAT, we design a novel time-specific commuting-based graph attention mode to construct a dynamic graph structure for capturing the dynamic time-specific spatial relationships throughout the timeline. Extensive experiments are conducted on a real-world ride-hailing demand dataset, and the experimental results demonstrate significant improvements of our method on three evaluation metrics (RMSE, MAPE, and MAE) over state-of-the-art baselines.

Submitted 16 April, 2022; v1 submitted 7 June, 2020; originally announced June 2020.

Comments: 11 pages, 6 figures. arXiv admin note: text overlap with arXiv:2006.04089
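For reference, the sketch below computes single-head attention weights in the standard GAT formulation, alpha_ij = softmax_j(LeakyReLU(a^T [W h_i || W h_j])), which is the mechanism STDGAT is described as building on. It does not include STDGAT's time-specific commuting-based graph construction; the weight matrix `w`, the attention vector `a`, and the region features are illustrative inputs.

```python
import math
from typing import Dict, List, Sequence


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def matvec(w: Sequence[Sequence[float]], h: Sequence[float]) -> List[float]:
    """Matrix-vector product W h."""
    return [sum(wij * hj for wij, hj in zip(row, h)) for row in w]


def gat_attention(h: Dict[str, Sequence[float]], node: str, neighbors: Sequence[str],
                  w: Sequence[Sequence[float]], a: Sequence[float]) -> Dict[str, float]:
    """Single-head GAT attention weights of `node` over its neighbors."""
    wh_i = matvec(w, h[node])
    scores = {}
    for j in neighbors:
        concat = wh_i + matvec(w, h[j])  # [W h_i || W h_j]
        scores[j] = leaky_relu(sum(ak * ck for ak, ck in zip(a, concat)))
    m = max(scores.values())  # subtract the max for numerical stability
    exp_s = {j: math.exp(s - m) for j, s in scores.items()}
    z = sum(exp_s.values())
    return {j: e / z for j, e in exp_s.items()}
```

Unlike a GCN with fixed, equal neighbor weights, the weights here depend on the features of each region pair, which is the "adaptive importance allocation" the abstract refers to.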

arXiv:2006.04089 [pdf, other]

STDI-Net: Spatial-Temporal Network with Dynamic Interval Mapping for Bike Sharing Demand Prediction

Authors: Weiguo Pian, Yingbo Wu, Ziyi Kou

Abstract: As an economical and healthy mode of shared transportation, the Bike Sharing System (BSS) has developed quickly in many big cities. An accurate prediction method can help a BSS schedule resources in advance to meet user demand and improve its operating efficiency. However, most existing methods for similar tasks utilize spatial or temporal information independently. Although some methods consider both, they only focus on demand prediction at a single location or between location pairs. In this paper, we propose a novel deep learning method called Spatial-Temporal Dynamic Interval Network (STDI-Net). The method predicts the number of renting and returning orders of multiple connected stations in the near future by modeling joint spatial-temporal information. Furthermore, we embed an additional module that generates dynamic learnable mappings for different time intervals, to account for the strong influence that different time intervals have on demand prediction in BSS. Extensive experiments are conducted on the NYC Bike dataset, and the results demonstrate the superiority of our method over existing methods.

Submitted 28 December, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

Comments: accepted by CIKM workshops 2020

arXiv:2005.03201 [pdf, other]

What comprises a good talking-head video generation?: A Survey and Benchmark

Authors: Lele Chen, Guofeng Cui, Ziyi Kou, Haitian Zheng, Chenliang Xu

Abstract: Over the years, performance evaluation has become essential in computer vision, enabling tangible progress in many sub-fields. While talking-head video generation has become an emerging research topic, existing evaluations on this topic present many limitations. For example, most approaches use human subjects (e.g., via Amazon MTurk) to evaluate their research claims directly. This subjective evaluation is cumbersome, unreproducible, and may impede the evolution of new research. In this work, we present a carefully designed benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies. As for evaluation, we either propose new metrics or select the most appropriate ones to evaluate results in terms of what we consider the desired properties of a good talking-head video, namely identity preservation, lip synchronization, high video quality, and natural, spontaneous motion. By conducting a thoughtful analysis across several state-of-the-art talking-head generation approaches, we aim to uncover the merits and drawbacks of current methods and point out promising directions for future work. All the evaluation code is available at: https://github.com/lelechen63/talking-head-generation-survey.

Submitted 6 May, 2020; originally announced May 2020.

arXiv:1911.07160

Improve CAM with Auto-adapted Segmentation and Co-supervised Augmentation

Authors: Ziyi Kou, Guofeng Cui, Shaojie Wang, Wentian Zhao, Chenliang Xu

Abstract: Weakly Supervised Object Localization (WSOL) methods generate both classification and localization results by learning from image category labels only. Previous methods usually use the class activation map (CAM) to obtain target object regions. However, most of them focus only on improving the foreground object parts in the CAM and ignore the important role of its background content. In this paper, we propose a confidence segmentation (ConfSeg) module that builds a confidence score for each pixel in the CAM without introducing additional hyper-parameters. The generated sample-specific confidence mask indicates the degree of certainty for each pixel in the CAM, and further supervises an additional CAM extended from internal feature maps. In addition, we introduce a Co-supervised Augmentation (CoAug) module to capture feature-level representations of the foreground and background parts of the CAM separately. A metric loss is then applied at the batch-sample level to enhance the discriminative ability of our model, which helps localize more of the related object parts. Our final model, CSoA, combines the two modules and achieves superior performance, e.g., $37.69\%$ and $48.81\%$ Top-1 localization error on the CUB-200 and ILSVRC datasets, respectively, outperforming all previous methods and setting a new state of the art.
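For readers unfamiliar with the baseline that ConfSeg and CoAug modify, the sketch below illustrates, under stated assumptions, the standard CAM pipeline and the Top-1 localization metric the abstract reports. It is not the CSoA method. A CAM for class c is a weighted sum of the last convolutional feature maps, M_c(x, y) = Σ_k w_k^c f_k(x, y); it is min-max normalized, thresholded, and boxed, and a sample counts as correctly localized when the top-1 class is right and the box has IoU ≥ 0.5 with the ground truth. The threshold ratio of 0.2, boxing all surviving pixels instead of the largest connected component, and all function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), inclusive pixel indices


def class_activation_map(features: Sequence[Grid], weights: Sequence[float]) -> Grid:
    """CAM for one class c: M_c(x, y) = sum_k w_k^c * f_k(x, y)."""
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for f_k, w_k in zip(features, weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += w_k * f_k[i][j]
    return cam


def cam_to_box(cam: Grid, ratio: float = 0.2) -> Box:
    """Min-max normalize the CAM, keep pixels >= ratio, and box all of them.

    Assumed simplification: boxes every surviving pixel rather than only
    the largest connected component.
    """
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    if hi == lo:  # flat map: nothing stands out, fall back to the whole image
        return 0, 0, len(cam) - 1, len(cam[0]) - 1
    keep = [[(v - lo) / (hi - lo) >= ratio for v in row] for row in cam]
    rows = [i for i, row in enumerate(keep) if any(row)]
    cols = [j for j in range(len(cam[0])) if any(row[j] for row in keep)]
    return rows[0], cols[0], rows[-1], cols[-1]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive pixel boxes."""
    inter_h = min(a[2], b[2]) - max(a[0], b[0]) + 1
    inter_w = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(0, inter_h) * max(0, inter_w)

    def area(box: Box) -> int:
        return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)

    return inter / (area(a) + area(b) - inter)


def top1_loc_error(samples: Sequence[Tuple[int, int, Box, Box]]) -> float:
    """samples: (pred_class, gt_class, pred_box, gt_box).

    A sample is correct only if the class matches AND IoU >= 0.5.
    """
    correct = sum(1 for pred, gt, pbox, gbox in samples
                  if pred == gt and iou(pbox, gbox) >= 0.5)
    return 1.0 - correct / len(samples)
```

The threshold matters: on the same map, a low ratio lets weak background responses inflate the box, which is one reason the background content discussed in the abstract affects localization.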

Submitted 13 January, 2021; v1 submitted 17 November, 2019; originally announced November 2019.

Comments: Accepted by WACV2021. Equal contribution for the first two authors

arXiv:1909.03619

Weakly Supervised Localization Using Background Images

Authors: Ziyi Kou, Wentian Zhao, Guofeng Cui, Shaojie Wang

Abstract: Weakly Supervised Object Localization (WSOL) methods usually rely on fully convolutional networks to obtain class activation maps (CAMs) of targeted labels. However, these networks always highlight the most discriminative parts to perform the task, so the located areas are much smaller than the entire targeted objects. In this work, we propose a novel end-to-end model that enlarges the CAMs generated by classification models, so that targeted objects can be localized more precisely. In detail, we add a module to traditional classification networks that extracts foreground object proposals from images without classifying them into specific categories. We then use these normalized regions as unrestricted pixel-level mask supervision for the subsequent classification task. We collect a set of images from the Internet, defined as the Background Image Set. It is much smaller than the targeted dataset, yet it supports the extraction of foreground regions from different pictures surprisingly well. The extracted region is independent of the classification task, and in each image it covers almost the entire object rather than just a significant part. Therefore, these regions can serve as masks that supervise the response maps generated by classification models to become larger and more precise. The method achieves state-of-the-art results on CUB-200-2011 in terms of Top-1 and Top-5 localization error, while achieving competitive results on ILSVRC2016 compared with other approaches.

Submitted 10 September, 2019; v1 submitted 8 September, 2019; originally announced September 2019.

Comments: Course project of CSC577, University of Rochester

arXiv:1812.00344

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos

Authors: Shaojie Wang, Wentian Zhao, Ziyi Kou, Chenliang Xu

Abstract: Understanding web instructional videos is an essential branch of video understanding for two reasons. First, most existing video methods focus on short-term actions in video clips only a few seconds long; these methods are not directly applicable to long videos. Second, unlike unconstrained long videos such as movies, instructional videos are more structured, in that step-by-step procedures constrain the understanding task. In this paper, we study reasoning on instructional videos via question answering (QA). Surprisingly, this has not been a focus of the video community despite its rich applications. We therefore introduce YouQuek, an annotated QA dataset for instructional videos based on the recent YouCook2. The questions in YouQuek are not limited to cues in a single frame but relate to logical reasoning in the temporal dimension. Observing the lack of effective representations for modeling long videos, we propose a set of carefully designed models, including a novel Recurrent Graph Convolutional Network (RGCN) that captures both temporal order and relation information. Furthermore, we study multiple modalities, including descriptions and transcripts, to boost video understanding. Extensive experiments on YouQuek suggest that RGCN achieves the best QA accuracy and that introducing human-annotated descriptions further improves performance.

Submitted 6 December, 2018; v1 submitted 2 December, 2018; originally announced December 2018.

arXiv:1712.06433

Filtered Hyperbolic Moment Method for the Vlasov Equation

Authors: Yana Di, Yuwei Fan, Zhenzhong Kou, Ruo Li, Yanli Wang

Abstract: In this paper, we investigate the effect of the filter for the hyperbolic moment equations (HME) [15] of the Vlasov-Poisson equations and propose a novel quasi time-consistent filter to suppress the numerical recurrence effect. By taking the properties of HME into account, the filter preserves many physical properties of HME, including Galilean invariance and the conservation of mass, momentum and energy. We present two viewpoints, a collisional viewpoint and a dissipative viewpoint, to dissect the filter, and show that the filtered hyperbolic moment method can be treated as a solver of the Vlasov equation. Numerical simulations of linear Landau damping and the two-stream instability demonstrate the effectiveness of the filter in restraining recurrence arising from particle streaming. Both the analysis and the numerical results indicate that the filtered HME can capture the evolution of the Vlasov equation, even when phase mixing and filamentation are dominant.
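The abstract does not spell out the quasi time-consistent filter, so the sketch below only illustrates the generic idea such filters build on: multiply each moment coefficient by a factor that decays with its order, while leaving the lowest orders intact. It uses the classic exponential filter σ(η) = exp(-α η^p) from spectral methods, not the paper's filter; α = 36, p = 8 and the function name are assumptions (α ≈ 36 is a common choice because exp(-36) is near double-precision machine epsilon). In an expansion in Hermite polynomials times a Gaussian weight, density, momentum and energy depend only on the coefficients of order 0 to 2, so leaving those untouched keeps the three conserved quantities unchanged; Galilean invariance and time consistency are not addressed by this sketch.

```python
import math
from typing import List


def exponential_moment_filter(coeffs: List[float], alpha: float = 36.0,
                              order: int = 8, keep: int = 3) -> List[float]:
    """Damp f_k by sigma(k/M) = exp(-alpha * (k/M)**order) for k >= keep.

    Generic spectral filter (an assumption, not the paper's filter).
    Orders 0..keep-1 (0..2 by default) are returned unchanged, which in a
    Hermite expansion preserves mass, momentum and energy.
    """
    m = len(coeffs) - 1  # highest moment order M
    if m <= 0:  # nothing to filter (empty or a single coefficient)
        return list(coeffs)
    return [f_k if k < keep else math.exp(-alpha * (k / m) ** order) * f_k
            for k, f_k in enumerate(coeffs)]
```

Because σ stays close to 1 for low k/M and drops to about exp(-α) at k = M, the filter mainly removes energy from the highest moments, which is where streaming-induced recurrence accumulates in a truncated moment system.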

Submitted 19 November, 2018; v1 submitted 14 December, 2017; originally announced December 2017.

arXiv:1001.4597

Learning to Blend by Relevance

Authors: Jiang Chen, Wei Chu, Zhenzhen Kou, Zhaohui Zheng

Abstract: The emergence of various vertical search engines highlights the fact that a single ranking technology cannot deal with the complexity and scale of search problems. For example, the technology behind video and image search is very different from that of general web search, and their ranking functions share few features. Question answering websites (e.g., Yahoo! Answers) can make use of the text matching and click features developed for general web search, but they have unique page structures and rich user feedback, e.g., thumbs-up and thumbs-down ratings in Yahoo! Answers, which greatly benefit their own ranking. Even for the features shared by answers and general web search, the correlation between features and relevance can be very different. Therefore, dedicated functions are needed to better rank documents within individual domains. These dedicated functions are defined on distinct feature spaces. However, having one search box for each domain is neither efficient nor scalable. Rather than typing the same query twice, into both Yahoo! Search and Yahoo! Answers, and retrieving two ranking lists, we would prefer to enter it only once and receive a comprehensive list of documents from both domains on the subject. This situation calls for new technology that blends documents from different sources into a single ranking list. Despite the content richness of the blended list, it must nonetheless be sorted by relevance. We call such technology blending, and it is the main subject of this paper.
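The abstract frames blending as merging documents scored by domain-specific ranking functions, defined on distinct feature spaces, into one relevance-sorted list, but it does not describe how. One simple baseline, sketched below under stated assumptions, is to map each domain ranker's raw score onto a shared relevance scale with a per-domain calibration function and then merge by calibrated score. The `Doc` type, the `blend` function and the linear calibrators in the usage example are hypothetical and are not the method proposed in the paper.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Doc:
    doc_id: str
    domain: str
    raw_score: float  # output of that domain's dedicated ranker; scales differ by domain


def blend(ranked: Dict[str, List[Doc]],
          calibrators: Dict[str, Callable[[float], float]],
          k: int) -> List[Tuple[float, Doc]]:
    """Map each domain's raw scores onto one relevance scale, then keep the top k."""
    candidates = [(calibrators[domain](doc.raw_score), doc)
                  for domain, docs in ranked.items()
                  for doc in docs]
    # Sort only on the calibrated relevance, never on raw scores across domains.
    return heapq.nlargest(k, candidates, key=lambda pair: pair[0])


if __name__ == "__main__":
    web = [Doc("w1", "web", 10.0), Doc("w2", "web", 5.0)]
    answers = [Doc("a1", "answers", 0.9), Doc("a2", "answers", 0.2)]
    # Hypothetical calibrators: web scores lie in [0, 10], answer scores in [0, 1].
    calibrators = {"web": lambda s: s / 10.0, "answers": lambda s: s}
    print([doc.doc_id for _, doc in blend({"web": web, "answers": answers}, calibrators, 3)])
```

Run as a script, this prints `['w1', 'a1', 'w2']`. Sorting the raw scores directly would instead put both web documents first, which is the scale mismatch that makes blending a distinct problem; in practice the calibration functions would have to be learned from relevance judgments rather than set by hand.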

Submitted 23 September, 2010; v1 submitted 26 January, 2010; originally announced January 2010.

Comments: This paper has been withdrawn by the author due to conflict with a patent filed by the author

arXiv:cond-mat/0307691

doi 10.1063/1.1605232

Mössbauer Effect Probe of Local Jahn-Teller distortion in Fe-doped Colossal Magnetoresistive Manganites

Authors: Zhao-hua Cheng, Zhi-hong Wang, Nai-li Di, Zhi-qi Kou, Guang-jun Wang, Rui-wei Li, Yi Lu, Qing-an Li, Bao-gen Shen, R. A. Dunlap

Abstract: The local structure of the Fe-doped La$_{1-x}$Ca$_{x}$MnO$_{3}$ (x=0.00-1.00) compounds has been investigated by means of Mössbauer spectroscopy. $^{57}$Fe Mössbauer spectra provide direct evidence of Jahn-Teller distortion in these manganites. On the basis of the Mössbauer results, the Jahn-Teller coupling was estimated. It is noteworthy that the Ca-concentration dependence of the Jahn-Teller coupling strength is very consistent with the magnetic phase diagram. Our results reveal that Mössbauer spectroscopy can not only detect local structural distortion but also provide a technique to investigate the Jahn-Teller coupling of Fe-doped La$_{1-x}$Ca$_{x}$MnO$_{3}$ colossal magnetoresistive perovskites.

Submitted 28 July, 2003; originally announced July 2003.

Comments: 3 figures, will appear in Applied Physics Letters

Showing 1–37 of 37 results for author: Kou, Z