Search | arXiv e-print repository

arXiv:2406.19633 [pdf, other]

Combating Missed Recalls in E-commerce Search: A CoT-Prompting Testing Approach

Authors: Shengnan Wu, Yongxiang Hu, Yingchuan Wang, Jiazhen Gu, ** Meng, Liujie Fan, Zhongshi Luan, Xin Wang, Yangfan Zhou

Abstract: Search components in e-commerce apps, often complex AI-based systems, are prone to bugs that can lead to missed recalls - situations where items that should be listed in search results aren't. This can frustrate shop owners and harm the app's profitability. However, testing for missed recalls is challenging due to difficulties in generating user-aligned test cases and the absence of oracles. In th… ▽ More Search components in e-commerce apps, often complex AI-based systems, are prone to bugs that can lead to missed recalls - situations where items that should be listed in search results aren't. This can frustrate shop owners and harm the app's profitability. However, testing for missed recalls is challenging due to difficulties in generating user-aligned test cases and the absence of oracles. In this paper, we introduce mrDetector, the first automatic testing approach specifically for missed recalls. To tackle the test case generation challenge, we use findings from how users construct queries during searching to create a CoT prompt to generate user-aligned queries by LLM. In addition, we learn from users who create multiple queries for one shop and compare search results, and provide a test oracle through a metamorphic relation. Extensive experiments using open access data demonstrate that mrDetector outperforms all baselines with the lowest false positive ratio. Experiments with real industrial data show that mrDetector discovers over one hundred missed recalls with only 17 false positives. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE Companion '24), July 15--19, 2024, Porto de Galinhas, Brazil

arXiv:2406.16986 [pdf, ps, other]

Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios

Authors: Tao Huang, Ziyang Chen, Jiayang Meng, Qingyu Huang, Xu Yang, Xun Yi, Ibrahim Khalil

Abstract: In the context of machine unlearning, the primary challenge lies in effectively removing traces of private data from trained models while maintaining model performance and security against privacy attacks like membership inference attacks. Traditional gradient-based unlearning methods often rely on extensive historical gradients, which becomes impractical with high unlearning ratios and may reduce… ▽ More In the context of machine unlearning, the primary challenge lies in effectively removing traces of private data from trained models while maintaining model performance and security against privacy attacks like membership inference attacks. Traditional gradient-based unlearning methods often rely on extensive historical gradients, which becomes impractical with high unlearning ratios and may reduce the effectiveness of unlearning. Addressing these limitations, we introduce Mini-Unlearning, a novel approach that capitalizes on a critical observation: unlearned parameters correlate with retrained parameters through contraction map**. Our method, Mini-Unlearning, utilizes a minimal subset of historical gradients and leverages this contraction map** to facilitate scalable, efficient unlearning. This lightweight, scalable method significantly enhances model accuracy and strengthens resistance to membership inference attacks. Our experiments demonstrate that Mini-Unlearning not only works under higher unlearning ratios but also outperforms existing techniques in both accuracy and security, offering a promising solution for applications requiring robust unlearning capabilities. △ Less

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.14878 [pdf, other]

MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection

Authors: Zhuoxiao Chen, Junjie Meng, Mahsa Baktashmotlagh, Zi Huang, Yadan Luo

Abstract: LiDAR-based 3D object detection is pivotal across many applications, yet the performance of such detection systems often degrades after deployment, especially when faced with unseen test point clouds originating from diverse locations or subjected to corruption. In this work, we introduce a new online adaptation framework for detectors named Model Synergy (MOS). Specifically, MOS dynamically assem… ▽ More LiDAR-based 3D object detection is pivotal across many applications, yet the performance of such detection systems often degrades after deployment, especially when faced with unseen test point clouds originating from diverse locations or subjected to corruption. In this work, we introduce a new online adaptation framework for detectors named Model Synergy (MOS). Specifically, MOS dynamically assembles best-fit supermodels for each test batch from a bank of historical checkpoints, leveraging long-term knowledge to guide model updates without forgetting. The model assembly is directed by the proposed synergy weights (SW), employed for weighted averaging of the selected checkpoints to minimize redundancy in the composite supermodel. These weights are calculated by evaluating the similarity of predicted bounding boxes on test data and the feature independence among model pairs in the bank. To maintain an informative yet compact model bank, we pop out checkpoints with the lowest average SW scores and insert newly updated model weights. Our method was rigorously tested against prior test-time domain adaptation strategies on three datasets and under eight types of corruptions, demonstrating its superior adaptability to changing scenes and conditions. Remarkably, our approach achieved a 67.3% increase in performance in a complex "cross-corruption" scenario, which involves cross-dataset inconsistencies and real-world scene corruptions, providing a more realistic testbed of adaptation capabilities. The code is available at https://github.com/zhuoxiao-chen/MOS. △ Less

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.11389 [pdf, other]

SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

Authors: Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

Abstract: Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods s… ▽ More Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods such as a GNNExplainer. However, post-hoc explanations can not facilitate the model predictions and the computational cost of these methods cannot meet practical requirements, thus limiting their application in real-world scenarios. To address these issues, we propose SEFraud, a novel graph-based self-explainable fraud detection framework that simultaneously tackles fraud detection and result in interpretability. Concretely, SEFraud first leverages customized heterogeneous graph transformer networks with learnable feature masks and edge masks to learn expressive representations from the informative heterogeneously typed transactions. A new triplet loss is further designed to enhance the performance of mask learning. Empirical results on various datasets demonstrate the effectiveness of SEFraud as it shows considerable advantages in both the fraud detection performance and interpretability of prediction results. Moreover, SEFraud has been deployed and offers explainable fraud detection service for the largest bank in China, Industrial and Commercial Bank of China Limited (ICBC). Results collected from the production environment of ICBC show that SEFraud can provide accurate detection results and comprehensive explanations that align with the expert business understanding, confirming its efficiency and applicability in large-scale online services. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by KDD 2024

arXiv:2406.09484 [pdf, other]

Is Diffusion Model Safe? Severe Data Leakage via Gradient-Guided Diffusion Model

Authors: Jiayang Meng, Tao Huang, Hong Chen, Cui** Li

Abstract: Gradient leakage has been identified as a potential source of privacy breaches in modern image processing systems, where the adversary can completely reconstruct the training images from leaked gradients. However, existing methods are restricted to reconstructing low-resolution images where data leakage risks of image processing systems are not sufficiently explored. In this paper, by exploiting d… ▽ More Gradient leakage has been identified as a potential source of privacy breaches in modern image processing systems, where the adversary can completely reconstruct the training images from leaked gradients. However, existing methods are restricted to reconstructing low-resolution images where data leakage risks of image processing systems are not sufficiently explored. In this paper, by exploiting diffusion models, we propose an innovative gradient-guided fine-tuning method and introduce a new reconstruction attack that is capable of stealing private, high-resolution images from image processing systems through leaked gradients where severe data leakage encounters. Our attack method is easy to implement and requires little prior knowledge. The experimental results indicate that current reconstruction attacks can steal images only up to a resolution of $128 \times 128$ pixels, while our attack method can successfully recover and steal images with resolutions up to $512 \times 512$ pixels. Our attack method significantly outperforms the SOTA attack baselines in terms of both pixel-wise accuracy and time efficiency of image reconstruction. Furthermore, our attack can render differential privacy ineffective to some extent. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08877 [pdf, other]

EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

Authors: Yuan-Ming Li, Wei-** Huang, An-Lan Wang, Ling-An Zeng, **g-Ke Meng, Wei-Shi Zheng

Abstract: We present EgoExo-Fitness, a new full-body action understanding dataset, featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives, but also provides rich annotations. Specifically, two-level temporal bound… ▽ More We present EgoExo-Fitness, a new full-body action understanding dataset, featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives, but also provides rich annotations. Specifically, two-level temporal boundaries are provided to localize single action videos along with sub-steps of each action. More importantly, EgoExo-Fitness introduces innovative annotations for interpretable action judgement--including technical keypoint verification, natural language comments on action execution, and action quality scores. Combining all of these, EgoExo-Fitness provides new resources to study egocentric and exocentric full-body action understanding across dimensions of "what", "when", and "how well". To facilitate research on egocentric and exocentric full-body action understanding, we construct benchmarks on a suite of tasks (i.e., action classification, action localization, cross-view sequence verification, cross-view skill determination, and a newly proposed task of guidance-based execution verification), together with detailed analysis. Code and data will be available at https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 33 pages, 9 figures

arXiv:2406.02622 [pdf, other]

Safeguarding Large Language Models: A Survey

Authors: Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, Changshun Wu, Gaojie **, Yi Qi, **wei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang

Abstract: In the burgeoning field of Large Language Models (LLMs), develo** a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced int… ▽ More In the burgeoning field of Large Language Models (LLMs), develo** a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced into a comprehensive mechanism dealing with ethical issues in various contexts. First, the paper elucidates the current landscape of safeguarding mechanisms that major LLM service providers and the open-source community employ. This is followed by the techniques to evaluate, analyze, and enhance some (un)desirable properties that a guardrail might want to enforce, such as hallucinations, fairness, privacy, and so on. Based on them, we review techniques to circumvent these controls (i.e., attacks), to defend the attacks, and to reinforce the guardrails. While the techniques mentioned above represent the current status and the active research trends, we also discuss several challenges that cannot be easily dealt with by the methods and present our vision on how to implement a comprehensive guardrail through the full consideration of multi-disciplinary approach, neural-symbolic method, and systems development lifecycle. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: under review. arXiv admin note: text overlap with arXiv:2402.01822

arXiv:2406.02058 [pdf, other]

OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

Authors: Yanmin Wu, Jiarui Meng, Haijie Li, Chenming Wu, Yahao Shi, Xinhua Cheng, Chen Zhao, Haocheng Feng, Errui Ding, **gdong Wang, Jian Zhang

Abstract: This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding. Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations.… ▽ More This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding. Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations. To ensure robust feature presentation and 3D point-level understanding, we first employ SAM masks without cross-frame associations to train instance features with 3D consistency. These features exhibit both intra-object consistency and inter-object distinction. Then, we propose a two-stage codebook to discretize these features from coarse to fine levels. At the coarse level, we consider the positional information of 3D points to achieve location-based clustering, which is then refined at the fine level. Finally, we introduce an instance-level 3D-2D feature association method that links 3D points to 2D masks, which are further associated with 2D CLIP features. Extensive experiments, including open vocabulary-based 3D object selection, 3D point cloud understanding, click-based 3D object selection, and ablation studies, demonstrate the effectiveness of our proposed method. Project page: https://3d-aigc.github.io/OpenGaussian △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: technical report, 15 pages

arXiv:2405.20669 [pdf, other]

Fourier123: One Image to High-Quality 3D Object Generation with Hybrid Fourier Score Distillation

Authors: Shuzhou Yang, Yu Wang, Haijie Li, Jiarui Meng, Xiandong Meng, Jian Zhang

Abstract: Single image-to-3D generation is pivotal for crafting controllable 3D assets. Given its underconstrained nature, we leverage geometric priors from a 3D novel view generation diffusion model and appearance priors from a 2D image generation method to guide the optimization process. We note that a disparity exists between the training datasets of 2D and 3D diffusion models, leading to their outputs s… ▽ More Single image-to-3D generation is pivotal for crafting controllable 3D assets. Given its underconstrained nature, we leverage geometric priors from a 3D novel view generation diffusion model and appearance priors from a 2D image generation method to guide the optimization process. We note that a disparity exists between the training datasets of 2D and 3D diffusion models, leading to their outputs showing marked differences in appearance. Specifically, 2D models tend to deliver more detailed visuals, whereas 3D models produce consistent yet over-smooth results across different views. Hence, we optimize a set of 3D Gaussians using 3D priors in spatial domain to ensure geometric consistency, while exploiting 2D priors in the frequency domain through Fourier transform for higher visual quality. This 2D-3D hybrid Fourier Score Distillation objective function (dubbed hy-FSD), can be integrated into existing 3D generation methods, yielding significant performance improvements. With this technique, we further develop an image-to-3D generation pipeline to create high-quality 3D objects within one minute, named Fourier123. Extensive experiments demonstrate that Fourier123 excels in efficient generation with rapid convergence speed and visual-friendly generation results. △ Less

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.15118 [pdf, other]

GS-Hider: Hiding Messages into 3D Gaussian Splatting

Authors: Xuanyu Zhang, Jiarui Meng, Runyi Li, Zhipei Xu, Yongbing Zhang, Jian Zhang

Abstract: 3D Gaussian Splatting (3DGS) has already become the emerging research focus in the fields of 3D scene reconstruction and novel view synthesis. Given that training a 3DGS requires a significant amount of time and computational cost, it is crucial to protect the copyright, integrity, and privacy of such 3D assets. Steganography, as a crucial technique for encrypted transmission and copyright protect… ▽ More 3D Gaussian Splatting (3DGS) has already become the emerging research focus in the fields of 3D scene reconstruction and novel view synthesis. Given that training a 3DGS requires a significant amount of time and computational cost, it is crucial to protect the copyright, integrity, and privacy of such 3D assets. Steganography, as a crucial technique for encrypted transmission and copyright protection, has been extensively studied. However, it still lacks profound exploration targeted at 3DGS. Unlike its predecessor NeRF, 3DGS possesses two distinct features: 1) explicit 3D representation; and 2) real-time rendering speeds. These characteristics result in the 3DGS point cloud files being public and transparent, with each Gaussian point having a clear physical significance. Therefore, ensuring the security and fidelity of the original 3D scene while embedding information into the 3DGS point cloud files is an extremely challenging task. To solve the above-mentioned issue, we first propose a steganography framework for 3DGS, dubbed GS-Hider, which can embed 3D scenes and images into original GS point clouds in an invisible manner and accurately extract the hidden messages. Specifically, we design a coupled secured feature attribute to replace the original 3DGS's spherical harmonics coefficients and then use a scene decoder and a message decoder to disentangle the original RGB scene and the hidden message. Extensive experiments demonstrated that the proposed GS-Hider can effectively conceal multimodal messages without compromising rendering quality and possesses exceptional security, robustness, capacity, and flexibility. Our project is available at: https://xuanyuzhang21.github.io/project/gshider. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: 3DGS steganography

arXiv:2405.06782 [pdf, other]

GraphRelate3D: Context-Dependent 3D Object Detection with Inter-Object Relationship Graphs

Authors: Mingyu Liu, Ekim Yurtsever, Marc Brede, Jun Meng, Walter Zimmer, Xingcheng Zhou, Bare Luka Zagar, Yuning Cui, Alois Knoll

Abstract: Accurate and effective 3D object detection is critical for ensuring the driving safety of autonomous vehicles. Recently, state-of-the-art two-stage 3D object detectors have exhibited promising performance. However, these methods refine proposals individually, ignoring the rich contextual information in the object relationships between the neighbor proposals. In this study, we introduce an object r… ▽ More Accurate and effective 3D object detection is critical for ensuring the driving safety of autonomous vehicles. Recently, state-of-the-art two-stage 3D object detectors have exhibited promising performance. However, these methods refine proposals individually, ignoring the rich contextual information in the object relationships between the neighbor proposals. In this study, we introduce an object relation module, consisting of a graph generator and a graph neural network (GNN), to learn the spatial information from certain patterns to improve 3D object detection. Specifically, we create an inter-object relationship graph based on proposals in a frame via the graph generator to connect each proposal with its neighbor proposals. Afterward, the GNN module extracts edge features from the generated graph and iteratively refines proposal features with the captured edge features. Ultimately, we leverage the refined features as input to the detection head to obtain detection results. Our approach improves upon the baseline PV-RCNN on the KITTI validation set for the car class across easy, moderate, and hard difficulty levels by 0.82%, 0.74%, and 0.58%, respectively. Additionally, our method outperforms the baseline by more than 1% under the moderate and hard levels BEV AP on the test server. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.01775 [pdf, other]

Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design

Authors: Jian Meng, Yuan Liao, Anupreetham Anupreetham, Ahmed Hasssan, Shixing Yu, Han-sok Suh, Xiaofeng Hu, Jae-sun Seo

Abstract: The development of model compression is continuously motivated by the evolution of various neural network accelerators with ASIC or FPGA. On the algorithm side, the ultimate goal of quantization or pruning is accelerating the expensive DNN computations on low-power hardware. However, such a "design-and-deploy" workflow faces under-explored challenges in the current hardware-algorithm co-design com… ▽ More The development of model compression is continuously motivated by the evolution of various neural network accelerators with ASIC or FPGA. On the algorithm side, the ultimate goal of quantization or pruning is accelerating the expensive DNN computations on low-power hardware. However, such a "design-and-deploy" workflow faces under-explored challenges in the current hardware-algorithm co-design community. First, although the state-of-the-art quantization algorithm can achieve low precision with negligible degradation of accuracy, the latest deep learning framework (e.g., PyTorch) can only support non-customizable 8-bit precision, data format, and parameter extraction. Secondly, the objective of quantization is to enable the computation with low-precision data. However, the current SoTA algorithm treats the quantized integer as an intermediate result, while the final output of the quantizer is the "discretized" floating-point values, ignoring the practical needs and adding additional workload to hardware designers for integer parameter extraction and layer fusion. Finally, the compression toolkits designed by the industry are constrained to their in-house product or a handful of algorithms. The limited degree of freedom in the current toolkit and the under-explored customization hinder the prototype ASIC or FPGA-based accelerator design. To resolve these challenges, we propose Torch2Chip, an open-sourced, fully customizable, and high-performance toolkit that supports user-designed compression followed by automatic model fusion and parameter extraction. Torch2Chip incorporates the hierarchical design workflow, and the user-customized compression algorithm will be directly packed into the deployment-ready format for prototype chip verification with either CNN or vision transformer (ViT). The code is available at https://github.com/SeoLabCornell/torch2chip. △ Less

Submitted 6 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted for publication at MLSys 2024

arXiv:2404.01168 [pdf, other]

Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting

Authors: Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, Siwei Ma

Abstract: 3D Gaussian Splatting (3DGS) has marked a significant breakthrough in the realm of 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors that are ubiquitous in real-world scenes. This oversight mistakenly perceives reflections as separate entities that phy… ▽ More 3D Gaussian Splatting (3DGS) has marked a significant breakthrough in the realm of 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors that are ubiquitous in real-world scenes. This oversight mistakenly perceives reflections as separate entities that physically exist, resulting in inaccurate reconstructions and inconsistent reflective properties across varied viewpoints. To address this pivotal challenge, we introduce Mirror-3DGS, an innovative rendering framework devised to master the intricacies of mirror geometries and reflections, paving the way for the generation of realistically depicted mirror reflections. By ingeniously incorporating mirror attributes into the 3DGS and leveraging the principle of plane mirror imaging, Mirror-3DGS crafts a mirrored viewpoint to observe from behind the mirror, enriching the realism of scene renderings. Extensive assessments, spanning both synthetic and real-world scenes, showcase our method's ability to render novel views with enhanced fidelity in real-time, surpassing the state-of-the-art Mirror-NeRF specifically within the challenging mirror regions. Our code will be made publicly available for reproducible research. △ Less

Submitted 1 April, 2024; originally announced April 2024.

Comments: 22 pages, 7 figures

arXiv:2402.01822 [pdf, ps, other]

Building Guardrails for Large Language Models

Authors: Yi Dong, Ronghui Mu, Gaojie **, Yi Qi, **wei Hu, Xingyu Zhao, Jie Meng, Wenjie Ruan, Xiaowei Huang

Abstract: As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when the risks can have profound impacts on human users and societies. Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology. This position paper takes a deep look at current open-source solutions (Llama Guard,… ▽ More As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when the risks can have profound impacts on human users and societies. Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology. This position paper takes a deep look at current open-source solutions (Llama Guard, Nvidia NeMo, Guardrails AI), and discusses the challenges and the road towards building more complete solutions. Drawing on robust evidence from previous research, we advocate for a systematic approach to construct guardrails for LLMs, based on comprehensive consideration of diverse contexts across various LLMs applications. We propose employing socio-technical methods through collaboration with a multi-disciplinary team to pinpoint precise technical requirements, exploring advanced neural-symbolic implementations to embrace the complexity of the requirements, and develo** verification and testing to ensure the utmost quality of the final product. △ Less

Submitted 29 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

arXiv:2401.11644 [pdf, other]

Friends Across Time: Multi-Scale Action Segmentation Transformer for Surgical Phase Recognition

Authors: Bokai Zhang, Jiayuan Meng, Bin Cheng, Dean Biskup, Svetlana Petculescu, Angela Chapman

Abstract: Automatic surgical phase recognition is a core technology for modern operating rooms and online surgical video assessment platforms. Current state-of-the-art methods use both spatial and temporal information to tackle the surgical phase recognition task. Building on this idea, we propose the Multi-Scale Action Segmentation Transformer (MS-AST) for offline surgical phase recognition and the Multi-S… ▽ More Automatic surgical phase recognition is a core technology for modern operating rooms and online surgical video assessment platforms. Current state-of-the-art methods use both spatial and temporal information to tackle the surgical phase recognition task. Building on this idea, we propose the Multi-Scale Action Segmentation Transformer (MS-AST) for offline surgical phase recognition and the Multi-Scale Action Segmentation Causal Transformer (MS-ASCT) for online surgical phase recognition. We use ResNet50 or EfficientNetV2-M for spatial feature extraction. Our MS-AST and MS-ASCT can model temporal information at different scales with multi-scale temporal self-attention and multi-scale temporal cross-attention, which enhances the capture of temporal relationships between frames and segments. We demonstrate that our method can achieve 95.26% and 96.15% accuracy on the Cholec80 dataset for online and offline surgical phase recognition, respectively, which achieves new state-of-the-art results. Our method can also achieve state-of-the-art results on non-medical datasets in the video action segmentation domain. △ Less

Submitted 21 January, 2024; originally announced January 2024.

arXiv:2312.01745 [pdf, other]

Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval

Authors: Dixuan Lin, Yixing Peng, **gke Meng, Wei-Shi Zheng

Abstract: Text-to-image person re-identification (ReID) aims to retrieve images of a person based on a given textual description. The key challenge is to learn the relations between detailed information from visual and textual modalities. Existing works focus on learning a latent space to narrow the modality gap and further build local correspondences between two modalities. However, these methods assume th… ▽ More Text-to-image person re-identification (ReID) aims to retrieve images of a person based on a given textual description. The key challenge is to learn the relations between detailed information from visual and textual modalities. Existing works focus on learning a latent space to narrow the modality gap and further build local correspondences between two modalities. However, these methods assume that image-to-text and text-to-image associations are modality-agnostic, resulting in suboptimal associations. In this work, we show the discrepancy between image-to-text association and text-to-image association and propose CADA: Cross-Modal Adaptive Dual Association that finely builds bidirectional image-text detailed associations. Our approach features a decoder-based adaptive dual association module that enables full interaction between visual and textual modalities, allowing for bidirectional and adaptive cross-modal correspondence associations. Specifically, the paper proposes a bidirectional association mechanism: Association of text Tokens to image Patches (ATP) and Association of image Regions to text Attributes (ARA). We adaptively model the ATP based on the fact that aggregating cross-modal features based on mistaken associations will lead to feature distortion. For modeling the ARA, since the attributes are typically the first distinguishing cues of a person, we propose to explore the attribute-level association by predicting the masked text phrase using the related image region. Finally, we learn the dual associations between texts and images, and the experimental results demonstrate the superiority of our dual formulation. Codes will be made publicly available. △ Less

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.07634 [pdf, other]

ActiveDC: Distribution Calibration for Active Finetuning

Authors: Wenshuai Xu, Zhenghui Hu, Yu Lu, **zhou Meng, Qingjie Liu, Yunhong Wang

Abstract: The pretraining-finetuning paradigm has gained popularity in various computer vision tasks. In this paradigm, the emergence of active finetuning arises due to the abundance of large-scale data and costly annotation requirements. Active finetuning involves selecting a subset of data from an unlabeled pool for annotation, facilitating subsequent finetuning. However, the use of a limited number of tr… ▽ More The pretraining-finetuning paradigm has gained popularity in various computer vision tasks. In this paradigm, the emergence of active finetuning arises due to the abundance of large-scale data and costly annotation requirements. Active finetuning involves selecting a subset of data from an unlabeled pool for annotation, facilitating subsequent finetuning. However, the use of a limited number of training samples can lead to a biased distribution, potentially resulting in model overfitting. In this paper, we propose a new method called ActiveDC for the active finetuning tasks. Firstly, we select samples for annotation by optimizing the distribution similarity between the subset to be selected and the entire unlabeled pool in continuous space. Secondly, we calibrate the distribution of the selected samples by exploiting implicit category information in the unlabeled pool. The feature visualization provides an intuitive sense of the effectiveness of our approach to distribution calibration. We conducted extensive experiments on three image classification datasets with different sampling ratios. The results indicate that ActiveDC consistently outperforms the baseline performance in all image classification tasks. The improvement is particularly significant when the sampling ratio is low, with performance gains of up to 10%. Our code will be released. △ Less

Submitted 27 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: CVPR 2024 Accept

arXiv:2310.13028 [pdf, other]

Reliable Academic Conference Question Answering: A Study Based on Large Language Model

Authors: Zhiwei Huang, Long **, Junjie Wang, Mingchen Tu, Yin Hua, Zhiqiang Liu, Jiawei Meng, Huajun Chen, Wen Zhang

Abstract: The rapid growth of computer science has led to a proliferation of research presented at academic conferences, fostering global scholarly communication. Researchers consistently seek accurate, current information about these events at all stages. This data surge necessitates an intelligent question-answering system to efficiently address researchers' queries and ensure awareness of the latest adva… ▽ More The rapid growth of computer science has led to a proliferation of research presented at academic conferences, fostering global scholarly communication. Researchers consistently seek accurate, current information about these events at all stages. This data surge necessitates an intelligent question-answering system to efficiently address researchers' queries and ensure awareness of the latest advancements. The information of conferences is usually published on their official website, organized in a semi-structured way with a lot of text. To address this need, we have developed the ConferenceQA dataset for 7 diverse academic conferences with human annotations. Firstly, we employ a combination of manual and automated methods to organize academic conference data in a semi-structured JSON format. Subsequently, we annotate nearly 100 question-answer pairs for each conference. Each pair is classified into four different dimensions. To ensure the reliability of the data, we manually annotate the source of each answer. In light of recent advancements, Large Language Models (LLMs) have demonstrated impressive performance in various NLP tasks. They have demonstrated impressive capabilities in information-seeking question answering after instruction fine-tuning, and as such, we present our conference QA study based on LLM. Due to hallucination and outdated knowledge of LLMs, we adopt retrieval based methods to enhance LLMs' question-answering abilities. We have proposed a structure-aware retrieval method, specifically designed to leverage inherent structural information during the retrieval process. Empirical validation on the ConferenceQA dataset has demonstrated the effectiveness of this method. The dataset and code are readily accessible on https://github.com/zjukg/ConferenceQA. △ Less

Submitted 19 October, 2023; originally announced October 2023.

Comments: 10 pages, 4 figures, 2 tables

arXiv:2309.17105 [pdf, other]

Continual Action Assessment via Task-Consistent Score-Discriminative Feature Distribution Modeling

Authors: Yuan-Ming Li, Ling-An Zeng, **g-Ke Meng, Wei-Shi Zheng

Abstract: Action Quality Assessment (AQA) is a task that tries to answer how well an action is carried out. While remarkable progress has been achieved, existing works on AQA assume that all the training data are visible for training at one time, but do not enable continual learning on assessing new technical actions. In this work, we address such a Continual Learning problem in AQA (Continual-AQA), which u… ▽ More Action Quality Assessment (AQA) is a task that tries to answer how well an action is carried out. While remarkable progress has been achieved, existing works on AQA assume that all the training data are visible for training at one time, but do not enable continual learning on assessing new technical actions. In this work, we address such a Continual Learning problem in AQA (Continual-AQA), which urges a unified model to learn AQA tasks sequentially without forgetting. Our idea for modeling Continual-AQA is to sequentially learn a task-consistent score-discriminative feature distribution, in which the latent features express a strong correlation with the score labels regardless of the task or action types.From this perspective, we aim to mitigate the forgetting in Continual-AQA from two aspects. Firstly, to fuse the features of new and previous data into a score-discriminative distribution, a novel Feature-Score Correlation-Aware Rehearsal is proposed to store and reuse data from previous tasks with limited memory size. Secondly, an Action General-Specific Graph is developed to learn and decouple the action-general and action-specific knowledge so that the task-consistent score-discriminative features can be better extracted across various tasks. Extensive experiments are conducted to evaluate the contributions of proposed components. The comparisons with the existing continual learning methods additionally verify the effectiveness and versatility of our approach. Data and code are available at https://github.com/iSEE-Laboratory/Continual-AQA. △ Less

Submitted 2 May, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

Comments: 16 pages, 8 figures

arXiv:2309.08854 [pdf, other]

Intention-Aware Planner for Robust and Safe Aerial Tracking

Authors: Qiuyu Ren, Huan Yu, Jiajun Dai, Zhi Zheng, Jun Meng, Li Xu, Chao Xu, Fei Gao, Yanjun Cao

Abstract: Autonomous target tracking with quadrotors has wide applications in many scenarios, such as cinematographic follow-up shooting or suspect chasing. Target motion prediction is necessary when designing the tracking planner. However, the widely used constant velocity or constant rotation assumption can not fully capture the dynamics of the target. The tracker may fail when the target happens to move… ▽ More Autonomous target tracking with quadrotors has wide applications in many scenarios, such as cinematographic follow-up shooting or suspect chasing. Target motion prediction is necessary when designing the tracking planner. However, the widely used constant velocity or constant rotation assumption can not fully capture the dynamics of the target. The tracker may fail when the target happens to move aggressively, such as sudden turn or deceleration. In this paper, we propose an intention-aware planner by additionally considering the intention of the target to enhance safety and robustness in aerial tracking applications. Firstly, a designated intention prediction method is proposed, which combines a user-defined potential assessment function and a state observation function. A reachable region is generated to specifically evaluate the turning intentions. Then we design an intention-driven hybrid A* method to predict the future possible positions for the target. Finally, an intention-aware optimization approach is designed to generate a spatial-temporal optimal trajectory, allowing the tracker to perceive unexpected situations from the target. Benchmark comparisons and real-world experiments are conducted to validate the performance of our method. △ Less

Submitted 20 March, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 8 pages, 10 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2308.15989 [pdf, other]

DiffuVolume: Diffusion Model for Volume based Stereo Matching

Authors: Dian Zheng, Xiao-Ming Wu, Zuhao Liu, **gke Meng, Wei-shi Zheng

Abstract: Stereo matching is a significant part in many computer vision tasks and driving-based applications. Recently cost volume-based methods have achieved great success benefiting from the rich geometry information in paired images. However, the redundancy of cost volume also interferes with the model training and limits the performance. To construct a more precise cost volume, we pioneeringly apply the… ▽ More Stereo matching is a significant part in many computer vision tasks and driving-based applications. Recently cost volume-based methods have achieved great success benefiting from the rich geometry information in paired images. However, the redundancy of cost volume also interferes with the model training and limits the performance. To construct a more precise cost volume, we pioneeringly apply the diffusion model to stereo matching. Our method, termed DiffuVolume, considers the diffusion model as a cost volume filter, which will recurrently remove the redundant information from the cost volume. Two main designs make our method not trivial. Firstly, to make the diffusion model more adaptive to stereo matching, we eschew the traditional manner of directly adding noise into the image but embed the diffusion model into a task-specific module. In this way, we outperform the traditional diffusion stereo matching method by 22% EPE improvement and 240 times inference acceleration. Secondly, DiffuVolume can be easily embedded into any volume-based stereo matching network with boost performance but slight parameters rise (only 2%). By adding the DiffuVolume into well-performed methods, we outperform all the published methods on Scene Flow, KITTI2012, KITTI2015 benchmarks and zero-shot generalization setting. It is worth mentioning that the proposed model ranks 1st on KITTI 2012 leader board, 2nd on KITTI 2015 leader board since 15, July 2023. △ Less

Submitted 30 August, 2023; originally announced August 2023.

Comments: 17 pages, 11 figures

arXiv:2308.08885 [pdf, other]

Event-Guided Procedure Planning from Instructional Videos with Text Supervision

Authors: An-Lan Wang, Kun-Yu Lin, Jia-Run Du, **gke Meng, Wei-Shi Zheng

Abstract: In this work, we focus on the task of procedure planning from instructional videos with text supervision, where a model aims to predict an action sequence to transform the initial visual state into the goal visual state. A critical challenge of this task is the large semantic gap between observed visual states and unobserved intermediate actions, which is ignored by previous works. Specifically, t… ▽ More In this work, we focus on the task of procedure planning from instructional videos with text supervision, where a model aims to predict an action sequence to transform the initial visual state into the goal visual state. A critical challenge of this task is the large semantic gap between observed visual states and unobserved intermediate actions, which is ignored by previous works. Specifically, this semantic gap refers to that the contents in the observed visual states are semantically different from the elements of some action text labels in a procedure. To bridge this semantic gap, we propose a novel event-guided paradigm, which first infers events from the observed states and then plans out actions based on both the states and predicted events. Our inspiration comes from that planning a procedure from an instructional video is to complete a specific event and a specific event usually involves specific actions. Based on the proposed paradigm, we contribute an Event-guided Prompting-based Procedure Planning (E3P) model, which encodes event information into the sequential modeling process to support procedure planning. To further consider the strong action associations within each event, our E3P adopts a mask-and-predict approach for relation mining, incorporating a probabilistic masking scheme for regularization. Extensive experiments on three datasets demonstrate the effectiveness of our proposed model. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023

arXiv:2308.00909 [pdf, other]

Rethinking Similarity Search: Embracing Smarter Mechanisms over Smarter Data

Authors: Renzhi Wu, **gfan Meng, Jie Jeff Xu, Huayi Wang, Kexin Rong

Abstract: In this vision paper, we propose a shift in perspective for improving the effectiveness of similarity search. Rather than focusing solely on enhancing the data quality, particularly machine learning-generated embeddings, we advocate for a more comprehensive approach that also enhances the underpinning search mechanisms. We highlight three novel avenues that call for a redefinition of the similarit… ▽ More In this vision paper, we propose a shift in perspective for improving the effectiveness of similarity search. Rather than focusing solely on enhancing the data quality, particularly machine learning-generated embeddings, we advocate for a more comprehensive approach that also enhances the underpinning search mechanisms. We highlight three novel avenues that call for a redefinition of the similarity search problem: exploiting implicit data structures and distributions, engaging users in an iterative feedback loop, and moving beyond a single query vector. These novel pathways have gained relevance in emerging applications such as large-scale language models, video clip retrieval, and data labeling. We discuss the corresponding research challenges posed by these new problem areas and share insights from our preliminary discoveries. △ Less

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.05541 [pdf, other]

High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition

Authors: Tianyu Luan, Yuanhao Zhai, **g**g Meng, Zhong Li, Zhang Chen, Yi Xu, Junsong Yuan

Abstract: Despite the impressive performance obtained by recent single-image hand modeling techniques, they lack the capability to capture sufficient details of the 3D hand mesh. This deficiency greatly limits their applications when high-fidelity hand modeling is required, e.g., personalized hand modeling. To address this problem, we design a frequency split network to generate 3D hand mesh using different… ▽ More Despite the impressive performance obtained by recent single-image hand modeling techniques, they lack the capability to capture sufficient details of the 3D hand mesh. This deficiency greatly limits their applications when high-fidelity hand modeling is required, e.g., personalized hand modeling. To address this problem, we design a frequency split network to generate 3D hand mesh using different frequency bands in a coarse-to-fine manner. To capture high-frequency personalized details, we transform the 3D mesh into the frequency domain, and propose a novel frequency decomposition loss to supervise each frequency component. By leveraging such a coarse-to-fine scheme, hand details that correspond to the higher frequency domain can be preserved. In addition, the proposed network is scalable, and can stop the inference at any resolution level to accommodate different hardware with varying computational powers. To quantitatively evaluate the performance of our method in terms of recovering personalized shape details, we introduce a new evaluation metric named Mean Signal-to-Noise Ratio (MSNR) to measure the signal-to-noise ratio of each mesh frequency component. Extensive experiments demonstrate that our approach generates fine-grained details for high-fidelity 3D hand reconstruction, and our evaluation metric is more effective for measuring mesh details compared with traditional metrics. △ Less

Submitted 8 July, 2023; originally announced July 2023.

Comments: CVPR 2023

Journal ref: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16795-16804. 2023

arXiv:2307.00976 [pdf]

Autism Spectrum Disorder Classification in Children based on Structural MRI Features Extracted using Contrastive Variational Autoencoder

Authors: Ruimin Ma, Ruitao Xie, Yanlin Wang, **tao Meng, Yanjie Wei, Wenhui Xi, Yi Pan

Abstract: Autism spectrum disorder (ASD) is a highly disabling mental disease that brings significant impairments of social interaction ability to the patients, making early screening and intervention of ASD critical. With the development of the machine learning and neuroimaging technology, extensive research has been conducted on machine classification of ASD based on structural MRI (s-MRI). However, most… ▽ More Autism spectrum disorder (ASD) is a highly disabling mental disease that brings significant impairments of social interaction ability to the patients, making early screening and intervention of ASD critical. With the development of the machine learning and neuroimaging technology, extensive research has been conducted on machine classification of ASD based on structural MRI (s-MRI). However, most studies involve with datasets where participants' age are above 5. Few studies conduct machine classification of ASD for participants below 5-year-old, but, with mediocre predictive accuracy. In this paper, we push the boundary of predictive accuracy (above 0.97) of machine classification of ASD in children (age range: 0.92-4.83 years), based on s-MRI features extracted using contrastive variational autoencoder (CVAE). 78 s-MRI, collected from Shenzhen Children's Hospital, are used for training CVAE, which consists of both ASD-specific feature channel and common shared feature channel. The ASD participants represented by ASD-specific features can be easily discriminated from TC participants represented by the common shared features, leading to high classification accuracy. In case of degraded predictive accuracy when data size is extremely small, a transfer learning strategy is proposed here as a potential solution. Finally, we conduct neuroanatomical interpretation based on the correlation between s-MRI features extracted from CVAE and surface area of different cortical regions, which discloses potential biomarkers that could help target treatments of ASD in the future. △ Less

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2307.00464 [pdf, other]

Human-to-Human Interaction Detection

Authors: Zhenhua Wang, Kaining Ying, Jiajun Meng, Jifeng Ning

Abstract: A comprehensive understanding of interested human-to-human interactions in video streams, such as queuing, handshaking, fighting and chasing, is of immense importance to the surveillance of public security in regions like campuses, squares and parks. Different from conventional human interaction recognition, which uses choreographed videos as inputs, neglects concurrent interactive groups, and per… ▽ More A comprehensive understanding of interested human-to-human interactions in video streams, such as queuing, handshaking, fighting and chasing, is of immense importance to the surveillance of public security in regions like campuses, squares and parks. Different from conventional human interaction recognition, which uses choreographed videos as inputs, neglects concurrent interactive groups, and performs detection and recognition in separate stages, we introduce a new task named human-to-human interaction detection (HID). HID devotes to detecting subjects, recognizing person-wise actions, and grou** people according to their interactive relations, in one model. First, based on the popular AVA dataset created for action detection, we establish a new HID benchmark, termed AVA-Interaction (AVA-I), by adding annotations on interactive relations in a frame-by-frame manner. AVA-I consists of 85,254 frames and 86,338 interactive groups, and each image includes up to 4 concurrent interactive groups. Second, we present a novel baseline approach SaMFormer for HID, containing a visual feature extractor, a split stage which leverages a Transformer-based model to decode action instances and interactive groups, and a merging stage which reconstructs the relationship between instances and groups. All SaMFormer components are jointly trained in an end-to-end manner. Extensive experiments on AVA-I validate the superiority of SaMFormer over representative methods. The dataset and code will be made public to encourage more follow-up studies. △ Less

Submitted 11 August, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

arXiv:2306.14097 [pdf, other]

Interpretable Small Training Set Image Segmentation Network Originated from Multi-Grid Variational Model

Authors: Junying Meng, Weihong Guo, Jun Liu, Mingrui Yang

Abstract: The main objective of image segmentation is to divide an image into homogeneous regions for further analysis. This is a significant and crucial task in many applications such as medical imaging. Deep learning (DL) methods have been proposed and widely used for image segmentation. However, these methods usually require a large amount of manually segmented data as training data and suffer from poor… ▽ More The main objective of image segmentation is to divide an image into homogeneous regions for further analysis. This is a significant and crucial task in many applications such as medical imaging. Deep learning (DL) methods have been proposed and widely used for image segmentation. However, these methods usually require a large amount of manually segmented data as training data and suffer from poor interpretability (known as the black box problem). The classical Mumford-Shah (MS) model is effective for segmentation and provides a piece-wise smooth approximation of the original image. In this paper, we replace the hand-crafted regularity term in the MS model with a data adaptive generalized learnable regularity term and use a multi-grid framework to unroll the MS model and obtain a variational model-based segmentation network with better generalizability and interpretability. This approach allows for the incorporation of learnable prior information into the network structure design. Moreover, the multi-grid framework enables multi-scale feature extraction and offers a mathematical explanation for the effectiveness of the U-shaped network structure in producing good image segmentation results. Due to the proposed network originates from a variational model, it can also handle small training sizes. Our experiments on the REFUGE dataset, the White Blood Cell image dataset, and 3D thigh muscle magnetic resonance (MR) images demonstrate that even with smaller training datasets, our method yields better segmentation results compared to related state of the art segmentation methods. △ Less

Submitted 24 June, 2023; originally announced June 2023.

Comments: 25 pages, 9 figures, 6 tables

MSC Class: 94A08; 68U10

arXiv:2306.03336 [pdf, other]

doi 10.1145/3589236.3589242

Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt)

Authors: Lingqi Zhang, Mohamed Wahib, Peng Chen, **tao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

Abstract: General Purpose Graphics Processing Units (GPGPU) are used in most of the top systems in HPC. The total capacity of scratchpad memory has increased by more than 40 times in the last decade. However, existing optimizations for stencil computations using temporal blocking have not aggressively exploited the large capacity of scratchpad memory. This work uses the 2D Jacobian 5-point iterative stencil… ▽ More General Purpose Graphics Processing Units (GPGPU) are used in most of the top systems in HPC. The total capacity of scratchpad memory has increased by more than 40 times in the last decade. However, existing optimizations for stencil computations using temporal blocking have not aggressively exploited the large capacity of scratchpad memory. This work uses the 2D Jacobian 5-point iterative stencil as a case study to investigate the use of large scratchpad memory. Unlike existing research that tiles the domain in a thread block fashion, we tile the domain so that each tile is large enough to utilize all available scratchpad memory on the GPU. Consequently, we process several time steps inside a single tile before offloading the result back to global memory. Our evaluation shows that our performance is comparable to state-of-the-art implementations, yet our implementation is much simpler and does not require auto-generation of code. △ Less

Submitted 5 June, 2023; originally announced June 2023.

Comments: This is short paper is published in the 15th workshop on general purpose processing using GPU (GPGPU 2023)

arXiv:2306.01243 [pdf, other]

Efficient Reinforcement Learning with Impaired Observability: Learning to Act with Delayed and Missing State Observations

Authors: Minshuo Chen, Jie Meng, Yu Bai, Yinyu Ye, H. Vincent Poor, Mengdi Wang

Abstract: In real-world reinforcement learning (RL) systems, various forms of {\it impaired observability} can complicate matters. These situations arise when an agent is unable to observe the most recent state of the system due to latency or lossy channels, yet the agent must still make real-time decisions. This paper introduces a theoretical investigation into efficient RL in control systems where agents… ▽ More In real-world reinforcement learning (RL) systems, various forms of {\it impaired observability} can complicate matters. These situations arise when an agent is unable to observe the most recent state of the system due to latency or lossy channels, yet the agent must still make real-time decisions. This paper introduces a theoretical investigation into efficient RL in control systems where agents must act with delayed and missing state observations. We present algorithms and establish near-optimal regret upper and lower bounds, of the form $\tilde{\mathcal{O}}(\sqrt{{\rm poly}(H) SAK})$, for RL in the delayed and missing observation settings. Here $S$ and $A$ are the sizes of state and action spaces, $H$ is the time horizon and $K$ is the number of episodes. Despite impaired observability posing significant challenges to the policy class and planning, our results demonstrate that learning remains efficient, with the regret bound optimally depending on the state-action size of the original system. Additionally, we provide a characterization of the performance of the optimal policy under impaired observability, comparing it to the optimal value obtained with full observability. Numerical results are provided to support our theory. △ Less

Submitted 26 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.18240 [pdf, other]

XGrad: Boosting Gradient-Based Optimizers With Weight Prediction

Authors: Lei Guan, Dongsheng Li, Yanqi Shi, Jian Meng

Abstract: In this paper, we propose a general deep learning training framework XGrad which introduces weight prediction into the popular gradient-based optimizers to boost their convergence and generalization when training the deep neural network (DNN) models. In particular, ahead of each mini-batch training, the future weights are predicted according to the update rule of the used optimizer and are then ap… ▽ More In this paper, we propose a general deep learning training framework XGrad which introduces weight prediction into the popular gradient-based optimizers to boost their convergence and generalization when training the deep neural network (DNN) models. In particular, ahead of each mini-batch training, the future weights are predicted according to the update rule of the used optimizer and are then applied to both the forward pass and backward propagation. In this way, during the whole training period, the optimizer always utilizes the gradients w.r.t. the future weights to update the DNN parameters, making the gradient-based optimizer achieve better convergence and generalization compared to the original optimizer without weight prediction. XGrad is rather straightforward to implement yet pretty effective in boosting the convergence of gradient-based optimizers and the accuracy of DNN models. Empirical results concerning five popular optimizers including SGD with momentum, Adam, AdamW, AdaBelief, and AdaM3 demonstrate the effectiveness of our proposal. The experimental results validate that XGrad can attain higher model accuracy than the baseline optimizers when training the DNN models. The code of XGrad will be available at: https://github.com/guanleics/XGrad. △ Less

Submitted 7 April, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: arXiv admin note: text overlap with arXiv:2302.00195

arXiv:2305.07390 [pdf, other]

doi 10.1145/3577193.3593716

Revisiting Temporal Blocking Stencil Optimizations

Authors: Lingqi Zhang, Mohamed Wahib, Peng Chen, **tao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

Abstract: Iterative stencils are used widely across the spectrum of High Performance Computing (HPC) applications. Many efforts have been put into optimizing stencil GPU kernels, given the prevalence of GPU-accelerated supercomputers. To improve the data locality, temporal blocking is an optimization that combines a batch of time steps to process them together. Under the observation that GPUs are evolving t… ▽ More Iterative stencils are used widely across the spectrum of High Performance Computing (HPC) applications. Many efforts have been put into optimizing stencil GPU kernels, given the prevalence of GPU-accelerated supercomputers. To improve the data locality, temporal blocking is an optimization that combines a batch of time steps to process them together. Under the observation that GPUs are evolving to resemble CPUs in some aspects, we revisit temporal blocking optimizations for GPUs. We explore how temporal blocking schemes can be adapted to the new features in the recent Nvidia GPUs, including large scratchpad memory, hardware prefetching, and device-wide synchronization. We propose a novel temporal blocking method, EBISU, which champions low device occupancy to drive aggressive deep temporal blocking on large tiles that are executed tile-by-tile. We compare EBISU with state-of-the-art temporal blocking libraries: STENCILGEN and AN5D. We also compare with state-of-the-art stencil auto-tuning tools that are equipped with temporal blocking optimizations: ARTEMIS and DRSTENCIL. Over a wide range of stencil benchmarks, EBISU achieves speedups up to $2.53$x and a geometric mean speedup of $1.49$x over the best state-of-the-art performance in each stencil benchmark. △ Less

Submitted 12 May, 2023; originally announced May 2023.

Comments: This paper will be published in 2023 International Conference on Supercomputing (ICS23)

arXiv:2305.03795 [pdf, ps, other]

RECIPE: Rateless Erasure Codes Induced by Protocol-Based Encoding

Authors: **gfan Meng, Ziheng Liu, Yiwei Wang, Jun Xu

Abstract: LT (Luby transform) codes are a celebrated family of rateless erasure codes (RECs). Most of existing LT codes were designed for applications in which a centralized encoder possesses all message blocks and is solely responsible for encoding them into codewords. Distributed LT codes, in which message blocks are physically scattered across multiple different locations (encoders) that need to collabor… ▽ More LT (Luby transform) codes are a celebrated family of rateless erasure codes (RECs). Most of existing LT codes were designed for applications in which a centralized encoder possesses all message blocks and is solely responsible for encoding them into codewords. Distributed LT codes, in which message blocks are physically scattered across multiple different locations (encoders) that need to collaboratively perform the encoding, has never been systemically studied before despite its growing importance in applications. In this work, we present the first systemic study of LT codes in the distributed setting, and make the following three major contributions. First, we show that only a proper subset of LT codes are feasible in the distributed setting, and give the sufficient and necessary condition for such feasibility. Second, we propose a distributed encoding protocol that can efficiently implement any feasible code. The protocol is parameterized by a so-called action probability array (APA) that is only a few KBs in size, and any feasible code corresponds to a valid APA setting and vice versa. Third, we propose two heuristic search algorithms that have led to the discovery of feasible codes that are much more efficient than the state of the art. △ Less

Submitted 10 May, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

Comments: Accepted by IEEE ISIT 2023

arXiv:2303.12730 [pdf, other]

Toward Data-Driven Glare Classification and Prediction for Marine Megafauna Survey

Authors: Joshua Power, Derek Jacoby, Marc-Antoine Drouin, Guillaume Durand, Yvonne Coady, Julian Meng

Abstract: Critically endangered species in Canadian North Atlantic waters are systematically surveyed to estimate species populations which influence governing policies. Due to its impact on policy, population accuracy is important. This paper lays the foundation towards a data-driven glare modelling system, which will allow surveyors to preemptively minimize glare. Surveyors use a detection function to est… ▽ More Critically endangered species in Canadian North Atlantic waters are systematically surveyed to estimate species populations which influence governing policies. Due to its impact on policy, population accuracy is important. This paper lays the foundation towards a data-driven glare modelling system, which will allow surveyors to preemptively minimize glare. Surveyors use a detection function to estimate megafauna populations which are not explicitly seen. A goal of the research is to maximize useful imagery collected, to that end we will use our glare model to predict glare and optimize for glare-free data collection. To build this model, we leverage a small labelled dataset to perform semi-supervised learning. The large dataset is labelled with a Cascading Random Forest Model using a naïve pseudo-labelling approach. A reflectance model is used, which pinpoints features of interest, to populate our datasets which allows for context-aware machine learning models. The pseudo-labelled dataset is used on two models: a Multilayer Perceptron and a Recurrent Neural Network. With this paper, we lay the foundation for data-driven mission planning; a glare modelling system which allows surveyors to preemptively minimize glare and reduces survey reliance on the detection function as an estimator of whale populations during periods of poor subsurface visibility. △ Less

Submitted 3 March, 2023; originally announced March 2023.

Comments: 15 pages, 4 figures, 5th ICPR Workshop on Computer Vison for Automated Analysis of Underwater Imagery (CVAUI 2022)

arXiv:2303.00668 [pdf, other]

Roller-Quadrotor: A Novel Hybrid Terrestrial/Aerial Quadrotor with Unicycle-Driven and Rotor-Assisted Turning

Authors: Zhi Zheng, ** Wang, Yuze Wu, Qifeng Cai, Huan Yu, Ruibin Zhang, Jie Tu, Jun Meng, Guodong Lu, Fei Gao

Abstract: The Roller-Quadrotor is a novel quadrotor that combines the maneuverability of aerial drones with the endurance of ground vehicles. This work focuses on the design, modeling, and experimental validation of the Roller-Quadrotor. Flight capabilities are achieved through a quadrotor configuration, with four thrust-providing actuators. Additionally, rolling motion is facilitated by a unicycle-driven a… ▽ More The Roller-Quadrotor is a novel quadrotor that combines the maneuverability of aerial drones with the endurance of ground vehicles. This work focuses on the design, modeling, and experimental validation of the Roller-Quadrotor. Flight capabilities are achieved through a quadrotor configuration, with four thrust-providing actuators. Additionally, rolling motion is facilitated by a unicycle-driven and rotor-assisted turning structure. By utilizing terrestrial locomotion, the vehicle can overcome rolling and turning resistance, thereby conserving energy compared to its flight mode. This innovative approach not only tackles the inherent challenges of traditional rotorcraft but also enables the vehicle to navigate through narrow gaps and overcome obstacles by taking advantage of its aerial mobility. We develop comprehensive models and controllers for the Roller-Quadrotor and validate their performance through experiments. The results demonstrate its seamless transition between aerial and terrestrial locomotion, as well as its ability to safely navigate through gaps half the size of its diameter. Moreover, the terrestrial range of the vehicle is approximately 2.8 times greater, while the operating time is about 41.2 times longer compared to its aerial capabilities. These findings underscore the feasibility and effectiveness of the proposed structure and control mechanisms for efficient navigation through challenging terrains while conserving energy. △ Less

Submitted 26 June, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: 8 pages, 10 figures, accepted by 2023 IEEE/RSJ International Conference on Intelligent Robots(IROS). This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2302.10693 [pdf, other]

Sim2Real$^2$: Actively Building Explicit Physics Model for Precise Articulated Object Manipulation

Authors: Liqian Ma, Jiaojiao Meng, Shuntao Liu, Weihang Chen, **g Xu, Rui Chen

Abstract: Accurately manipulating articulated objects is a challenging yet important task for real robot applications. In this paper, we present a novel framework called Sim2Real$^2$ to enable the robot to manipulate an unseen articulated object to the desired state precisely in the real world with no human demonstrations. We leverage recent advances in physics simulation and learning-based perception to bu… ▽ More Accurately manipulating articulated objects is a challenging yet important task for real robot applications. In this paper, we present a novel framework called Sim2Real$^2$ to enable the robot to manipulate an unseen articulated object to the desired state precisely in the real world with no human demonstrations. We leverage recent advances in physics simulation and learning-based perception to build the interactive explicit physics model of the object and use it to plan a long-horizon manipulation trajectory to accomplish the task. However, the interactive model cannot be correctly estimated from a static observation. Therefore, we learn to predict the object affordance from a single-frame point cloud, control the robot to actively interact with the object with a one-step action, and capture another point cloud. Further, the physics model is constructed from the two point clouds. Experimental results show that our framework achieves about 70% manipulations with <30% relative error for common articulated objects, and 30% manipulations for difficult objects. Our proposed framework also enables advanced manipulation strategies, such as manipulating with different tools. Code and videos are available on our project webpage: https://ttimelord.github.io/Sim2Real2-site/ △ Less

Submitted 21 February, 2023; originally announced February 2023.

Comments: Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2023

arXiv:2302.04387 [pdf, other]

Catch Planner: Catching High-Speed Targets in the Flight

Authors: Huan Yu, Pengqin Wang, ** Wang, Jialin Ji, Zhi Zheng, Jie Tu, Guodong Lu, Jun Meng, Meixin Zhu, Shaojie Shen, Fei Gao

Abstract: Catching high-speed targets in the flight is a complex and typical highly dynamic task. In this paper, we propose Catch Planner, a planning-with-decision scheme for catching. For sequential decision making, we propose a policy search method based on deep reinforcement learning. In order to make catching adaptive and flexible, we propose a trajectory optimization method to jointly optimize the high… ▽ More Catching high-speed targets in the flight is a complex and typical highly dynamic task. In this paper, we propose Catch Planner, a planning-with-decision scheme for catching. For sequential decision making, we propose a policy search method based on deep reinforcement learning. In order to make catching adaptive and flexible, we propose a trajectory optimization method to jointly optimize the highly coupled catching time and terminal state while considering the dynamic feasibility and safety. We also propose a flexible constraint transcription method to catch targets at any reasonable attitude and terminal position bias. The proposed Catch Planner provides a new paradigm for the combination of learning and planning and is integrated on the quadrotor designed by ourselves, which runs at 100hz on the onboard computer. Extensive experiments are carried out in real and simulated scenes to verify the robustness of the proposed method and its expansibility when facing a variety of high-speed flying targets. △ Less

Submitted 26 June, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

Comments: 11 pages, 8 figures, accepted by IEEE/ASME Transactions on Mechatronics

arXiv:2212.13248 [pdf]

Characterizing and Modeling Control-Plane Traffic for Mobile Core Network

Authors: Jiayi Meng, **gqi Huang, Y. Charlie Hu, Yaron Koral, Xiaojun Lin, Muhammad Shahbaz, Abhigyan Sharma

Abstract: In this paper, we first carry out to our knowledge the first in-depth characterization of control-plane traffic, using a real-world control-plane trace for 37,325 UEs sampled at a real-world LTE Mobile Core Network (MCN). Our analysis shows that control events exhibit significant diversity in device types and time-of-day among UEs. Second, we study whether traditional probability distributions tha… ▽ More In this paper, we first carry out to our knowledge the first in-depth characterization of control-plane traffic, using a real-world control-plane trace for 37,325 UEs sampled at a real-world LTE Mobile Core Network (MCN). Our analysis shows that control events exhibit significant diversity in device types and time-of-day among UEs. Second, we study whether traditional probability distributions that have been widely adopted for modeling Internet traffic can model the control-plane traffic originated from individual UEs. Our analysis shows that the inter-arrival time of the control events as well as the sojourn time in the UE states of EMM and ECM for the cellular network cannot be modeled as Poisson processes or other traditional probability distributions. We further show that the reasons that these models fail to capture the control-plane traffic are due to its higher burstiness and longer tails in the cumulative distribution than the traditional models. Third, we propose a two-level hierarchical state-machine-based traffic model for UE clusters derived from our adaptive clustering scheme based on the Semi-Markov Model to capture key characteristics of mobile network control-plane traffic -- in particular, the dependence among events generated by each UE, and the diversity in device types and time-of-day among UEs. Finally, we show how our model can be easily adjusted from LTE to 5G to support modeling 5G control-plane traffic, when the sizable control-plane trace for 5G UEs becomes available to train the adjusted model. The developed control-plane traffic generator for LTE/5G networks is open-sourced to the research community to support high-performance MCN architecture design R&D. △ Less

Submitted 26 December, 2022; originally announced December 2022.

arXiv:2211.05985 [pdf, other]

Using Persuasive Writing Strategies to Explain and Detect Health Misinformation

Authors: Danial Kamali, Joseph Romain, Huiyi Liu, Wei Peng, **gbo Meng, Parisa Kordjamshidi

Abstract: Nowadays, the spread of misinformation is a prominent problem in society. Our research focuses on aiding the automatic identification of misinformation by analyzing the persuasive strategies employed in textual documents. We introduce a novel annotation scheme encompassing common persuasive writing tactics to achieve our objective. Additionally, we provide a dataset on health misinformation, thoro… ▽ More Nowadays, the spread of misinformation is a prominent problem in society. Our research focuses on aiding the automatic identification of misinformation by analyzing the persuasive strategies employed in textual documents. We introduce a novel annotation scheme encompassing common persuasive writing tactics to achieve our objective. Additionally, we provide a dataset on health misinformation, thoroughly annotated by experts utilizing our proposed scheme. Our contribution includes proposing a new task of annotating pieces of text with their persuasive writing strategy types. We evaluate fine-tuning and prompt-engineering techniques with pre-trained language models of the BERT family and the generative large language models of the GPT family using persuasive strategies as an additional source of information. We evaluate the effects of employing persuasive strategies as intermediate labels in the context of misinformation detection. Our results show that those strategies enhance accuracy and improve the explainability of misinformation detection models. The persuasive strategies can serve as valuable insights and explanations, enabling other models or even humans to make more informed decisions regarding the trustworthiness of the information. △ Less

Submitted 10 April, 2024; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: Accepted at LREC-CoLING-2024

arXiv:2207.13848 [pdf, other]

Predicting the Output Structure of Sparse Matrix Multiplication with Sampled Compression Ratio

Authors: Zhaoyang Du, Yi** Guan, Tianchan Guan, Dimin Niu, Nianxiong Tan, Xiaopeng Yu, Hongzhong Zheng, Jianyi Meng, Xiaolang Yan, Yuan Xie

Abstract: Sparse general matrix multiplication (SpGEMM) is a fundamental building block in numerous scientific applications. One critical task of SpGEMM is to compute or predict the structure of the output matrix (i.e., the number of nonzero elements per output row) for efficient memory allocation and load balance, which impact the overall performance of SpGEMM. Existing work either precisely calculates the… ▽ More Sparse general matrix multiplication (SpGEMM) is a fundamental building block in numerous scientific applications. One critical task of SpGEMM is to compute or predict the structure of the output matrix (i.e., the number of nonzero elements per output row) for efficient memory allocation and load balance, which impact the overall performance of SpGEMM. Existing work either precisely calculates the output structure or adopts upper-bound or sampling-based methods to predict the output structure. However, these methods either take much execution time or are not accurate enough. In this paper, we propose a novel sampling-based method with better accuracy and low costs compared to the existing sampling-based method. The proposed method first predicts the compression ratio of SpGEMM by leveraging the number of intermediate products (denoted as FLOP) and the number of nonzero elements (denoted as NNZ) of the same sampled result matrix. And then, the predicted output structure is obtained by dividing the FLOP per output row by the predicted compression ratio. We also propose a reference design of the existing sampling-based method with optimized computing overheads to demonstrate the better accuracy of the proposed method. We construct 625 test cases with various matrix dimensions and sparse structures to evaluate the prediction accuracy. Experimental results show that the absolute relative errors of the proposed method and the reference design are 1.56\% and 8.12\%, respectively, on average, and 25\% and 156\%, respectively, in the worst case. △ Less

Submitted 27 July, 2022; originally announced July 2022.

Comments: This paper has been submitted to the IEEE International Conference on Parallel and Distributed Systems (ICPADS). 8 pages, 2 fgures, 3 tables

ACM Class: F.2.1; G.3; D.1.3; G.1.3

arXiv:2204.10826 [pdf, other]

doi 10.1109/JOE.2022.3194165

A Fully-autonomous Framework of Unmanned Surface Vehicles in Maritime Environments using Gaussian Process Motion Planning

Authors: Jiawei Meng, Ankita Humne, Richard Bucknall, Brendan Englot, Yuanchang Liu

Abstract: Unmanned surface vehicles (USVs) are of increasing importance to a growing number of sectors in the maritime industry, including offshore exploration, marine transportation and defence operations. A major factor in the growth in use and deployment of USVs is the increased operational flexibility that is offered through use of autonomous navigation systems that generate optimised trajectories. Unli… ▽ More Unmanned surface vehicles (USVs) are of increasing importance to a growing number of sectors in the maritime industry, including offshore exploration, marine transportation and defence operations. A major factor in the growth in use and deployment of USVs is the increased operational flexibility that is offered through use of autonomous navigation systems that generate optimised trajectories. Unlike path planning in terrestrial environments, planning in the maritime environment is more demanding as there is need to assure mitigating action is taken against the significant, random and often unpredictable environmental influences from winds and ocean currents. With the focus of these necessary requirements as the main basis of motivation, this paper proposes a novel motion planner, denoted as GPMP2*, extending the application scope of the fundamental GP-based motion planner, GPMP2, into complex maritime environments. An interpolation strategy based on Monte-Carlo stochasticity has been innovatively added to GPMP2* to produce a new algorithm named GPMP2* with Monte-Carlo stochasticity (MC-GPMP2*), which can increase the diversity of the paths generated. In parallel with algorithm design, a ROS based fully-autonomous framework for an advanced unmanned surface vehicle, the WAM-V 20 USV, has been proposed. The practicability of the proposed motion planner as well as the fully-autonomous framework have been functionally validated in a simulated inspection missions for an offshore wind farm in ROS. △ Less

Submitted 21 May, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: 17 pages, 15 figures

arXiv:2204.03775 [pdf, other]

Massively scalable stencil algorithm

Authors: Mathias Jacquelin, Mauricio Araya-Polo, Jie Meng

Abstract: Stencil computations lie at the heart of many scientific and industrial applications. Unfortunately, stencil algorithms perform poorly on machines with cache based memory hierarchy, due to low re-use of memory accesses. This work shows that for stencil computation a novel algorithm that leverages a localized communication strategy effectively exploits the Cerebras WSE-2, which has no cache hierarc… ▽ More Stencil computations lie at the heart of many scientific and industrial applications. Unfortunately, stencil algorithms perform poorly on machines with cache based memory hierarchy, due to low re-use of memory accesses. This work shows that for stencil computation a novel algorithm that leverages a localized communication strategy effectively exploits the Cerebras WSE-2, which has no cache hierarchy. This study focuses on a 25-point stencil finite-difference method for the 3D wave equation, a kernel frequently used in earth modeling as numerical simulation. In essence, the algorithm trades memory accesses for data communication and takes advantage of the fast communication fabric provided by the architecture. The algorithm -- historically memory bound -- becomes compute bound. This allows the implementation to achieve near perfect weak scaling, reaching up to 503 TFLOPs on WSE-2, a figure that only full clusters can eventually yield. △ Less

Submitted 7 April, 2022; originally announced April 2022.

Comments: 10 pages excl. bibliography. Submitted to SuperComputing 2022

arXiv:2204.02064 [pdf, other]

doi 10.1145/3577193.3593705

PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications

Authors: Lingqi Zhang, Mohamed Wahib, Peng Chen, **tao Meng, Xiao Wang, Toshio Endo, Satoshi Matsuoka

Abstract: Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU implementations have a loop on the host side that invokes the GPU kernel as much as time/algorithm steps there are. The termination of each kernel implicitly acts the barrier required after advancing the solution every time step. We propose an execution model for running memory-bound iterative GPU kernels: PERsistent KernelS (… ▽ More Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU implementations have a loop on the host side that invokes the GPU kernel as much as time/algorithm steps there are. The termination of each kernel implicitly acts the barrier required after advancing the solution every time step. We propose an execution model for running memory-bound iterative GPU kernels: PERsistent KernelS (PERKS). In this model, the time loop is moved inside persistent kernel, and device-wide barriers are used for synchronization. We then reduce the traffic to device memory by caching subset of the output in each time step in the unused registers and shared memory. PERKS can be generalized to any iterative solver: they largely independent of the solver's implementation. We explain the design principle of PERKS and demonstrate effectiveness of PERKS for a wide range of iterative 2D/3D stencil benchmarks (geomean speedup of $2.12$x for 2D stencils and $1.24$x for 3D stencils over state-of-art libraries), and a Krylov subspace conjugate gradient solver (geomean speedup of $4.86$x in smaller SpMV datasets from SuiteSparse and $1.43$x in larger SpMV datasets over a state-of-art library). All PERKS-based implementations available at: https://github.com/neozhang307/PERKS. △ Less

Submitted 12 May, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: This paper will be published in 2023 International Conference on Supercomputing (ICS23)

arXiv:2203.03635 [pdf, ps, other]

Stepwise Feature Fusion: Local Guides Global

Authors: **feng Wang, Qiming Huang, Feilong Tang, Jia Meng, Jionglong Su, Sifan Song

Abstract: Colonoscopy, currently the most efficient and recognized colon polyp detection technology, is necessary for early screening and prevention of colorectal cancer. However, due to the varying size and complex morphological features of colonic polyps as well as the indistinct boundary between polyps and mucosa, accurate segmentation of polyps is still challenging. Deep learning has become popular for… ▽ More Colonoscopy, currently the most efficient and recognized colon polyp detection technology, is necessary for early screening and prevention of colorectal cancer. However, due to the varying size and complex morphological features of colonic polyps as well as the indistinct boundary between polyps and mucosa, accurate segmentation of polyps is still challenging. Deep learning has become popular for accurate polyp segmentation tasks with excellent results. However, due to the structure of polyps image and the varying shapes of polyps, it easy for existing deep learning models to overfitting the current dataset. As a result, the model may not process unseen colonoscopy data. To address this, we propose a new State-Of-The-Art model for medical image segmentation, the SSFormer, which uses a pyramid Transformer encoder to improve the generalization ability of models. Specifically, our proposed Progressive Locality Decoder can be adapted to the pyramid Transformer backbone to emphasize local features and restrict attention dispersion. The SSFormer achieves statet-of-the-art performance in both learning and generalization assessment. △ Less

Submitted 27 June, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: 10 pages, 5 figures

arXiv:2203.02901 [pdf, other]

A Robust Framework of Chromosome Straightening with ViT-Patch GAN

Authors: Sifan Song, **feng Wang, Fengrui Cheng, Qirui Cao, Yihan Zuo, Yongteng Lei, Ruomai Yang, Chunxiao Yang, Frans Coenen, Jia Meng, Kang Dang, Jionglong Su

Abstract: Chromosomes carry the genetic information of humans. They exhibit non-rigid and non-articulated nature with varying degrees of curvature. Chromosome straightening is an important step for subsequent karyotype construction, pathological diagnosis and cytogenetic map development. However, robust chromosome straightening remains challenging, due to the unavailability of training images, distorted chr… ▽ More Chromosomes carry the genetic information of humans. They exhibit non-rigid and non-articulated nature with varying degrees of curvature. Chromosome straightening is an important step for subsequent karyotype construction, pathological diagnosis and cytogenetic map development. However, robust chromosome straightening remains challenging, due to the unavailability of training images, distorted chromosome details and shapes after straightening, as well as poor generalization capability. In this paper, we propose a novel architecture, ViT-Patch GAN, consisting of a self-learned motion transformation generator and a Vision Transformer-based patch (ViT-Patch) discriminator. The generator learns the motion representation of chromosomes for straightening. With the help of the ViT-Patch discriminator, the straightened chromosomes retain more shape and banding pattern details. The experimental results show that the proposed method achieves better performance on Fréchet Inception Distance (FID), Learned Perceptual Image Patch Similarity (LPIPS) and downstream chromosome classification accuracy, and shows excellent generalization capability on a large dataset. △ Less

Submitted 16 May, 2023; v1 submitted 6 March, 2022; originally announced March 2022.

Comments: Camera-ready version for IEEE ISBI2023

arXiv:2110.07753 [pdf, other]

On Efficient Range-Summability of Ideally IID Random Variables in Two or Higher Dimensions

Authors: **gfan Meng, Huayi Wang, Jun Xu, Mitsunori Ogihara

Abstract: $d$-dimensional (for $d>1$) efficient range-summability ($d$D-ERS) of random variables (RVs) is a fundamental algorithmic problem that has applications to two important families of database problems, namely, fast approximate wavelet tracking (FAWT) on data streams and approximately answering range-sum queries over a data cube. Whether there are efficient solutions to the $d… ▽ More $d$-dimensional (for $d>1$) efficient range-summability ($d$D-ERS) of random variables (RVs) is a fundamental algorithmic problem that has applications to two important families of database problems, namely, fast approximate wavelet tracking (FAWT) on data streams and approximately answering range-sum queries over a data cube. Whether there are efficient solutions to the $d$D-ERS problem, or to the latter database problem, have been two long-standing open problems. Both are solved in this work. Specifically, we propose a novel solution framework to $d$D-ERS on RVs that have Gaussian or Poisson distribution. Our $d$D-ERS solutions are the first ones that have polylogarithmic time complexities. Furthermore, we develop a novel $k$-wise independence theory that allows our $d$D-ERS solutions to have both high computational efficiencies and strong provable independence guarantees. Finally, we show that under a sufficient and likely necessary condition, certain existing solutions for 1D-ERS can be generalized to higher dimensions. △ Less

Submitted 23 January, 2023; v1 submitted 14 October, 2021; originally announced October 2021.

arXiv:2109.06366 [pdf, other]

A Dyadic Simulation Approach to Efficient Range-Summability

Authors: **gfan Meng, Huayi Wang, Jun Xu, Mitsunori Ogihara

Abstract: Efficient range-summability (ERS) of a long list of random variables is a fundamental algorithmic problem that has applications to three important database applications, namely, data stream processing, space-efficient histogram maintenance (SEHM), and approximate nearest neighbor searches (ANNS). In this work, we propose a novel dyadic simulation framework and develop three novel ERS solutions, na… ▽ More Efficient range-summability (ERS) of a long list of random variables is a fundamental algorithmic problem that has applications to three important database applications, namely, data stream processing, space-efficient histogram maintenance (SEHM), and approximate nearest neighbor searches (ANNS). In this work, we propose a novel dyadic simulation framework and develop three novel ERS solutions, namely Gaussian-dyadic simulation tree (DST), Cauchy-DST and Random Walk-DST, using it. We also propose novel rejection sampling techniques to make these solutions computationally efficient. Furthermore, we develop a novel k-wise independence theory that allows our ERS solutions to have both high computational efficiencies and strong provable independence guarantees. △ Less

Submitted 23 January, 2023; v1 submitted 13 September, 2021; originally announced September 2021.

arXiv:2106.00090 [pdf, other]

Deep learning for prediction of hepatocellular carcinoma recurrence after resection or liver transplantation: a discovery and validation study

Authors: Zhikun Liu, Yuanpeng Liu, Yuan Hong, **wen Meng, Jianguo Wang, Shusen Zheng, Xiao Xu

Abstract: This study aimed to develop a classifier of prognosis after resection or liver transplantation (LT) for HCC by directly analysing the ubiquitously available histological images using deep learning based neural networks. Nucleus map set was used to train U-net to capture the nuclear architectural information. Train set included the patients with HCC treated by resection and has a distinct outcome.… ▽ More This study aimed to develop a classifier of prognosis after resection or liver transplantation (LT) for HCC by directly analysing the ubiquitously available histological images using deep learning based neural networks. Nucleus map set was used to train U-net to capture the nuclear architectural information. Train set included the patients with HCC treated by resection and has a distinct outcome. LT set contained patients with HCC treated by LT. Train set and its nuclear architectural information extracted by U-net were used to train MobileNet V2 based classifier (MobileNetV2_HCC_Class), purpose-built for classifying supersized heterogeneous images. The MobileNetV2_HCC_Class maintained relative higher discriminatory power than the other factors after HCC resection or LT in the independent validation set. Pathological review showed that the tumoral areas most predictive of recurrence were characterized by presence of stroma, high degree of cytological atypia, nuclear hyperchomasia, and a lack of immune infiltration. A clinically useful prognostic classifier was developed using deep learning allied to histological slides. The classifier has been extensively evaluated in independent patient populations with different treatment, and gives consistent excellent results across the classical clinical, biological and pathological features. The classifier assists in refining the prognostic prediction of HCC patients and identifying patients who would benefit from more intensive management. △ Less

Submitted 31 May, 2021; originally announced June 2021.

arXiv:2103.05864 [pdf, ps, other]

MP-RW-LSH: An Efficient Multi-Probe LSH Solution to ANNS in $L_1$ Distance

Authors: Huayi Wang, **gfan Meng, Long Gong, Jun Xu, Mitsunori Ogihara

Abstract: Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic problem, with numerous applications in many areas of computer science. Locality-sensitive hashing (LSH) is one of the most popular solution approaches for ANNS. A common shortcoming of many LSH schemes is that since they probe only a single bucket in a hash table, they need to use a large number of hash tables to achieve a hig… ▽ More Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic problem, with numerous applications in many areas of computer science. Locality-sensitive hashing (LSH) is one of the most popular solution approaches for ANNS. A common shortcoming of many LSH schemes is that since they probe only a single bucket in a hash table, they need to use a large number of hash tables to achieve a high query accuracy. For ANNS-$L_2$, a multi-probe scheme was proposed to overcome this drawback by strategically probing multiple buckets in a hash table. In this work, we propose MP-RW-LSH, the first and so far only multi-probe LSH solution to ANNS in $L_1$ distance. Another contribution of this work is to explain why a state-of-the-art ANNS-$L_1$ solution called Cauchy projection LSH (CP-LSH) is fundamentally not suitable for multi-probe extension. We show that MP-RW-LSH uses 15 to 53 times fewer hash tables than CP-LSH for achieving similar query accuracies. △ Less

Submitted 9 March, 2021; originally announced March 2021.

arXiv:2103.02835 [pdf, other]

A Novel Application of Image-to-Image Translation: Chromosome Straightening Framework by Learning from a Single Image

Authors: Sifan Song, Daiyun Huang, Yalun Hu, Chunxiao Yang, Jia Meng, Fei Ma, Frans Coenen, Jiaming Zhang, Jionglong Su

Abstract: In medical imaging, chromosome straightening plays a significant role in the pathological study of chromosomes and in the development of cytogenetic maps. Whereas different approaches exist for the straightening task, typically geometric algorithms are used whose outputs are characterized by jagged edges or fragments with discontinued banding patterns. To address the flaws in the geometric algorit… ▽ More In medical imaging, chromosome straightening plays a significant role in the pathological study of chromosomes and in the development of cytogenetic maps. Whereas different approaches exist for the straightening task, typically geometric algorithms are used whose outputs are characterized by jagged edges or fragments with discontinued banding patterns. To address the flaws in the geometric algorithms, we propose a novel framework based on image-to-image translation to learn a pertinent map** dependence for synthesizing straightened chromosomes with uninterrupted banding patterns and preserved details. In addition, to avoid the pitfall of deficient input chromosomes, we construct an augmented dataset using only one single curved chromosome image for training models. Based on this framework, we apply two popular image-to-image translation architectures, U-shape networks and conditional generative adversarial networks, to assess its efficacy. Experiments on a dataset comprised of 642 real-world chromosomes demonstrate the superiority of our framework, as compared to the geometric method in straightening performance, by rendering realistic and continued chromosome details. Furthermore, our straightened results improve the chromosome classification by 0.98%-1.39% mean accuracy. △ Less

Submitted 19 October, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

Comments: This work has been accepted by CISP-BMEI2021

arXiv:2102.00713 [pdf, other]

Aurora Guard: Reliable Face Anti-Spoofing via Mobile Lighting System

Authors: Jian Zhang, Ying Tai, Tai** Yao, Jia Meng, Shouhong Ding, Chengjie Wang, Jilin Li, Feiyue Huang, Rongrong Ji

Abstract: Face authentication on mobile end has been widely applied in various scenarios. Despite the increasing reliability of cutting-edge face authentication/verification systems to variations like blinking eye and subtle facial expression, anti-spoofing against high-resolution rendering replay of paper photos or digital videos retains as an open problem. In this paper, we propose a simple yet effective… ▽ More Face authentication on mobile end has been widely applied in various scenarios. Despite the increasing reliability of cutting-edge face authentication/verification systems to variations like blinking eye and subtle facial expression, anti-spoofing against high-resolution rendering replay of paper photos or digital videos retains as an open problem. In this paper, we propose a simple yet effective face anti-spoofing system, termed Aurora Guard (AG). Our system firstly extracts the normal cues via light reflection analysis, and then adopts an end-to-end trainable multi-task Convolutional Neural Network (CNN) to accurately recover subjects' intrinsic depth and material map to assist liveness classification, along with the light CAPTCHA checking mechanism in the regression branch to further improve the system reliability. Experiments on public Replay-Attack and CASIA datasets demonstrate the merits of our proposed method over the state-of-the-arts. We also conduct extensive experiments on a large-scale dataset containing 12,000 live and diverse spoofing samples, which further validates the generalization ability of our method in the wild. △ Less

Submitted 1 February, 2021; originally announced February 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:1902.10311

Showing 1–50 of 80 results for author: Meng, J