Showing 1–50 of 112 results for author: Zhuang, J

Searching in archive cs.

  1. arXiv:2407.01599  [pdf, other]

    cs.CL cs.CR cs.CV cs.LG

    JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

    Authors: Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, Haohan Wang

    Abstract: The rapid evolution of artificial intelligence (AI) through developments in Large Language Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements across various technological domains. While these models enhance capabilities in natural language processing and visual interactive tasks, their growing adoption raises critical concerns regarding security and ethical alignm…

    Submitted 25 June, 2024; originally announced July 2024.

    Comments: 44 pages

  2. arXiv:2406.19844  [pdf, other]

    cs.CV cs.RO

    StreamMOTP: Streaming and Unified Framework for Joint 3D Multi-Object Tracking and Trajectory Prediction

    Authors: Jiaheng Zhuang, Guoan Wang, Siyu Zhang, Xiyang Wang, Hangning Zhou, Ziyao Xu, Chi Zhang, Zhiheng Li

    Abstract: 3D multi-object tracking and trajectory prediction are two crucial modules in autonomous driving systems. Generally, the two tasks are handled separately in traditional paradigms and a few methods have started to explore modeling these two tasks in a joint manner recently. However, these approaches suffer from the limitations of single-frame training and inconsistent coordinate representations bet…

    Submitted 28 June, 2024; originally announced June 2024.

  3. arXiv:2406.06959  [pdf, other]

    cs.LG cs.AI

    Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems

    Authors: Jiawei Zhang, Jiaxin Zhuang, Cheng Jin, Gen Li, Yuantao Gu

    Abstract: The recent emergence of diffusion models has significantly advanced the precision of learnable priors, presenting innovative avenues for addressing inverse problems. Since inverse problems inherently entail maximum a posteriori estimation, previous works have endeavored to integrate diffusion priors into the optimization frameworks. However, prevailing optimization-based inverse algorithms primari…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2406.05392  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas

    Authors: Chengyuan Deng, Yiqun Duan, Xin Jin, Heng Chang, Yijun Tian, Han Liu, Henry Peng Zou, Yiqiao Jin, Yijia Xiao, Yichen Wang, Shenghao Wu, Zongxing Xie, Kuofeng Gao, Sihong He, Jun Zhuang, Lu Cheng, Haohan Wang

    Abstract: Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, an…

    Submitted 8 June, 2024; originally announced June 2024.

  5. arXiv:2406.03368  [pdf, other]

    cs.CL cs.AI

    IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

    Authors: David Ifeoluwa Adelani, Jessica Ojo, Israel Abebe Azime, Jian Yun Zhuang, Jesujoba O. Alabi, Xuanli He, Millicent Ochieng, Sara Hooker, Andiswa Bukula, En-Shiun Annie Lee, Chiamaka Chukwuneke, Happy Buzaaba, Blessing Sibanda, Godson Kalipe, Jonathan Mukiibi, Salomon Kabongo, Foutse Yuehgoh, Mmasibidi Setaka, Lolwethu Ndolela, Nkiruka Odu, Rooweither Mabuya, Shamsuddeen Hassan Muhammad, Salomey Osei, Sokhar Samb, Tadesse Kebede Guge, et al. (1 additional author not shown)

    Abstract: Despite the widespread adoption of Large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g. African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoB…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Under review

  6. arXiv:2406.03097  [pdf, other]

    cs.LG cs.AI

    Enhancing the Resilience of Graph Neural Networks to Topological Perturbations in Sparse Graphs

    Authors: Shuqi He, Jun Zhuang, Ding Wang, Luyao Peng, Jun Song

    Abstract: Graph neural networks (GNNs) have been extensively employed in node classification. Nevertheless, recent studies indicate that GNNs are vulnerable to topological perturbations, such as adversarial attacks and edge disruptions. Considerable efforts have been devoted to mitigating these challenges. For example, pioneering Bayesian methodologies, including GraphSS and LlnDT, incorporate Bayesian labe…

    Submitted 5 June, 2024; originally announced June 2024.

  7. arXiv:2406.01264  [pdf, other]

    cs.CV

    FreeTumor: Advance Tumor Segmentation via Large-Scale Tumor Synthesis

    Authors: Linshan Wu, Jiaxin Zhuang, Xuefeng Ni, Hao Chen

    Abstract: AI-driven tumor analysis has garnered increasing attention in healthcare. However, its progress is significantly hindered by the lack of annotated tumor cases, which requires radiologists to invest a lot of effort in collecting and annotation. In this paper, we introduce a highly practical solution for robust tumor synthesis and segmentation, termed FreeTumor, which refers to annotation-free synth…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Preprint

  8. arXiv:2405.19590  [pdf, other]

    cs.LG

    Weights Augmentation: it has never ever ever ever let her model down

    Authors: Junbin Zhuang, Guiguang Din, Yunyi Yan

    Abstract: Weights play an essential role in deep learning network models. Unlike network structure design, this article proposes the concept of weight augmentation, focusing on weight exploration. The core of Weight Augmentation Strategy (WAS) is to adopt random transformed weight coefficients training and transformed coefficients, named Shadow Weight (SW), for networks that can be used to calculate loss func…

    Submitted 29 May, 2024; originally announced May 2024.

  9. arXiv:2405.01606  [pdf, other]

    quant-ph cs.LG

    Improving Trainability of Variational Quantum Circuits via Regularization Strategies

    Authors: Jun Zhuang, Jack Cunningham, Chaowen Guan

    Abstract: In the era of noisy intermediate-scale quantum (NISQ), variational quantum circuits (VQCs) have been widely applied in various domains, advancing the superiority of quantum circuits against classic models. Similar to classic models, regular VQCs can be optimized by various gradient-based methods. However, the optimization may be initially trapped in barren plateaus or eventually entangled in saddl…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: preprint, under review. TL;DR: we propose a regularization strategy to improve the trainability of VQCs

  10. arXiv:2404.15760  [pdf, other]

    cs.LG cs.AI stat.ML

    Debiasing Machine Unlearning with Counterfactual Examples

    Authors: Ziheng Chen, Jia Wang, Jun Zhuang, Abbavaram Gowtham Reddy, Fabrizio Silvestri, Jin Huang, Kaushiki Nag, Kun Kuang, Xin Ning, Gabriele Tolomei

    Abstract: The right to be forgotten (RTBF) seeks to safeguard individuals from the enduring effects of their historical actions by implementing machine-learning techniques. These techniques facilitate the deletion of previously acquired knowledge without requiring extensive model retraining. However, they often overlook a critical issue: unlearning processes bias. This bias emerges from two main sources: (1…

    Submitted 24 April, 2024; originally announced April 2024.

  11. arXiv:2404.15580  [pdf, other]

    cs.CV

    MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis

    Authors: Jiaxin Zhuang, Linshan Wu, Qiong Wang, Varut Vardhanabhuti, Lin Luo, Hao Chen

    Abstract: The Vision Transformer (ViT) has demonstrated remarkable performance in Self-Supervised Learning (SSL) for 3D medical image analysis. Mask AutoEncoder (MAE) for feature pre-training can further unleash the potential of ViT on various medical vision tasks. However, due to large spatial sizes with much higher dimensions of 3D medical images, the lack of hierarchical design for MAE may hinder the per…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: submitted to journal

  12. arXiv:2404.06039  [pdf, other]

    cs.HC

    Breathing New Life into Existing Visualizations: A Natural Language-Driven Manipulation Framework

    Authors: Can Liu, Jiacheng Yu, Yuhan Guo, Jiayi Zhuang, Yuchu Luo, Xiaoru Yuan

    Abstract: We propose an approach to manipulate existing interactive visualizations to answer users' natural language queries. We analyze the natural language tasks and propose a design space of a hierarchical task structure, which allows for a systematic decomposition of complex queries. We introduce a four-level visualization manipulation space to facilitate in-situ manipulations for visualizations, enabli…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 21 pages

  13. Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation

    Authors: Hui Xiao, Yuting Hong, Li Dong, Diqun Yan, Jiayan Zhuang, Junjie Xiong, Dongtai Liang, Chengbin Peng

    Abstract: Semi-supervised semantic segmentation relieves the reliance on large-scale labeled data by leveraging unlabeled data. Recent semi-supervised semantic segmentation approaches mainly resort to pseudo-labeling methods to exploit unlabeled data. However, unreliable pseudo-labeling can undermine the semi-supervision processes. In this paper, we propose an algorithm called Multi-Level Label Correction (…

    Submitted 9 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 12 pages, 8 figures. IEEE Transactions on Multimedia, 2024

  14. arXiv:2403.01976  [pdf, other]

    cs.CL

    SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis

    Authors: Hengxing Cai, Xiaochen Cai, Junhan Chang, Sihang Li, Lin Yao, Changxin Wang, Zhifeng Gao, Hongshuai Wang, Yongge Li, Mujie Lin, Shuwen Yang, Jiankun Wang, Mingjun Xu, Jin Huang, Fang Xi, Jiaxi Zhuang, Yuqi Yin, Yaqi Li, Changhong Chen, Zheng Cheng, Zifeng Zhao, Linfeng Zhang, Guolin Ke

    Abstract: Recent breakthroughs in Large Language Models (LLMs) have revolutionized natural language understanding and generation, sparking significant interest in applying them to scientific literature analysis. However, existing benchmarks fail to adequately evaluate the proficiency of LLMs in this domain, particularly in scenarios requiring higher-level abilities beyond mere memorization and the handling…

    Submitted 18 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  15. arXiv:2402.10409  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Understanding Survey Paper Taxonomy about Large Language Models via Graph Representation Learning

    Authors: Jun Zhuang, Casey Kennington

    Abstract: As new research on Large Language Models (LLMs) continues, it is difficult to keep up with new research and models. To help researchers synthesize the new research many have written survey papers, but even those have become numerous. In this paper, we develop a method to automatically assign survey papers to a taxonomy. We collect the metadata of 144 LLM survey papers and explore three paradigms t…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: TL;DR: We collected metadata about LLM surveys and developed a method for categorizing them into a taxonomy, indicating the superiority of graph representation learning over language models and revealing the efficacy of fine-tuning using weak labels

  16. arXiv:2401.14828  [pdf, other]

    cs.CV

    TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts

    Authors: Jingyu Zhuang, Di Kang, Yan-Pei Cao, Guanbin Li, Liang Lin, Ying Shan

    Abstract: Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIPEditor, that accepts both text and image prompts and a 3D b…

    Submitted 25 April, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Accepted by SIGGRAPH 2024 & ACM Transactions on Graphics

  17. arXiv:2401.11261  [pdf, other]

    cs.LG cs.CV

    Diffusion Model Conditioning on Gaussian Mixture Model and Negative Gaussian Mixture Gradient

    Authors: Weiguo Lu, Xuan Wu, Deng Ding, Jinqiao Duan, Jirong Zhuang, Gangnan Yuan

    Abstract: Diffusion models (DMs) are a type of generative model that has a huge impact on image synthesis and beyond. They achieve state-of-the-art generation results in various generative tasks. A great diversity of conditioning inputs, such as text or bounding boxes, are accessible to control the generation. In this work, we propose a conditioning mechanism utilizing Gaussian mixture models (GMMs) as feat…

    Submitted 1 February, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

  18. SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration

    Authors: Jinming Zhuang, Zhuoping Yang, Shixin Ji, Heng Huang, Alex K. Jones, Jingtong Hu, Yiyu Shi, Peipei Zhou

    Abstract: With the increase in the computation intensity of the chip, the mismatch between computation layer shapes and the available computation resource significantly limits the utilization of the chip. Driven by this observation, prior works discuss spatial accelerators or dataflow architecture to maximize the throughput. However, using spatial accelerators could potentially increase the execution latenc…

    Submitted 18 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Journal ref: 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '24)

  19. arXiv:2401.00695  [pdf, other]

    cs.CV

    Credible Teacher for Semi-Supervised Object Detection in Open Scene

    Authors: Jingyu Zhuang, Kuo Wang, Liang Lin, Guanbin Li

    Abstract: Semi-Supervised Object Detection (SSOD) has achieved resounding success by leveraging unlabeled data to improve detection performance. However, in Open Scene Semi-Supervised Object Detection (O-SSOD), unlabeled data may contain unknown objects not observed in the labeled data, which will increase uncertainty in the model's predictions for known objects. It is detrimental to the current methods th…

    Submitted 2 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  20. arXiv:2312.10903  [pdf, other]

    cs.LG cs.AI

    Robust Node Representation Learning via Graph Variational Diffusion Networks

    Authors: Jun Zhuang, Mohammad Al Hasan

    Abstract: Node representation learning by using Graph Neural Networks (GNNs) has been widely explored. However, in recent years, compelling evidence has revealed that GNN-based node representation learning can be substantially deteriorated by delicately-crafted perturbations in a graph structure. To learn robust node representation in the presence of perturbations, various works have been proposed to safegu…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: preprint, under review

  21. arXiv:2312.03594  [pdf, other]

    cs.CV

    A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

    Authors: Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan, Kai Chen

    Abstract: Achieving high-quality versatile image inpainting, where user-specified regions are filled with plausible content according to user intent, presents a significant challenge. Existing methods face difficulties in simultaneously addressing context-aware image inpainting and text-guided object inpainting due to the distinct optimal training strategies required. To overcome this challenge, we introduc…

    Submitted 11 December, 2023; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Project page with code: https://powerpaint.github.io/

  22. arXiv:2312.02991  [pdf, other]

    cs.AR

    REFRESH FPGAs: Sustainable FPGA Chiplet Architectures

    Authors: Peipei Zhou, Jinming Zhuang, Stephen Cahoon, Yue Tang, Zhuoping Yang, Xingzhen Chen, Yiyu Shi, Jingtong Hu, Alex K. Jones

    Abstract: There is a growing call for greater amounts of increasingly agile computational power for edge and cloud infrastructure to serve the computationally complex needs of ubiquitous computing devices. Thus, an important challenge is addressing the holistic environmental impacts of these next-generation computing systems. To accomplish this, a life-cycle view of sustainability for computing advancements…

    Submitted 27 November, 2023; originally announced December 2023.

  23. arXiv:2311.18420  [pdf, other]

    cs.CV

    TeG-DG: Textually Guided Domain Generalization for Face Anti-Spoofing

    Authors: Lianrui Mu, Jianhong Bai, Xiaoxuan He, Jiangnan Ye, Xiaoyu Liang, Yuchen Yang, Jiedong Zhuang, Haoji Hu

    Abstract: Enhancing the domain generalization performance of Face Anti-Spoofing (FAS) techniques has emerged as a research focus. Existing methods are dedicated to extracting domain-invariant features from various training domains. Despite the promising performance, the extracted features inevitably contain residual style feature bias (e.g., illumination, capture device), resulting in inferior generalizatio…

    Submitted 30 January, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  24. arXiv:2311.16417  [pdf, other]

    cs.AR

    Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets

    Authors: Zhuoping Yang, Shixin Ji, Xingzhen Chen, Jinming Zhuang, Weifeng Zhang, Dharmesh Jani, Peipei Zhou

    Abstract: Fast-evolving artificial intelligence (AI) algorithms such as large language models have been driving the ever-increasing computing demands in today's data centers. Heterogeneous computing with domain-specific architectures (DSAs) brings many opportunities when scaling up and scaling out the computing system. In particular, heterogeneous chiplet architecture is favored to keep scaling up and scali…

    Submitted 4 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  25. arXiv:2310.01159  [pdf, other]

    eess.IV cs.CV cs.LG

    Iterative Semi-Supervised Learning for Abdominal Organs and Tumor Segmentation

    Authors: Jiaxin Zhuang, Luyang Luo, Zhixuan Chen, Linshan Wu

    Abstract: Deep-learning (DL) based methods are playing an important role in the task of abdominal organs and tumors segmentation in CT scans. However, the large requirements of annotated datasets heavily limit its development. The FLARE23 challenge provides a large-scale dataset with both partially and fully annotated data, which also focuses on both segmentation accuracy and computational efficiency. In th…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: text overlap with arXiv:2309.05405

  26. arXiv:2309.16137  [pdf, other]

    cs.CV

    Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

    Authors: Yuanmin Tang, Jing Yu, Keke Gai, Jiamin Zhuang, Gang Xiong, Yue Hu, Qi Wu

    Abstract: Different from Composed Image Retrieval task that requires expensive labels for training task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks with a broad range of visual content manipulation intent that could be related to domain, scene, object, and attribute. The key challenge for ZS-CIR tasks is to learn a more accurate image representation that has adaptive…

    Submitted 15 December, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

    Journal ref: AAAI 2024

  27. arXiv:2309.12275  [pdf, other]

    cs.AR

    AIM: Accelerating Arbitrary-precision Integer Multiplication on Heterogeneous Reconfigurable Computing Platform Versal ACAP

    Authors: Zhuoping Yang, Jinming Zhuang, Jiaqi Yin, Cunxi Yu, Alex K. Jones, Peipei Zhou

    Abstract: Arbitrary-precision integer multiplication is the core kernel of many applications in simulation, cryptography, etc. Existing acceleration of arbitrary-precision integer multiplication includes CPUs, GPUs, FPGAs, and ASICs. Among these accelerators, FPGAs are promised to provide both good energy efficiency and flexibility. Surprisingly, in our implementations, FPGA has the lowest energy efficiency…

    Submitted 21 September, 2023; originally announced September 2023.

  28. Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment

    Authors: Jiamin Zhuang, Jing Yu, Yang Ding, Xiangyan Qu, Yue Hu

    Abstract: Image-text retrieval requires the system to bridge the heterogeneous gap between vision and language for accurate retrieval while keeping the network lightweight enough for efficient retrieval. Existing trade-off solutions mainly study from the view of incorporating cross-modal interactions with the independent-embedding framework or leveraging stronger pretrained encoders, which still demand time-…

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: Accepted in IEEE Transactions on Multimedia (TMM)

    Journal ref: IEEE Transactions on Multimedia (Early Access), 29 May 2023

  29. arXiv:2307.10036  [pdf, other]

    cs.CV

    Class Attention to Regions of Lesion for Imbalanced Medical Image Recognition

    Authors: Jia-Xin Zhuang, Jiabin Cai, Jianguo Zhang, Wei-shi Zheng, Ruixuan Wang

    Abstract: Automated medical image classification is the key component in intelligent diagnosis systems. However, most medical image datasets contain plenty of samples of common diseases and just a handful of rare ones, leading to major class imbalances. Currently, it is an open problem in intelligent diagnosis to effectively learn from imbalanced training data. In this paper, we propose a simple yet effecti…

    Submitted 20 July, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

    Comments: Accepted by Neurocomputing in July 2023. 37 pages

  30. arXiv:2306.13455  [pdf, other]

    cs.CV

    DreamEditor: Text-Driven 3D Scene Editing with Neural Fields

    Authors: Jingyu Zhuang, Chen Wang, Lingjie Liu, Liang Lin, Guanbin Li

    Abstract: Neural fields have achieved impressive advancements in view synthesis and scene reconstruction. However, editing these neural fields remains challenging due to the implicit encoding of geometry and texture information. In this paper, we propose DreamEditor, a novel framework that enables users to perform controlled editing of neural fields using text prompts. By representing scenes as mesh-based n…

    Submitted 7 September, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted by SIGGRAPH Asia 2023

  31. arXiv:2306.08939  [pdf, other]

    cs.CV

    Revisiting Stereo Triangulation in UAV Distance Estimation

    Authors: Jiafan Zhuang, Duan Yuan, Rihong Yan, Weixin Huang, Wenji Li, Zhun Fan

    Abstract: Distance estimation plays an important role for path planning and collision avoidance of swarm UAVs. However, the lack of annotated data seriously hinders the related studies. In this work, we build and present a UAVDE dataset for UAV distance estimation, in which distance between two UAVs is obtained by UWB sensors. During experiments, we surprisingly observe that the stereo triangulation cannot…

    Submitted 2 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  32. arXiv:2306.08913  [pdf, other]

    cs.CV cs.AI

    Advancing Volumetric Medical Image Segmentation via Global-Local Masked Autoencoder

    Authors: Jia-Xin Zhuang, Luyang Luo, Hao Chen

    Abstract: Masked autoencoder (MAE) is a promising self-supervised pre-training technique that can improve the representation learning of a neural network without human intervention. However, applying MAE directly to volumetric medical images poses two challenges: (i) a lack of global information that is crucial for understanding the clinical context of the holistic data, (ii) no guarantee of stabilizing the…

    Submitted 23 August, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

  33. arXiv:2305.18698  [pdf, other]

    cs.AR

    AutoMM: Energy-Efficient Multi-Data-Type Matrix Multiply Design on Heterogeneous Programmable System-on-Chip

    Authors: Jinming Zhuang, Zhuoping Yang, Peipei Zhou

    Abstract: As the increasing complexity of Neural Network (NN) models leads to high demands for computation, AMD introduces a heterogeneous programmable system-on-chip (SoC), i.e., Versal ACAP architectures featured with programmable logic (PL), CPUs, and dedicated AI engines (AIE) ASICs which has a theoretical throughput up to 6.4 TFLOPs for FP32, 25.6 TOPs for INT16 and 102.4 TOPs for INT8. However, the hig…

    Submitted 29 May, 2023; originally announced May 2023.

  34. arXiv:2304.09322  [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-Modality Multi-Scale Cardiovascular Disease Subtypes Classification Using Raman Image and Medical History

    Authors: Bo Yu, Hechang Chen, Chengyou Jia, Hongren Zhou, Lele Cong, Xiankai Li, Jianhui Zhuang, Xianling Cong

    Abstract: Raman spectroscopy (RS) has been widely used for disease diagnosis, e.g., cardiovascular disease (CVD), owing to its efficiency and component-specific testing capabilities. A series of popular deep learning methods have recently been introduced to learn nuance features from RS for binary classifications and achieved outstanding performance than conventional machine learning methods. However, these…

    Submitted 18 April, 2023; originally announced April 2023.

    Journal ref: Expert Systems with Applications, 2023: 119965

  35. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  36. arXiv:2303.01047  [pdf, other]

    cs.CV

    Task-Specific Context Decoupling for Object Detection

    Authors: Jiayuan Zhuang, Zheng Qin, Hao Yu, Xucan Chen

    Abstract: Classification and localization are two main sub-tasks in object detection. Nonetheless, these two tasks have inconsistent preferences for feature context, i.e., localization expects more boundary-aware features to accurately regress the bounding box, while more semantic context is preferred for object classification. Existing methods usually leverage disentangled heads to learn different feature…

    Submitted 2 March, 2023; originally announced March 2023.

  37. arXiv:2301.03832  [pdf, other]

    cs.CV

    Video Semantic Segmentation with Inter-Frame Feature Fusion and Inner-Frame Feature Refinement

    Authors: Jiafan Zhuang, Zilei Wang, Junjie Li

    Abstract: Video semantic segmentation aims to generate accurate semantic maps for each video frame. To this end, many works dedicate to integrate diverse information from consecutive frames to enhance the features for prediction, where a feature alignment procedure via estimated optical flow is usually required. However, the optical flow would inevitably suffer from inaccuracy, and then introduce noises in…

    Submitted 10 January, 2023; originally announced January 2023.

  38. arXiv:2301.02359  [pdf, other]

    cs.AR

    CHARM: Composing Heterogeneous Accelerators for Matrix Multiply on Versal ACAP Architecture

    Authors: Jinming Zhuang, Jason Lau, Hanchen Ye, Zhuoping Yang, Yubo Du, Jack Lo, Kristof Denolf, Stephen Neuendorffer, Alex Jones, Jingtong Hu, Deming Chen, Jason Cong, Peipei Zhou

    Abstract: Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have emerged as promising platforms. For example, the AMD/Xilinx Versal ACAP architecture combines general-purpose CPU cores and programmable logic wi…

    Submitted 5 January, 2023; originally announced January 2023.

  39. arXiv:2211.01607  [pdf, other]

    eess.IV cs.LG

    ImageCAS: A Large-Scale Dataset and Benchmark for Coronary Artery Segmentation based on Computed Tomography Angiography Images

    Authors: An Zeng, Chunbiao Wu, Meiping Huang, Jian Zhuang, Shanshan Bi, Dan Pan, Najeeb Ullah, Kaleem Nawaz Khan, Tianchen Wang, Yiyu Shi, Xiaomeng Li, Guisen Lin, Xiaowei Xu

    Abstract: Cardiovascular disease (CVD) accounts for about half of non-communicable diseases. Vessel stenosis in the coronary artery is considered to be the major risk of CVD. Computed tomography angiography (CTA) is one of the widely used noninvasive imaging modalities in coronary artery diagnosis due to its superior image resolution. Clinically, segmentation of coronary arteries is essential for the diagno…

    Submitted 17 October, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: 17 pages, 12 figures, 4 tables

    Journal ref: Computerized Medical Imaging and Graphics, 2023

  40. arXiv:2209.06656  [pdf, ps, other]

    cs.IT cs.CR

    Syndrome decoding meets multiple instances

    Authors: Haoxuan Wu, Jincheng Zhuang

    Abstract: The NP-hard problem of decoding random linear codes is crucial to both coding theory and cryptography. In particular, this problem underpins the security of many code-based post-quantum cryptographic schemes. The state-of-the-art algorithms for solving this problem are the information syndrome decoding algorithm and its advanced variants. In this work, we consider syndrome decoding in the multiple ins…

    Submitted 14 September, 2022; originally announced September 2022.

  41. arXiv:2209.00726  [pdf, other]

    eess.IV cs.CV

    Learning correspondences of cardiac motion from images using biomechanics-informed modeling

    Authors: Xiaoran Zhang, Chenyu You, Shawn Ahn, Juntang Zhuang, Lawrence Staib, James Duncan

    Abstract: Learning spatial-temporal correspondences in cardiac motion from images is important for understanding the underlying dynamics of cardiac anatomical structures. Many methods explicitly impose smoothness constraints such as the $\mathcal{L}_2$ norm on the displacement vector field (DVF), while usually ignoring biomechanical feasibility in the transformation. Other geometric constraints either regul…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Accepted by MICCAI-STACOM 2022 as an oral presentation

  42. Robust Node Classification on Graphs: Jointly from Bayesian Label Transition and Topology-based Label Propagation

    Authors: Jun Zhuang, Mohammad Al Hasan

    Abstract: Node classification using Graph Neural Networks (GNNs) has been widely applied in various real-world scenarios. However, in recent years, compelling evidence emerges that the performance of GNN-based node classification may deteriorate substantially by topological perturbation, such as random connections or adversarial attacks. Various solutions, such as topological denoising methods and mechanism…

    Submitted 20 August, 2022; originally announced August 2022.

    Comments: The paper is accepted for CIKM 2022

  43. arXiv:2208.05616  [pdf, other]

    eess.IV cs.CV cs.LG

    OpenMedIA: Open-Source Medical Image Analysis Toolbox and Benchmark under Heterogeneous AI Computing Platforms

    Authors: Jia-Xin Zhuang, Xiansong Huang, Yang Yang, Jiancong Chen, Yue Yu, Wei Gao, Ge Li, Jie Chen, Tong Zhang

    Abstract: In this paper, we present OpenMedIA, an open-source toolbox library containing a rich set of deep learning methods for medical image analysis under heterogeneous Artificial Intelligence (AI) computing platforms. Various medical image analysis methods, including 2D/3D medical image classification, segmentation, localisation, and detection, have been included in the toolbox with PyTorch and/or MindS…

    Submitted 7 September, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: 12 pages, 1 figure

  44. arXiv:2206.12558  [pdf, other]

    cs.CV

    FastBVP-Net: a lightweight pulse extraction network for measuring heart rhythm via facial videos

    Authors: Jialiang Zhuang, Yuheng Chen, Yun Zhang, Xiujuan Zheng

    Abstract: Remote photoplethysmography (rPPG) is an attractive camera-based health monitoring method that can measure the heart rhythm from facial videos. Many well-established deep-learning models have been reported to measure heart rate (HR) and heart rate variability (HRV). However, most of these models usually require a 30-second facial video and enormous computational resources to obtain accurate and ro…

    Submitted 21 December, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

    Comments: 9 pages, 2 figures

  45. arXiv:2206.02295  [pdf, other]

    cs.CV cs.AI

    HIFI-Net: A Novel Network for Enhancement to Underwater Images

    Authors: Jiajia Zhou, Junbin Zhuang, Yan Zheng, Di Wu

    Abstract: A novel network for enhancement to underwater images is proposed in this paper. It contains a Reinforcement Fusion Module for Haar wavelet images (RFM-Haar) based on Reinforcement Fusion Unit (RFU), which is used to fuse an original image and some important information within it. Fusion is achieved for better enhancement. As this network makes "Haar Images into Fusion Images", it is called HIFI-Net…

    Submitted 5 June, 2022; originally announced June 2022.

    Comments: 7 pages, 4 figures

  46. arXiv:2203.08065  [pdf, other]

    cs.LG cs.AI

    Surrogate Gap Minimization Improves Sharpness-Aware Training

    Authors: Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha Dvornek, Sekhar Tatikonda, James Duncan, Ting Liu

    Abstract: The recently proposed Sharpness-Aware Minimization (SAM) improves generalization by minimizing a \textit{perturbed loss} defined as the maximum loss within a neighborhood in the parameter space. However, we show that both sharp and flat minima can have a low perturbed loss, implying that SAM does not always prefer flat minima. Instead, we define a \textit{surrogate gap}, a measure equivalent to th…

    Submitted 19 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: Paper accepted by ICLR22, https://openreview.net/forum?id=edONMAnhLu-

  47. arXiv:2203.03762  [pdf, other]

    cs.LG

    Defending Graph Convolutional Networks against Dynamic Graph Perturbations via Bayesian Self-supervision

    Authors: Jun Zhuang, Mohammad Al Hasan

    Abstract: In recent years, plentiful evidence illustrates that Graph Convolutional Networks (GCNs) achieve extraordinary accomplishments on the node classification task. However, GCNs may be vulnerable to adversarial attacks on label-scarce dynamic graphs. Many existing works aim to strengthen the robustness of GCNs; for instance, adversarial training is used to shield GCNs against malicious perturbations.…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: The paper is accepted by AAAI 2022

  48. arXiv:2203.03634  [pdf, other]

    eess.IV cs.CV cs.HC

    Remote blood pressure measurement via spatiotemporal mapping of a short-time facial video

    Authors: Jialiang Zhuang, Bin Li, Yun Zhang, Yuheng Chen, Xiujuan Zheng

    Abstract: Blood pressure (BP) monitoring is vital in daily healthcare, especially for cardiovascular diseases. However, BP values are mainly acquired through the contact sensing method, which is inconvenient and unfriendly to continuous BP measurement. Hence, we propose an efficient end-to-end network to estimate the BP values from a facial video to achieve remote BP measurement in daily life. In this study…

    Submitted 23 June, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: 7 pages, 7 figures

  49. arXiv:2203.03329  [pdf, other]

    cs.CV

    Open Set Domain Adaptation By Novel Class Discovery

    Authors: Jingyu Zhuang, Ziliang Chen, Pengxu Wei, Guanbin Li, Liang Lin

    Abstract: In Open Set Domain Adaptation (OSDA), large amounts of target samples are drawn from the implicit categories that never appear in the source domain. Due to the lack of their specific belonging, existing methods indiscriminately regard them as a single class unknown. We challenge this broadly-adopted practice that may arouse unexpected detrimental effects because the decision boundaries between the…

    Submitted 7 March, 2022; originally announced March 2022.

  50. arXiv:2202.08199  [pdf, other]

    cs.CV

    Less is More: Surgical Phase Recognition from Timestamp Supervision

    Authors: Xinpeng Ding, Xinjian Yan, Zixun Wang, Wei Zhao, Jian Zhuang, Xiaowei Xu, Xiaomeng Li

    Abstract: Surgical phase recognition is a fundamental task in computer-assisted surgery systems. Most existing works are under the supervision of expensive and time-consuming full annotations, which require the surgeons to repeat watching videos to find the precise start and end time for a surgical phase. In this paper, we introduce timestamp supervision for surgical phase recognition to train the models wi…

    Submitted 30 November, 2022; v1 submitted 16 February, 2022; originally announced February 2022.