
Showing 1–50 of 654 results for author: Zhou, M

Searching in archive cs.
  1. arXiv:2407.02283  [pdf, other]

    cs.CV cs.AI

    A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling

    Authors: Minghao Zhou, Hong Wang, Yefeng Zheng, Deyu Meng

    Abstract: Feature upsampling is a fundamental and indispensable ingredient of almost all current network structures for image segmentation tasks. Recently, a popular similarity-based feature upsampling pipeline has been proposed, which utilizes a high-resolution feature as guidance to help upsample the low-resolution deep feature based on their local similarity. Albeit achieving promising performance, this…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Codes are available at https://github.com/zmhhmz/ReSFU

  2. arXiv:2407.02211  [pdf, other]

    cs.CL cs.AI cs.LG

    PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning

    Authors: Jiaru Zou, Mengyu Zhou, Tao Li, Shi Han, Dongmei Zhang

    Abstract: Large language models (LLMs) have played a fundamental role in various natural language processing tasks with powerful prompt techniques. However, in real-world applications, there are often similar prompt components for repeated queries, which causes significant computational burdens during inference. Existing prompt compression and direct fine-tuning methods aim to tackle these challenges, yet t…

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2407.02118  [pdf, other]

    cs.CL

    Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale

    Authors: Wenzhen Zheng, Wenbo Pan, Xu Xu, Libo Qin, Li Yue, Ming Zhou

    Abstract: In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In this paper, we explore an alternative approach to constructing an LLM for a new language by continually pretraining (CPT) from existing pretrained LLMs, instead…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages

  4. arXiv:2407.00604  [pdf, other]

    cs.AR

    Fast-OverlaPIM: A Fast Overlap-driven Mapping Framework for Processing In-Memory Neural Network Acceleration

    Authors: Xuan Wang, Minxuan Zhou, Tajana Rosing

    Abstract: Processing in-memory (PIM) is promising to accelerate neural networks (NNs) because it minimizes data movement and provides large computational parallelism. Similar to machine learning accelerators, application mapping, which determines the operation scheduling and data layout, plays a critical role in the NN acceleration on PIM. The mapping optimization of previous NN accelerators focused on opti…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: This work is accepted by IEEE TCAD

  5. arXiv:2406.17591  [pdf, other]

    cs.CV

    DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation

    Authors: Ahmad Mohammadshirazi, Ali Nosrati Firoozsalari, Mengxi Zhou, Dheeraj Kulshrestha, Rajiv Ramnath

    Abstract: Automating the annotation of scanned documents is challenging, requiring a balance between computational efficiency and accuracy. DocParseNet addresses this by combining deep learning and multi-modal learning to process both text and visual data. This model goes beyond traditional OCR and semantic segmentation, capturing the interplay between text and images to preserve contextual nuances in compl…

    Submitted 25 June, 2024; originally announced June 2024.

  6. arXiv:2406.11389  [pdf, other]

    cs.LG

    SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

    Authors: Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

    Abstract: Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods s…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  7. arXiv:2406.10797  [pdf, other]

    cs.CV

    STAR: Scale-wise Text-to-image generation via Auto-Regressive representations

    Authors: Xiaoxiao Ma, Mohan Zhou, Tao Liang, Yalong Bai, Tiejun Zhao, Huaian Chen, Yi Jin

    Abstract: We present STAR, a text-to-image model that employs scale-wise auto-regressive paradigm. Unlike VAR, which is limited to class-conditioned synthesis within a fixed set of predetermined categories, our STAR enables text-driven open-set generation through three key designs: To boost diversity and generalizability with unseen combinations of objects and concepts, we introduce a pre-trained text encod…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures

  8. arXiv:2406.09357  [pdf, other]

    cs.LG stat.ML

    Advancing Graph Generation through Beta Diffusion

    Authors: Yilin He, Xinyang Liu, Bo Chen, Mingyuan Zhou

    Abstract: Diffusion models have demonstrated effectiveness in generating natural images and have been extended to generate diverse data types, including graphs. This new generation of diffusion-based graph generative models has demonstrated significant performance improvements over methods that rely on variational autoencoders or generative adversarial networks. It's important to recognize, however, that mo…

    Submitted 13 June, 2024; originally announced June 2024.

  9. arXiv:2406.08762  [pdf, other]

    cs.SI cs.CY

    LGB: Language Model and Graph Neural Network-Driven Social Bot Detection

    Authors: Ming Zhou, Dan Zhang, Yuandong Wang, Yangli-ao Geng, Yuxiao Dong, Jie Tang

    Abstract: Malicious social bots achieve their malicious purposes by spreading misinformation and inciting social public opinion, seriously endangering social security, making their detection a critical concern. Recently, graph-based bot detection methods have achieved state-of-the-art (SOTA) performance. However, our research finds many isolated and poorly linked nodes in social networks, as shown in Fig.1,…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  10. arXiv:2406.06393  [pdf, other]

    cs.CV cs.CL q-bio.GN

    STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics

    Authors: Jiawen Chen, Muqing Zhou, Wenrong Wu, Jinwei Zhang, Yun Li, Didong Li

    Abstract: Recent advances in multi-modal algorithms have driven and been driven by the increasing availability of large image-text datasets, leading to significant strides in various fields, including computational pathology. However, in most existing medical image-text datasets, the text typically provides high-level summaries that may not sufficiently describe sub-tile regions within a large pathology ima…

    Submitted 20 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    ACM Class: I.4.10; I.2.10

  11. arXiv:2406.06382  [pdf, other]

    cs.CV cs.CL cs.LG

    Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization

    Authors: Yi Gu, Zhendong Wang, Yueqin Yin, Yujia Xie, Mingyuan Zhou

    Abstract: Aligning large language models with human preferences has emerged as a critical focus in language modeling research. Yet, integrating preference learning into Text-to-Image (T2I) generative models is still relatively uncharted territory. The Diffusion-DPO technique made initial strides by employing pairwise preference learning in diffusion models tailored for specific text prompts. We introduce Di…

    Submitted 10 June, 2024; originally announced June 2024.

  12. arXiv:2406.05596  [pdf, other]

    cs.CV cs.LG

    Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification

    Authors: Yunhe Gao, Difei Gu, Mu Zhou, Dimitris Metaxas

    Abstract: Although explainability is essential in the clinical diagnosis, most deep learning models still function as black boxes without elucidating their decision-making process. In this study, we investigate the explainable model development that can mimic the decision-making process of human experts by fusing the domain knowledge of explicit diagnostic criteria. We introduce a simple yet effective frame…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 Early Accept

  13. arXiv:2406.05354  [pdf, other]

    cs.AR cs.AI cs.DC

    Investigating Memory Failure Prediction Across CPU Architectures

    Authors: Qiao Yu, Wengui Zhang, Min Zhou, Jialiang Yu, Zhenli Sheng, Jasmin Bogatinovski, Jorge Cardoso, Odej Kao

    Abstract: Large-scale datacenters often experience memory failures, where Uncorrectable Errors (UEs) highlight critical malfunction in Dual Inline Memory Modules (DIMMs). Existing approaches primarily utilize Correctable Errors (CEs) to predict UEs, yet they typically neglect how these errors vary between different CPU architectures, especially in terms of Error Correction Code (ECC) applicability. In this…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Industry Track

  14. arXiv:2406.01813  [pdf, other]

    stat.ML cs.AI cs.LG stat.AP stat.ME

    Diffusion Boosted Trees

    Authors: Xizewen Han, Mingyuan Zhou

    Abstract: Combining the merits of both denoising diffusion probabilistic models and gradient boosting, the diffusion boosting paradigm is introduced for tackling supervised learning problems. We develop Diffusion Boosted Trees (DBT), which can be viewed as both a new denoising diffusion generative model parameterized by decision trees (one single tree for each diffusion timestep), and a new boosting algorit…

    Submitted 3 June, 2024; originally announced June 2024.

  15. arXiv:2406.01766  [pdf, ps, other]

    cs.LG stat.ML

    How Does Gradient Descent Learn Features -- A Local Analysis for Regularized Two-Layer Neural Networks

    Authors: Mo Zhou, Rong Ge

    Abstract: The ability of learning useful features is one of the major advantages of neural networks. Although recent works show that neural network can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond NTK regime and perform feature learning. Recently, a line of work highlighted the feature learnin…

    Submitted 3 June, 2024; originally announced June 2024.

  16. arXiv:2406.01561  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG stat.ML

    Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation

    Authors: Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang

    Abstract: Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by developing…

    Submitted 22 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  17. arXiv:2405.20830  [pdf, other]

    cs.CL cs.LG

    Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment

    Authors: Yueqin Yin, Zhendong Wang, Yujia Xie, Weizhu Chen, Mingyuan Zhou

    Abstract: Traditional language model alignment methods, such as Direct Preference Optimization (DPO), are limited by their dependence on static, pre-collected paired preference data, which hampers their adaptability and practical applicability. To overcome this limitation, we introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing p…

    Submitted 31 May, 2024; originally announced May 2024.

  18. arXiv:2405.19690  [pdf, other]

    cs.LG cs.AI

    Diffusion Policies creating a Trust Region for Offline Reinforcement Learning

    Authors: Tianyu Chen, Zhendong Wang, Mingyuan Zhou

    Abstract: Offline reinforcement learning (RL) leverages pre-collected datasets to train optimal policies. Diffusion Q-Learning (DQL), introducing diffusion models as a powerful and expressive policy class, significantly boosts the performance of offline RL. However, its reliance on iterative denoising sampling to generate actions slows down both training and inference. While several recent attempts have tri…

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  19. arXiv:2405.16880  [pdf, other]

    cs.SE

    Systematic Literature Review of Commercial Participation in Open Source Software

    Authors: Xuetao Li, Yuxia Zhang, Cailean Osborne, Minghui Zhou, Zhi Jin, Hui Liu

    Abstract: Open source software (OSS) has been playing a fundamental role in not only information technology but also our social lives. Attracted by various advantages of OSS, increasing commercial companies take extensive participation in open source development and have had a broad impact. This paper provides a comprehensive systematic literature review (SLR) of existing research on company participation i…

    Submitted 27 May, 2024; originally announced May 2024.

  20. arXiv:2405.16234  [pdf, other]

    cs.CV

    Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities

    Authors: Shiyu Xia, Junyu Xiong, Haoyu Dong, Jianbo Zhao, Yuzhang Tian, Mengyu Zhou, Yeye He, Shi Han, Dongmei Zhang

    Abstract: This paper explores capabilities of Vision Language Models on spreadsheet comprehension. We propose three self-supervised challenges with corresponding evaluation metrics to comprehensively evaluate VLMs on Optical Character Recognition (OCR), spatial perception, and visual format recognition. Additionally, we utilize the spreadsheet table detection task to assess the overall performance of VLMs b…

    Submitted 25 May, 2024; originally announced May 2024.

  21. arXiv:2405.11280  [pdf, other]

    cs.LG

    Joint Analysis of Single-Cell Data across Cohorts with Missing Modalities

    Authors: Marianne Arriola, Weishen Pan, Manqi Zhou, Qiannan Zhang, Chang Su, Fei Wang

    Abstract: Joint analysis of multi-omic single-cell data across cohorts has significantly enhanced the comprehensive analysis of cellular processes. However, most of the existing approaches for this purpose require access to samples with complete modality availability, which is impractical in many real-world scenarios. In this paper, we propose (Single-Cell Cross-Cohort Cross-Category) integration, a novel f…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 10 pages, 7 figures, 5 tables

  22. arXiv:2405.08233  [pdf]

    cs.LG

    A Deep Dive Into the Factors Influencing Financial Success: A Machine Learning Approach

    Authors: Michael Zhou, Ramin Ramezani

    Abstract: This paper explores various socioeconomic factors that contribute to individual financial success using machine learning algorithms and approaches. Financial success, a critical aspect of every individual's well-being, is a complex concept influenced by a plethora of different factors. This study aims to understand the true determinants of financial success. It examines the survey data from the Nati…

    Submitted 18 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 21 pages, 4 figures, 10 tables

  23. arXiv:2405.07508  [pdf, other]

    cs.SE

    Revealing the value of Repository Centrality in lifespan prediction of Open Source Software Projects

    Authors: Runzhi He, Hengzhi Ye, Minghui Zhou

    Abstract: Background: Open Source Software is the building block of modern software. However, the prevalence of project deprecation in the open source world weakens the integrity of the downstream systems and the broad ecosystem. Therefore it calls for efforts in monitoring and predicting project deprecations, empowering stakeholders to take proactive measures. Challenge: Existing techniques mainly focus on…

    Submitted 13 May, 2024; originally announced May 2024.

  24. arXiv:2405.06203  [pdf, other]

    cs.AI

    A First Step in Using Machine Learning Methods to Enhance Interaction Analysis for Embodied Learning Environments

    Authors: Joyce Fonteles, Eduardo Davalos, Ashwin T. S., Yike Zhang, Mengxi Zhou, Efrat Ayalon, Alicia Lane, Selena Steinberg, Gabriella Anton, Joshua Danish, Noel Enyedy, Gautam Biswas

    Abstract: Investigating children's embodied learning in mixed-reality environments, where they collaboratively simulate scientific processes, requires analyzing complex multimodal data to interpret their learning and coordination behaviors. Learning scientists have developed Interaction Analysis (IA) methodologies for analyzing such data, but this requires researchers to watch hours of videos to extract and…

    Submitted 9 May, 2024; originally announced May 2024.

  25. arXiv:2405.04513  [pdf, other]

    cs.CL cs.AI cs.LG

    Switchable Decision: Dynamic Neural Generation Networks

    Authors: Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mingyuan Zhou

    Abstract: Auto-regressive generation models achieve competitive performance across many different NLP tasks such as summarization, question answering, and classifications. However, they are also known for being slow in inference, which makes them challenging to deploy in real-time applications. We propose a switchable decision to accelerate inference by dynamically assigning computation resources for each d…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  26. arXiv:2405.00522  [pdf, other]

    econ.GN cs.CE cs.CL cs.CR q-fin.CP

    DAM: A Universal Dual Attention Mechanism for Multimodal Timeseries Cryptocurrency Trend Forecasting

    Authors: Yihang Fu, Mingyu Zhou, Luyao Zhang

    Abstract: In the distributed systems landscape, Blockchain has catalyzed the rise of cryptocurrencies, merging enhanced security and decentralization with significant investment opportunities. Despite their potential, current research on cryptocurrency trend forecasting often falls short by simplistically merging sentiment data without fully considering the nuanced interplay between financial market dynamic…

    Submitted 1 May, 2024; originally announced May 2024.

  27. arXiv:2404.18202  [pdf, other]

    cs.AI cs.MM

    WorldGPT: Empowering LLM as Multimodal World Model

    Authors: Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang

    Abstract: World models are progressively being employed across diverse fields, extending from basic environment simulation to complex scenario construction. However, existing models are mainly trained on domain-specific states and actions, and confined to single-modality state representations. In this paper, we introduce WorldGPT, a generalist world model built upon Multimodal Large Language Model (MLLM). W…

    Submitted 28 April, 2024; originally announced April 2024.

  28. arXiv:2404.16565  [pdf, other]

    cs.SE

    PyRadar: Towards Automatically Retrieving and Validating Source Code Repository Information for PyPI Packages

    Authors: Kai Gao, Weiwei Xu, Wenhao Yang, Minghui Zhou

    Abstract: A package's source code repository records the development history of the package, providing indispensable information for the use and risk monitoring of the package. However, a package release often misses its source code repository due to the separation of the package's development platform from its distribution platform. Existing tools retrieve the release's repository information from its meta…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted at FSE 2024

  29. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  30. arXiv:2404.14768  [pdf, other]

    cs.CV

    Enhancing Prompt Following with Visual Control Through Training-Free Mask-Guided Diffusion

    Authors: Hongyu Chen, Yiqi Gao, Min Zhou, Peng Wang, Xubin Li, Tiezheng Ge, Bo Zheng

    Abstract: Recently, integrating visual controls into text-to-image (T2I) models, such as ControlNet method, has received significant attention for finer control capabilities. While various training-free methods make efforts to enhance prompt following in T2I models, the issue with visual control is still rarely studied, especially in the scenario that visual controls are misaligned with text prompts. In thi…

    Submitted 23 April, 2024; originally announced April 2024.

  31. arXiv:2404.13984  [pdf, other]

    cs.CV

    RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance

    Authors: Chengrui Wang, Pengfei Liu, Min Zhou, Ming Zeng, Xubin Li, Tiezheng Ge, Bo Zheng

    Abstract: Although diffusion models can generate high-quality human images, their applications are limited by the instability in generating hands with correct structures. Some previous works mitigate the problem by considering hand structure yet struggle to maintain style consistency between refined malformed hands and other image regions. In this paper, we aim to solve the problem of inconsistency regardin…

    Submitted 22 April, 2024; originally announced April 2024.

  32. arXiv:2404.12804  [pdf, other]

    cs.CV eess.IV

    Linearly-evolved Transformer for Pan-sharpening

    Authors: Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Man Zhou, Danfeng Hong

    Abstract: Vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 10 pages

  33. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  34. PrintListener: Uncovering the Vulnerability of Fingerprint Authentication via the Finger Friction Sound

    Authors: Man Zhou, Shuao Su, Qian Wang, Qi Li, Yuting Zhou, Xiaojing Ma, Zhengxiong Li

    Abstract: Fingerprint authentication has been extensively employed in contemporary identity verification systems owing to its rapidity and cost-effectiveness. Due to its widespread use, fingerprint leakage may cause sensitive information theft, enormous economic and personnel losses, and even a potential compromise of national security. As a fingerprint that can coincidentally match a specific proportion of…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: in Proc. of NDSS, 2024

  35. arXiv:2404.07976  [pdf, other]

    cs.CV cs.AI

    Self-supervised Dataset Distillation: A Good Compression Is All You Need

    Authors: Muxin Zhou, Zeyuan Yin, Shitong Shao, Zhiqiang Shen

    Abstract: Dataset distillation aims to compress information from a large-scale original dataset to a new compact dataset while striving to preserve the utmost degree of the original data informational essence. Previous studies have predominantly concentrated on aligning the intermediate statistics between the original and distilled data, such as weight trajectory, features, gradient, BatchNorm, etc. In this…

    Submitted 11 April, 2024; originally announced April 2024.

  36. arXiv:2404.06769  [pdf]

    cs.NE

    Solving the Food-Energy-Water Nexus Problem via Intelligent Optimization Algorithms

    Authors: Qi Deng, Zheng Fan, Zhi Li, Xinna Pan, Qi Kang, MengChu Zhou

    Abstract: The application of evolutionary algorithms (EAs) to multi-objective optimization problems has been widespread. However, the EA research community has not paid much attention to large-scale multi-objective optimization problems arising from real-world applications. Especially, Food-Energy-Water systems are intricately linked among food, energy and water that impact each other. They usually involve…

    Submitted 10 April, 2024; originally announced April 2024.

  37. arXiv:2404.04545  [pdf, other]

    cs.MM cs.CL

    TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis

    Authors: Ming Zhou, Weize Quan, Ziqi Zhou, Kai Wang, Tong Wang, Dong-Ming Yan

    Abstract: Multimodal Sentiment Analysis (MSA) endeavors to understand human sentiment by leveraging language, visual, and acoustic modalities. Despite the remarkable performance exhibited by previous MSA approaches, the presence of inherent multimodal heterogeneities poses a challenge, with the contribution of different modalities varying considerably. Past research predominantly focused on improving repres…

    Submitted 6 April, 2024; originally announced April 2024.

  38. arXiv:2404.04057  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

    Authors: Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang

    Abstract: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By refo…

    Submitted 24 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: ICML 2024, PyTorch implementation: https://github.com/mingyuanzhou/SiD

  39. arXiv:2403.16479  [pdf, other]

    cs.SE

    Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models

    Authors: Mingyi Zhou, Xiang Gao, Pei Liu, John Grundy, Chunyang Chen, Xiao Chen, Li Li

    Abstract: Recent studies show that deployed deep learning (DL) models such as those of TensorFlow Lite (TFLite) can be easily extracted from real-world applications and devices by attackers to generate many kinds of attacks like adversarial attacks. Although securing deployed on-device DL models has gained increasing attention, no existing methods can fully prevent the aforementioned threats. Traditional s…

    Submitted 31 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted by the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA2024)

  40. arXiv:2403.15698  [pdf, other]

    cs.CV cs.AI

    SceneX: Procedural Controllable Large-scale Scene Generation via Large-language Models

    Authors: Mengqi Zhou, Jun Hou, Chuanchen Luo, Yuxi Wang, Zhaoxiang Zhang, Junran Peng

    Abstract: Due to its great application potential, large-scale scene generation has drawn extensive attention in academia and industry. Recent research employs powerful generative models to create desired scenes and achieves promising results. However, most of these methods represent the scene using 3D primitives (e.g. point cloud or radiance field) incompatible with the industrial pipeline, which leads to a…

    Submitted 22 March, 2024; originally announced March 2024.

  41. arXiv:2403.15483  [pdf]

    eess.SP cs.LG

    Rolling bearing fault diagnosis method based on generative adversarial enhanced multi-scale convolutional neural network model

    Authors: Maoxuan Zhou, Wei Kang, Kun He

    Abstract: In order to solve the problem that current convolutional neural networks cannot capture the correlation features between the time domain signals of rolling bearings effectively, and the model accuracy is limited by the number and quality of samples, a rolling bearing fault diagnosis method based on generative adversarial enhanced multi-scale convolutional neural network model is proposed. Firstly…

    Submitted 21 March, 2024; originally announced March 2024.

  42. arXiv:2403.13583  [pdf, other]

    cs.SE cs.CL cs.LG

    CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing

    Authors: Xinyi He, Jiaru Zou, Yun Lin, Mengyu Zhou, Shi Han, Zejian Yuan, Dongmei Zhang

    Abstract: Large Language Models have revolutionized code generation ability by converting natural language descriptions into executable code. However, generating complex code within real-world scenarios remains challenging due to intricate structures, subtle bugs, understanding of advanced data types, and lack of supplementary contents. To address these challenges, we introduce the CoCoST framework, which e…

    Submitted 1 July, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  43. arXiv:2403.12027  [pdf, other]

    cs.CL cs.AI cs.CV

    From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

    Authors: Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji

    Abstract: Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increa…

    Submitted 25 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  44. arXiv:2403.05567  [pdf, other]

    cs.HC

    A Unified Framework for Underwater Metaverse with Optical Perception

    Authors: Jingyang Cao, Mu Zhou, Jiacheng Wang, Guangyuan Liu, Dusit Niyato, Shiwen Mao, Zhu Han, Jiawen Kang

    Abstract: With the advancement of AI technology and increasing attention to deep-sea exploration, the underwater Metaverse is gradually emerging. This paper explores the concept of underwater Metaverse, emerging virtual reality systems and services aimed at simulating and enhancing virtual experience of marine environments. First, we discuss potential applications of underwater Metaverse in underwater scien…

    Submitted 20 February, 2024; originally announced March 2024.

  45. arXiv:2403.05063  [pdf, other]

    cs.IR cs.AI

    Aligning Large Language Models for Controllable Recommendations

    Authors: Wensheng Lu, Jianxun Lian, Wei Zhang, Guanghua Li, Mingyang Zhou, Hao Liao, Xing Xie

    Abstract: Inspired by the exceptional general intelligence of Large Language Models (LLMs), researchers have begun to explore their application in pioneering the next generation of recommender systems - systems that are conversational, explainable, and controllable. However, existing literature primarily concentrates on integrating domain-specific knowledge into LLMs to enhance accuracy, often neglecting th…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 13 pages

    MSC Class: 68T50

  46. arXiv:2403.04918  [pdf, other]

    cs.CR

    Secure Information Embedding and Extraction in Forensic 3D Fingerprinting

    Authors: Canran Wang, Jinwen Wang, Mi Zhou, Vinh Pham, Senyue Hao, Chao Zhou, Ning Zhang, Netanel Raviv

    Abstract: The prevalence of 3D printing poses a significant risk to public safety, as any individual with internet access and a commodity printer is able to produce untraceable firearms, keys, counterfeit products, etc. To aid government authorities in combating these new security threats, several approaches have been taken to tag 3D-prints with identifying information. Known as fingerprints, this informati…

    Submitted 12 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  47. arXiv:2403.02726  [pdf]

    econ.GN cs.AI cs.CY

    Bias in Generative AI

    Authors: Mi Zhou, Vibhanshu Abhishek, Timothy Derdenger, Jaymo Kim, Kannan Srinivasan

    Abstract: This study analyzed images generated by three popular generative artificial intelligence (AI) tools - Midjourney, Stable Diffusion, and DALLE 2 - representing various occupations to investigate potential bias in AI generators. Our analysis revealed two overarching areas of concern in these AI generators, including (1) systematic gender and racial biases, and (2) subtle biases in facial expressions…

    Submitted 5 March, 2024; originally announced March 2024.

  48. arXiv:2403.00987  [pdf, other]

    cs.MA cs.RO eess.SY

    Composite Distributed Learning and Synchronization of Nonlinear Multi-Agent Systems with Complete Uncertain Dynamics

    Authors: Emadodin Jandaghi, Dalton L. Stein, Adam Hoburg, Paolo Stegagno, Mingxi Zhou, Chengzhi Yuan

    Abstract: This paper addresses the problem of composite synchronization and learning control in a network of multi-agent robotic manipulator systems with heterogeneous nonlinear uncertainties under a leader-follower framework. A novel two-layer distributed adaptive learning control strategy is introduced, comprising a first-layer distributed cooperative estimator and a second-layer decentralized determinist…

    Submitted 9 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  49. arXiv:2402.17207  [pdf, other]

    cs.CV

    Deployment Prior Injection for Run-time Calibratable Object Detection

    Authors: Mo Zhou, Yiding Yang, Haoxiang Li, Vishal M. Patel, Gang Hua

    Abstract: With a strong alignment between the training and test distributions, object relation as a context prior facilitates object detection. Yet, it turns into a harmful but inevitable training set bias upon test distributions that shift differently across space and time. Nevertheless, the existing detectors cannot incorporate deployment context prior during the test phase without parameter update. Such…

    Submitted 26 February, 2024; originally announced February 2024.

  50. arXiv:2402.14270  [pdf, other]

    cs.LG

    Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization

    Authors: Xuxi Chen, Zhendong Wang, Daouda Sow, Junjie Yang, Tianlong Chen, Yingbin Liang, Mingyuan Zhou, Zhangyang Wang

    Abstract: In the rapidly advancing arena of large language models (LLMs), a key challenge is to enhance their capabilities amid a looming shortage of high-quality training data. Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets, with a specific focus on selective retention of samples that incur moderately high losses. These sampl…

    Submitted 1 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Preprint; updated reference and related works