Showing 1–50 of 1,131 results for author: Zhou, M

  1. arXiv:2407.03205  [pdf, other]

    cs.CV

    Category-Aware Dynamic Label Assignment with High-Quality Oriented Proposal

    Authors: Mingkui Feng, Hancheng Yu, Xiaoyu Dang, Ming Zhou

    Abstract: Objects in aerial images are typically embedded in complex backgrounds and exhibit arbitrary orientations. When employing oriented bounding boxes (OBB) to represent arbitrary oriented objects, the periodicity of angles could lead to discontinuities in label regression values at the boundaries, inducing abrupt fluctuations in the loss function. To address this problem, an OBB representation based o…

    Submitted 3 July, 2024; originally announced July 2024.

  2. arXiv:2407.03125  [pdf, other]

    cs.LG cs.AI

    Foundations and Frontiers of Graph Learning Theory

    Authors: Yu Huang, Min Zhou, Menglin Yang, Zhen Wang, Muhan Zhang, Jie Wang, Hong Xie, Hao Wang, Defu Lian, Enhong Chen

    Abstract: Recent advancements in graph learning have revolutionized the way to understand and analyze data with complex structures. Notably, Graph Neural Networks (GNNs), i.e. neural network architectures designed for learning graph representations, have become a popular paradigm. With these models being usually characterized by intuition-driven design or highly intricate components, placing them within the…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 36 pages, 273 references

  3. arXiv:2407.02283  [pdf, other]

    cs.CV cs.AI

    A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling

    Authors: Minghao Zhou, Hong Wang, Yefeng Zheng, Deyu Meng

    Abstract: Feature upsampling is a fundamental and indispensable ingredient of almost all current network structures for image segmentation tasks. Recently, a popular similarity-based feature upsampling pipeline has been proposed, which utilizes a high-resolution feature as guidance to help upsample the low-resolution deep feature based on their local similarity. Albeit achieving promising performance, this…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Codes are available at https://github.com/zmhhmz/ReSFU

  4. arXiv:2407.02211  [pdf, other]

    cs.CL cs.AI cs.LG

    PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning

    Authors: Jiaru Zou, Mengyu Zhou, Tao Li, Shi Han, Dongmei Zhang

    Abstract: Large language models (LLMs) have played a fundamental role in various natural language processing tasks with powerful prompt techniques. However, in real-world applications, there are often similar prompt components for repeated queries, which causes significant computational burdens during inference. Existing prompt compression and direct fine-tuning methods aim to tackle these challenges, yet t…

    Submitted 2 July, 2024; originally announced July 2024.

  5. arXiv:2407.02118  [pdf, other]

    cs.CL

    Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale

    Authors: Wenzhen Zheng, Wenbo Pan, Xu Xu, Libo Qin, Li Yue, Ming Zhou

    Abstract: In recent years, Large Language Models (LLMs) have made significant strides towards Artificial General Intelligence. However, training these models from scratch requires substantial computational resources and vast amounts of text data. In this paper, we explore an alternative approach to constructing an LLM for a new language by continually pretraining (CPT) from existing pretrained LLMs, instead…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages

  6. arXiv:2407.00604  [pdf, other]

    cs.AR

    Fast-OverlaPIM: A Fast Overlap-driven Mapping Framework for Processing In-Memory Neural Network Acceleration

    Authors: Xuan Wang, Minxuan Zhou, Tajana Rosing

    Abstract: Processing in-memory (PIM) is promising to accelerate neural networks (NNs) because it minimizes data movement and provides large computational parallelism. Similar to machine learning accelerators, application mapping, which determines the operation scheduling and data layout, plays a critical role in the NN acceleration on PIM. The mapping optimization of previous NN accelerators focused on opti…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: This work is accepted by IEEE TCAD

  7. arXiv:2406.17591  [pdf, other]

    cs.CV

    DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation

    Authors: Ahmad Mohammadshirazi, Ali Nosrati Firoozsalari, Mengxi Zhou, Dheeraj Kulshrestha, Rajiv Ramnath

    Abstract: Automating the annotation of scanned documents is challenging, requiring a balance between computational efficiency and accuracy. DocParseNet addresses this by combining deep learning and multi-modal learning to process both text and visual data. This model goes beyond traditional OCR and semantic segmentation, capturing the interplay between text and images to preserve contextual nuances in compl…

    Submitted 25 June, 2024; originally announced June 2024.

  8. arXiv:2406.12014  [pdf, other]

    astro-ph.HE

    An IXPE-Led X-ray Spectro-Polarimetric Campaign on the Soft State of Cygnus X-1: X-ray Polarimetric Evidence for Strong Gravitational Lensing

    Authors: James F. Steiner, Edward Nathan, Kun Hu, Henric Krawczynski, Michal Dovciak, Alexandra Veledina, Fabio Muleri, Jiri Svoboda, Kevin Alabarta, Maxime Parra, Yash Bhargava, Giorgio Matt, Juri Poutanen, Pierre-Olivier Petrucci, Allyn F. Tennant, M. Cristina Baglio, Luca Baldini, Samuel Barnier, Sudip Bhattacharyya, Stefano Bianchi, Maimouna Brigitte, Mauricio Cabezas, Floriane Cangemi, Fiamma Capitanio, Jacob Casey , et al. (112 additional authors not shown)

    Abstract: We present the first X-ray spectropolarimetric results for Cygnus X-1 in its soft state from a campaign of five IXPE observations conducted during 2023 May-June. Companion multiwavelength data during the campaign are likewise shown. The 2-8 keV X-rays exhibit a net polarization degree PD=1.99%+/-0.13% (68% confidence). The polarization signal is found to increase with energy across IXPE's 2-8 keV…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 20 pages, accepted for publication in ApJL

  9. arXiv:2406.11389  [pdf, other]

    cs.LG

    SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

    Authors: Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

    Abstract: Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods s…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  10. arXiv:2406.10797  [pdf, other]

    cs.CV

    STAR: Scale-wise Text-to-image generation via Auto-Regressive representations

    Authors: Xiaoxiao Ma, Mohan Zhou, Tao Liang, Yalong Bai, Tiejun Zhao, Huaian Chen, Yi Jin

    Abstract: We present STAR, a text-to-image model that employs a scale-wise auto-regressive paradigm. Unlike VAR, which is limited to class-conditioned synthesis within a fixed set of predetermined categories, our STAR enables text-driven open-set generation through three key designs: To boost diversity and generalizability with unseen combinations of objects and concepts, we introduce a pre-trained text encod…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures

  11. arXiv:2406.09357  [pdf, other]

    cs.LG stat.ML

    Advancing Graph Generation through Beta Diffusion

    Authors: Yilin He, Xinyang Liu, Bo Chen, Mingyuan Zhou

    Abstract: Diffusion models have demonstrated effectiveness in generating natural images and have been extended to generate diverse data types, including graphs. This new generation of diffusion-based graph generative models has demonstrated significant performance improvements over methods that rely on variational autoencoders or generative adversarial networks. It's important to recognize, however, that mo…

    Submitted 13 June, 2024; originally announced June 2024.

  12. arXiv:2406.08762  [pdf, other]

    cs.SI cs.CY

    LGB: Language Model and Graph Neural Network-Driven Social Bot Detection

    Authors: Ming Zhou, Dan Zhang, Yuandong Wang, Yangli-ao Geng, Yuxiao Dong, Jie Tang

    Abstract: Malicious social bots achieve their malicious purposes by spreading misinformation and inciting social public opinion, seriously endangering social security, making their detection a critical concern. Recently, graph-based bot detection methods have achieved state-of-the-art (SOTA) performance. However, our research finds many isolated and poorly linked nodes in social networks, as shown in Fig.1,…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.08698  [pdf, other]

    astro-ph.HE hep-ph

    Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

    Abstract: In this work we search for signals generated by ultra-heavy dark matter in the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma rays from dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, accepted by PRL

  14. arXiv:2406.07201  [pdf, ps, other]

    math.AP

    Exact blow-up profiles for the parabolic-elliptic Keller-Segel system in dimensions $N\ge 3$

    Authors: Xueli Bai, Maolin Zhou

    Abstract: In this paper, we obtain the exact blow-up profiles of solutions of the Keller-Segel-Patlak system in the space with dimensions $N\ge 3$, which solves an open problem proposed by P. Souplet and M. Winkler in 2019. To establish this achievement, we develop the zero number argument for nonlinear equations with unbounded coefficients and construct a family of auxiliary backward self-similar solutions…

    Submitted 11 June, 2024; originally announced June 2024.

  15. arXiv:2406.06393  [pdf, other]

    cs.CV cs.CL q-bio.GN

    STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics

    Authors: Jiawen Chen, Muqing Zhou, Wenrong Wu, Jinwei Zhang, Yun Li, Didong Li

    Abstract: Recent advances in multi-modal algorithms have driven and been driven by the increasing availability of large image-text datasets, leading to significant strides in various fields, including computational pathology. However, in most existing medical image-text datasets, the text typically provides high-level summaries that may not sufficiently describe sub-tile regions within a large pathology ima…

    Submitted 20 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    ACM Class: I.4.10; I.2.10

  16. arXiv:2406.06382  [pdf, other]

    cs.CV cs.CL cs.LG

    Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization

    Authors: Yi Gu, Zhendong Wang, Yueqin Yin, Yujia Xie, Mingyuan Zhou

    Abstract: Aligning large language models with human preferences has emerged as a critical focus in language modeling research. Yet, integrating preference learning into Text-to-Image (T2I) generative models is still relatively uncharted territory. The Diffusion-DPO technique made initial strides by employing pairwise preference learning in diffusion models tailored for specific text prompts. We introduce Di…

    Submitted 10 June, 2024; originally announced June 2024.

  17. arXiv:2406.05596  [pdf, other]

    cs.CV cs.LG

    Aligning Human Knowledge with Visual Concepts Towards Explainable Medical Image Classification

    Authors: Yunhe Gao, Difei Gu, Mu Zhou, Dimitris Metaxas

    Abstract: Although explainability is essential in clinical diagnosis, most deep learning models still function as black boxes without elucidating their decision-making process. In this study, we investigate the explainable model development that can mimic the decision-making process of human experts by fusing the domain knowledge of explicit diagnostic criteria. We introduce a simple yet effective frame…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024 Early Accept

  18. arXiv:2406.05354  [pdf, other]

    cs.AR cs.AI cs.DC

    Investigating Memory Failure Prediction Across CPU Architectures

    Authors: Qiao Yu, Wengui Zhang, Min Zhou, Jialiang Yu, Zhenli Sheng, Jasmin Bogatinovski, Jorge Cardoso, Odej Kao

    Abstract: Large-scale datacenters often experience memory failures, where Uncorrectable Errors (UEs) highlight critical malfunction in Dual Inline Memory Modules (DIMMs). Existing approaches primarily utilize Correctable Errors (CEs) to predict UEs, yet they typically neglect how these errors vary between different CPU architectures, especially in terms of Error Correction Code (ECC) applicability. In this…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Industry Track

  19. arXiv:2406.01813  [pdf, other]

    stat.ML cs.AI cs.LG stat.AP stat.ME

    Diffusion Boosted Trees

    Authors: Xizewen Han, Mingyuan Zhou

    Abstract: Combining the merits of both denoising diffusion probabilistic models and gradient boosting, the diffusion boosting paradigm is introduced for tackling supervised learning problems. We develop Diffusion Boosted Trees (DBT), which can be viewed as both a new denoising diffusion generative model parameterized by decision trees (one single tree for each diffusion timestep), and a new boosting algorit…

    Submitted 3 June, 2024; originally announced June 2024.

  20. arXiv:2406.01766  [pdf, ps, other]

    cs.LG stat.ML

    How Does Gradient Descent Learn Features -- A Local Analysis for Regularized Two-Layer Neural Networks

    Authors: Mo Zhou, Rong Ge

    Abstract: The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond the NTK regime and perform feature learning. Recently, a line of work highlighted the feature learnin…

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2406.01561  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG stat.ML

    Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation

    Authors: Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang

    Abstract: Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by developing…

    Submitted 22 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  22. arXiv:2405.20830  [pdf, other]

    cs.CL cs.LG

    Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment

    Authors: Yueqin Yin, Zhendong Wang, Yujia Xie, Weizhu Chen, Mingyuan Zhou

    Abstract: Traditional language model alignment methods, such as Direct Preference Optimization (DPO), are limited by their dependence on static, pre-collected paired preference data, which hampers their adaptability and practical applicability. To overcome this limitation, we introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing p…

    Submitted 31 May, 2024; originally announced May 2024.

  23. arXiv:2405.19690  [pdf, other]

    cs.LG cs.AI

    Diffusion Policies creating a Trust Region for Offline Reinforcement Learning

    Authors: Tianyu Chen, Zhendong Wang, Mingyuan Zhou

    Abstract: Offline reinforcement learning (RL) leverages pre-collected datasets to train optimal policies. Diffusion Q-Learning (DQL), introducing diffusion models as a powerful and expressive policy class, significantly boosts the performance of offline RL. However, its reliance on iterative denoising sampling to generate actions slows down both training and inference. While several recent attempts have tri…

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  24. arXiv:2405.16880  [pdf, other]

    cs.SE

    Systematic Literature Review of Commercial Participation in Open Source Software

    Authors: Xuetao Li, Yuxia Zhang, Cailean Osborne, Minghui Zhou, Zhi Jin, Hui Liu

    Abstract: Open source software (OSS) has been playing a fundamental role in not only information technology but also our social lives. Attracted by various advantages of OSS, an increasing number of commercial companies participate extensively in open source development and have had a broad impact. This paper provides a comprehensive systematic literature review (SLR) of existing research on company participation i…

    Submitted 27 May, 2024; originally announced May 2024.

  25. arXiv:2405.16234  [pdf, other]

    cs.CV

    Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities

    Authors: Shiyu Xia, Junyu Xiong, Haoyu Dong, Jianbo Zhao, Yuzhang Tian, Mengyu Zhou, Yeye He, Shi Han, Dongmei Zhang

    Abstract: This paper explores capabilities of Vision Language Models on spreadsheet comprehension. We propose three self-supervised challenges with corresponding evaluation metrics to comprehensively evaluate VLMs on Optical Character Recognition (OCR), spatial perception, and visual format recognition. Additionally, we utilize the spreadsheet table detection task to assess the overall performance of VLMs b…

    Submitted 25 May, 2024; originally announced May 2024.

  26. arXiv:2405.11826  [pdf, other]

    astro-ph.IM hep-ex physics.ins-det

    Data quality control system and long-term performance monitor of the LHAASO-KM2A

    Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen , et al. (263 additional authors not shown)

    Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To…

    Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: 15 pages, 9 figures

  27. arXiv:2405.11280  [pdf, other]

    cs.LG

    Joint Analysis of Single-Cell Data across Cohorts with Missing Modalities

    Authors: Marianne Arriola, Weishen Pan, Manqi Zhou, Qiannan Zhang, Chang Su, Fei Wang

    Abstract: Joint analysis of multi-omic single-cell data across cohorts has significantly enhanced the comprehensive analysis of cellular processes. However, most of the existing approaches for this purpose require access to samples with complete modality availability, which is impractical in many real-world scenarios. In this paper, we propose (Single-Cell Cross-Cohort Cross-Category) integration, a novel f…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 10 pages, 7 figures, 5 tables

  28. arXiv:2405.09031  [pdf, other]

    math.AP

    Principal eigenvalue for some elliptic operators with large drift: Neumann boundary conditions

    Authors: Shuang Liu, Yuan Lou, Maolin Zhou

    Abstract: The paper is concerned with the principal eigenvalue of some linear elliptic operators with drift in two dimensional space. We provide a refined description of the asymptotic behavior for the principal eigenvalue as the drift rate approaches infinity. Under some non-degeneracy assumptions, our results illustrate that these asymptotic behaviors are completely determined by some connected components…

    Submitted 15 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: 53 pages, 9 figures

    MSC Class: 35P15; 35P20; 34C25

  29. arXiv:2405.08233  [pdf]

    cs.LG

    A Deep Dive Into the Factors Influencing Financial Success: A Machine Learning Approach

    Authors: Michael Zhou, Ramin Ramezani

    Abstract: This paper explores various socioeconomic factors that contribute to individual financial success using machine learning algorithms and approaches. Financial success, a critical aspect of every individual's well-being, is a complex concept influenced by a plethora of different factors. This study aims to understand the true determinants of financial success. It examines the survey data from the Nati…

    Submitted 18 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 21 pages, 4 figures, 10 tables

  30. arXiv:2405.07691  [pdf, other]

    astro-ph.HE

    Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

    Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) i…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  31. arXiv:2405.07508  [pdf, other]

    cs.SE

    Revealing the value of Repository Centrality in lifespan prediction of Open Source Software Projects

    Authors: Runzhi He, Hengzhi Ye, Minghui Zhou

    Abstract: Background: Open Source Software is the building block of modern software. However, the prevalence of project deprecation in the open source world weakens the integrity of the downstream systems and the broad ecosystem. Therefore it calls for efforts in monitoring and predicting project deprecations, empowering stakeholders to take proactive measures. Challenge: Existing techniques mainly focus on…

    Submitted 13 May, 2024; originally announced May 2024.

  32. arXiv:2405.06203  [pdf, other]

    cs.AI

    A First Step in Using Machine Learning Methods to Enhance Interaction Analysis for Embodied Learning Environments

    Authors: Joyce Fonteles, Eduardo Davalos, Ashwin T. S., Yike Zhang, Mengxi Zhou, Efrat Ayalon, Alicia Lane, Selena Steinberg, Gabriella Anton, Joshua Danish, Noel Enyedy, Gautam Biswas

    Abstract: Investigating children's embodied learning in mixed-reality environments, where they collaboratively simulate scientific processes, requires analyzing complex multimodal data to interpret their learning and coordination behaviors. Learning scientists have developed Interaction Analysis (IA) methodologies for analyzing such data, but this requires researchers to watch hours of videos to extract and…

    Submitted 9 May, 2024; originally announced May 2024.

  33. arXiv:2405.04513  [pdf, other]

    cs.CL cs.AI cs.LG

    Switchable Decision: Dynamic Neural Generation Networks

    Authors: Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mingyuan Zhou

    Abstract: Auto-regressive generation models achieve competitive performance across many different NLP tasks such as summarization, question answering, and classifications. However, they are also known for being slow in inference, which makes them challenging to deploy in real-time applications. We propose a switchable decision to accelerate inference by dynamically assigning computation resources for each d…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  34. arXiv:2405.00522  [pdf, other]

    econ.GN cs.CE cs.CL cs.CR q-fin.CP

    DAM: A Universal Dual Attention Mechanism for Multimodal Timeseries Cryptocurrency Trend Forecasting

    Authors: Yihang Fu, Mingyu Zhou, Luyao Zhang

    Abstract: In the distributed systems landscape, Blockchain has catalyzed the rise of cryptocurrencies, merging enhanced security and decentralization with significant investment opportunities. Despite their potential, current research on cryptocurrency trend forecasting often falls short by simplistically merging sentiment data without fully considering the nuanced interplay between financial market dynamic…

    Submitted 1 May, 2024; originally announced May 2024.

  35. arXiv:2404.18202  [pdf, other]

    cs.AI cs.MM

    WorldGPT: Empowering LLM as Multimodal World Model

    Authors: Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang

    Abstract: World models are progressively being employed across diverse fields, extending from basic environment simulation to complex scenario construction. However, existing models are mainly trained on domain-specific states and actions, and confined to single-modality state representations. In this paper, we introduce WorldGPT, a generalist world model built upon Multimodal Large Language Model (MLLM). W…

    Submitted 28 April, 2024; originally announced April 2024.

  36. arXiv:2404.16565  [pdf, other]

    cs.SE

    PyRadar: Towards Automatically Retrieving and Validating Source Code Repository Information for PyPI Packages

    Authors: Kai Gao, Weiwei Xu, Wenhao Yang, Minghui Zhou

    Abstract: A package's source code repository records the development history of the package, providing indispensable information for the use and risk monitoring of the package. However, a package release often misses its source code repository due to the separation of the package's development platform from its distribution platform. Existing tools retrieve the release's repository information from its meta…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted at FSE 2024

  37. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  38. arXiv:2404.14768  [pdf, other]

    cs.CV

    Enhancing Prompt Following with Visual Control Through Training-Free Mask-Guided Diffusion

    Authors: Hongyu Chen, Yiqi Gao, Min Zhou, Peng Wang, Xubin Li, Tiezheng Ge, Bo Zheng

    Abstract: Recently, integrating visual controls into text-to-image (T2I) models, such as the ControlNet method, has received significant attention for finer control capabilities. While various training-free methods make efforts to enhance prompt following in T2I models, the issue with visual control is still rarely studied, especially in the scenario that visual controls are misaligned with text prompts. In thi…

    Submitted 23 April, 2024; originally announced April 2024.

  39. arXiv:2404.13984  [pdf, other]

    cs.CV

    RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance

    Authors: Chengrui Wang, Pengfei Liu, Min Zhou, Ming Zeng, Xubin Li, Tiezheng Ge, Bo Zheng

    Abstract: Although diffusion models can generate high-quality human images, their applications are limited by the instability in generating hands with correct structures. Some previous works mitigate the problem by considering hand structure yet struggle to maintain style consistency between refined malformed hands and other image regions. In this paper, we aim to solve the problem of inconsistency regardin…

    Submitted 22 April, 2024; originally announced April 2024.

  40. arXiv:2404.12804  [pdf, other]

    cs.CV eess.IV

    Linearly-evolved Transformer for Pan-sharpening

    Authors: Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Man Zhou, Danfeng Hong

    Abstract: The vision transformer family has dominated the satellite pan-sharpening field driven by the global-wise spatial information modeling mechanism from the core self-attention ingredient. The standard modeling rules within these promising pan-sharpening methods are to roughly stack the transformer variants in a cascaded manner. Despite the remarkable advancement, their success may be at the huge cost of…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 10 pages

  41. The 2018 outburst of MAXI J1820+070 as seen by Insight-HXMT

    Authors: Ningyue Fan, Songyu Li, Rui Zhan, Honghui Liu, Zuobin Zhang, Cosimo Bambi, Long Ji, Xiang Ma, James F. Steiner, Shuang-Nan Zhang, Menglei Zhou

    Abstract: We present an analysis of the whole 2018 outburst of the black hole X-ray binary MAXI J1820+070 with Insight-HXMT data. We focus our study on the temporal evolution of the parameters of the source. We employ two different models to fit the disk's thermal spectrum: the Newtonian model DISKBB and the relativistic model NKBB. These two models provide different pictures of the source in the soft state…

    Submitted 1 July, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 14 pages, 8 figures. v2: refereed version

    Journal ref: Astrophys.J. 969: 61 (2024)

  42. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  43. PrintListener: Uncovering the Vulnerability of Fingerprint Authentication via the Finger Friction Sound

    Authors: Man Zhou, Shuao Su, Qian Wang, Qi Li, Yuting Zhou, Xiaojing Ma, Zhengxiong Li

    Abstract: Fingerprint authentication has been extensively employed in contemporary identity verification systems owing to its rapidity and cost-effectiveness. Due to its widespread use, fingerprint leakage may cause sensitive information theft, enormous economic and personnel losses, and even a potential compromise of national security. As a fingerprint that can coincidentally match a specific proportion of…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: in Proc. of NDSS, 2024

  44. arXiv:2404.07976  [pdf, other]

    cs.CV cs.AI

    Self-supervised Dataset Distillation: A Good Compression Is All You Need

    Authors: Muxin Zhou, Zeyuan Yin, Shitong Shao, Zhiqiang Shen

    Abstract: Dataset distillation aims to compress information from a large-scale original dataset to a new compact dataset while striving to preserve the utmost degree of the original data informational essence. Previous studies have predominantly concentrated on aligning the intermediate statistics between the original and distilled data, such as weight trajectory, features, gradient, BatchNorm, etc. In this…

    Submitted 11 April, 2024; originally announced April 2024.

  45. arXiv:2404.06769  [pdf]

    cs.NE

    Solving the Food-Energy-Water Nexus Problem via Intelligent Optimization Algorithms

    Authors: Qi Deng, Zheng Fan, Zhi Li, Xinna Pan, Qi Kang, MengChu Zhou

    Abstract: The application of evolutionary algorithms (EAs) to multi-objective optimization problems has been widespread. However, the EA research community has not paid much attention to large-scale multi-objective optimization problems arising from real-world applications. Especially, Food-Energy-Water systems are intricately linked among food, energy and water that impact each other. They usually involve…

    Submitted 10 April, 2024; originally announced April 2024.

  46. arXiv:2404.04801  [pdf, ps, other]

    astro-ph.IM astro-ph.HE

    LHAASO-KM2A detector simulation using Geant4

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (254 additional authors not shown)

    Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with…

    Submitted 7 April, 2024; originally announced April 2024.

  47. arXiv:2404.04545  [pdf, other]

    cs.MM cs.CL

    TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis

    Authors: Ming Zhou, Weize Quan, Ziqi Zhou, Kai Wang, Tong Wang, Dong-Ming Yan

    Abstract: Multimodal Sentiment Analysis (MSA) endeavors to understand human sentiment by leveraging language, visual, and acoustic modalities. Despite the remarkable performance exhibited by previous MSA approaches, the presence of inherent multimodal heterogeneities poses a challenge, with the contribution of different modalities varying considerably. Past research predominantly focused on improving repres…

    Submitted 6 April, 2024; originally announced April 2024.

  48. arXiv:2404.04057  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

    Authors: Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang

    Abstract: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By refo…

    Submitted 24 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: ICML 2024, PyTorch implementation: https://github.com/mingyuanzhou/SiD

  49. arXiv:2404.00109  [pdf, other]

    stat.AP

    Reverse stress testing via multivariate modeling with vine copulas

    Authors: Menglin Zhou, Natalia Nolde

    Abstract: As an important tool in financial risk management, stress testing aims to evaluate the stability of financial portfolios under some potential large shocks from extreme yet plausible scenarios of risk factors. The effectiveness of a stress test crucially depends on the choice of stress scenarios. In this paper we consider a pragmatic approach to stress scenario estimation that aims to address sever…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: 35 pages, 23 figures

  50. arXiv:2403.16479  [pdf, other]

    cs.SE

    Model-less Is the Best Model: Generating Pure Code Implementations to Replace On-Device DL Models

    Authors: Mingyi Zhou, Xiang Gao, Pei Liu, John Grundy, Chunyang Chen, Xiao Chen, Li Li

    Abstract: Recent studies show that deployed deep learning (DL) models such as those of Tensor Flow Lite (TFLite) can be easily extracted from real-world applications and devices by attackers to generate many kinds of attacks like adversarial attacks. Although securing deployed on-device DL models has gained increasing attention, no existing methods can fully prevent the aforementioned threats. Traditional s…

    Submitted 31 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted by the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA2024)