Showing 1–50 of 151 results for author: Cui, M

  1. arXiv:2407.02990  [pdf, other]

    cs.CV

    Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation

    Authors: Mengmeng Cui, Kunbo Zhang, Zhenan Sun

    Abstract: In recent years, 2D-to-3D pose uplifting in monocular 3D Human Pose Estimation (HPE) has attracted widespread research interest. GNN-based methods and Transformer-based methods have become mainstream architectures due to their advanced spatial and temporal feature learning capacities. However, existing approaches typically construct joint-wise and frame-wise attention alignments in spatial and tem…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  2. arXiv:2406.17800  [pdf, other]

    q-bio.QM cs.SD eess.AS

    Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review

    Authors: Meng Cui, Xubo Liu, Haohe Liu, Jinzheng Zhao, Daoliang Li, Wenwu Wang

    Abstract: Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which are essential for optimizing production efficiency, enhancing fish welfare, and improving resource management. Previous reviews have focused on single…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.12306  [pdf, other]

    physics.class-ph quant-ph

    Interacting Mathieu equation, synchronization dynamics and collision-induced velocity exchange in trapped ions

    Authors: Asma Benbouza, Xiaoshui Lin, Jin Ming Cui, Ming Gong

    Abstract: Recently, large-scale trapped ion systems have been realized in experiments for quantum simulation and quantum computation. They are the simplest systems for dynamical stability and parametric resonance. In this model, the Mathieu equation plays the most fundamental role for us to understand the stability and instability of a single ion. In this work, we investigate the dynamics of trapped ions wi…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 15 pages, 14 figures

  4. arXiv:2406.11546  [pdf, other]

    eess.AS cs.CL cs.SD

    GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

    Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

    Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired spee…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  5. arXiv:2406.10160  [pdf, other]

    cs.SD cs.AI eess.AS

    One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

    Authors: Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

    Abstract: We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrat…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  6. arXiv:2406.10034  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  7. arXiv:2406.08698  [pdf, other]

    astro-ph.HE hep-ph

    Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

    Abstract: In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, accepted by PRL

  8. arXiv:2406.07989  [pdf, other]

    cs.IT eess.SP

    Near-Field Wideband Beam Training Based on Distance-Dependent Beam Split

    Authors: Tianyue Zheng, Mingyao Cui, Zidong Wu, Linglong Dai

    Abstract: Near-field beam training is essential for acquiring channel state information in 6G extremely large-scale multiple input multiple output (XL-MIMO) systems. To achieve low-overhead beam training, existing method has been proposed to leverage the near-field beam split effect, which deploys true-time-delay arrays to simultaneously search multiple angles of the entire angular range in a distance ring…

    Submitted 12 June, 2024; originally announced June 2024.

  9. arXiv:2406.07081  [pdf, other]

    cs.CL

    Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning

    Authors: Menglong Cui, Jiangcun Du, Shaolin Zhu, Deyi Xiong

    Abstract: Large language models (LLMs) exhibit outstanding performance in machine translation via in-context learning. In contrast to sentence-level translation, document-level translation (DOCMT) by LLMs based on in-context learning faces two major challenges: firstly, document translations generated by LLMs are often incoherent; secondly, the length of demonstration for in-context learning is usually limi…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL2024 long paper (Findings)

  10. arXiv:2406.05634  [pdf, ps, other]

    nlin.SI

    The non-Abelian two-dimensional Toda lattice and matrix sine-Gordon equations with self-consistent sources

    Authors: Mengyuan Cui, Chunxia Li

    Abstract: The non-Abelian two-dimensional Toda lattice and matrix sine-Gordon equations with self-consistent sources are established and solved. Two families of quasideterminant solutions are presented for the non-Abelian two-dimensional Toda lattice with self-consistent sources. By employing periodic and quasi-periodic reductions, a matrix sine-Gordon equation with self-consistent sources is constructed fo…

    Submitted 9 June, 2024; originally announced June 2024.

  11. arXiv:2406.02230  [pdf, other]

    cs.CV

    I4VGen: Image as Stepping Stone for Text-to-Video Generation

    Authors: Xiefan Guo, Jinlin Liu, Miaomiao Cui, Di Huang

    Abstract: Text-to-video generation has lagged behind text-to-image synthesis in quality and diversity due to the complexity of spatio-temporal modeling and limited video-text datasets. This paper presents I4VGen, a training-free and plug-and-play video diffusion inference framework, which enhances text-to-video generation by leveraging robust image techniques. Specifically, following text-to-image-to-video,…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Project page: https://xiefan-guo.github.io/i4vgen

  12. arXiv:2405.16393  [pdf, other]

    cs.CV cs.AI

    Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation

    Authors: Jinlin Liu, Kai Yu, Mengyang Feng, Xiefan Guo, Miaomiao Cui

    Abstract: Recent advancements in human video synthesis have enabled the generation of high-quality videos through the application of stable diffusion models. However, existing methods predominantly concentrate on animating solely the human element (the foreground) guided by pose information, while leaving the background entirely static. Contrary to this, in authentic, high-quality videos, backgrounds often…

    Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  13. arXiv:2405.12850  [pdf, other]

    cs.CV

    Weakly supervised alignment and registration of MR-CT for cervical cancer radiotherapy

    Authors: Jjahao Zhang, Yin Gu, Deyu Sun, Yuhua Gao, Ming Gao, Ming Cui, Teng Zhang, He Ma

    Abstract: Cervical cancer is one of the leading causes of death in women, and brachytherapy is currently the primary treatment method. However, it is important to precisely define the extent of paracervical tissue invasion to improve cancer diagnosis and treatment options. The fusion of the information characteristics of both computed tomography (CT) and magnetic resonance imaging (MRI) modalities may be use…

    Submitted 21 May, 2024; originally announced May 2024.

  14. arXiv:2405.12601  [pdf, other]

    cs.CV

    FFAM: Feature Factorization Activation Map for Explanation of 3D Detectors

    Authors: Shuai Liu, Boyang Li, Zhiyu Fang, Mingyue Cui, Kai Huang

    Abstract: LiDAR-based 3D object detection has made impressive progress recently, yet most existing models are black-box, lacking interpretability. Previous explanation approaches primarily focus on analyzing image-based models and are not readily applicable to LiDAR-based 3D detectors. In this paper, we propose a feature factorization activation map (FFAM) to generate high-quality visual explanations for 3D…

    Submitted 21 May, 2024; originally announced May 2024.

  15. arXiv:2405.11826  [pdf, other]

    astro-ph.IM hep-ex physics.ins-det

    Data quality control system and long-term performance monitor of the LHAASO-KM2A

    Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, et al. (263 additional authors not shown)

    Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To…

    Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: 15 pages, 9 figures

  16. arXiv:2405.07691  [pdf, other]

    astro-ph.HE

    Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

    Abstract: The first source catalog of Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma ray source, 1LHAASO J1219+2915. In this paper a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) i…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  17. arXiv:2404.14969  [pdf, ps, other]

    nlin.SI

    The symmetric (2+1)-dimensional Lotka-Volterra equation with self-consistent sources

    Authors: Mengyuan Cui, Chunxia Li, Yuqin Yao

    Abstract: The symmetric (2+1)-dimensional Lotka-Volterra equation with self-consistent sources is constructed and solved by employing the source generation procedure, whose solutions are expressed in terms of pfaffians. As special cases of the pfaffian solutions, different types of explicit solutions are obtained, including dromions, soliton solutions and breather solutions.

    Submitted 23 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  18. arXiv:2404.06806  [pdf, other]

    cs.IT eess.SP

    Near-Optimal Channel Estimation for Dense Array Systems

    Authors: Mingyao Cui, Zijian Zhang, Linglong Dai, Kaibin Huang

    Abstract: By deploying a large number of antennas with sub-half-wavelength spacing in a compact space, dense array systems (DASs) can fully unleash the multiplexing-and-diversity gains of limited apertures. To acquire these gains, accurate channel state information acquisition is necessary but challenging due to the large antenna numbers. To overcome this obstacle, this paper reveals that exploiting the high…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 19 pages, 10 figures

  19. arXiv:2404.05937  [pdf]

    physics.med-ph

    De-aberration for transcranial photoacoustic computed tomography through an adult human skull

    Authors: Yousuf Aborahama, Karteekeya Sastry, Manxiu Cui, Yang Zhang, Yilin Luo, Rui Cao, Lihong V. Wang

    Abstract: Noninvasive transcranial photoacoustic computed tomography (PACT) of the human brain, despite its clinical potential, remains impeded by the acoustic distortion induced by the human skull. The distortion, which is attributed to the markedly different material properties of the skull relative to soft tissue, results in heavily aberrated PACT images -- a problem that has remained unsolved in the pas…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 11 pages, 3 figures

  20. arXiv:2404.04864  [pdf, other]

    cs.IT

    Towards Atomic MIMO Receivers

    Authors: Mingyao Cui, Qunsong Zeng, Kaibin Huang

    Abstract: The advancement of Rydberg atoms in quantum sensing is driving a paradigm shift from classical receivers to atomic receivers. Capitalizing on the extreme sensitivity of Rydberg atoms to external disturbance, atomic receivers can measure radio-waves more precisely than classical receivers to support high-performance wireless communication and sensing. Although the atomic receiver is developing rapi…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 13 pages, 8 figures. Submitted to IEEE for possible publication

  21. arXiv:2404.04801  [pdf, ps, other]

    astro-ph.IM astro-ph.HE

    LHAASO-KM2A detector simulation using Geant4

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (254 additional authors not shown)

    Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with…

    Submitted 7 April, 2024; originally announced April 2024.

  22. arXiv:2404.04650  [pdf, other]

    cs.CV

    InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

    Authors: Xiefan Guo, Jinlin Liu, Miaomiao Cui, Jiankai Li, Hongyu Yang, Di Huang

    Abstract: Recent strides in the development of diffusion models, exemplified by advancements such as Stable Diffusion, have underscored their remarkable prowess in generating visually compelling images. However, the imperative of achieving a seamless alignment between the generated image and the provided prompt persists as a formidable challenge. This paper traces the root of these difficulties to invalid i…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  23. arXiv:2404.00471  [pdf, other]

    physics.med-ph cs.CV cs.LG eess.IV

    Score-Based Diffusion Models for Photoacoustic Tomography Image Reconstruction

    Authors: Sreemanti Dey, Snigdha Saha, Berthy T. Feng, Manxiu Cui, Laure Delisle, Oscar Leong, Lihong V. Wang, Katherine L. Bouman

    Abstract: Photoacoustic tomography (PAT) is a rapidly-evolving medical imaging modality that combines optical absorption contrast with ultrasound imaging depth. One challenge in PAT is image reconstruction with inadequate acoustic signals due to limited sensor coverage or due to the density of the transducer array. Such cases call for solving an ill-posed inverse reconstruction problem. In this work, we use…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 5 pages

    Journal ref: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, pp. 2470-2474

  24. Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A

    Authors: The LHAASO Collaboration, Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, et al. (256 additional authors not shown)

    Abstract: We present the measurements of all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range of 0.3-30 PeV using data collected from LHAASO-KM2A between September 2021 and December 2022, which is based on a nearly composition-independent energy reconstruction method, achieving unprecedented accuracy. Our analysis reveals the position of the knee at…

    Submitted 26 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

    Journal ref: Physical Review Letters 132, 131002 (2024)

  25. arXiv:2403.09167  [pdf, other]

    cs.CL

    Dial-insight: Fine-tuning Large Language Models with High-Quality Domain-Specific Data Preventing Capability Collapse

    Authors: Jianwei Sun, Chaoyang Mei, Linlin Wei, Kaiyu Zheng, Na Liu, Ming Cui, Tianyi Li

    Abstract: The efficacy of large language models (LLMs) is heavily dependent on the quality of the underlying data, particularly within specialized domains. A common challenge when fine-tuning LLMs for domain-specific applications is the potential degradation of the model's generalization capabilities. To address these issues, we propose a two-stage approach for the construction of production prompts designe…

    Submitted 14 March, 2024; originally announced March 2024.

  26. arXiv:2402.17292  [pdf, other]

    cs.CV

    DivAvatar: Diverse 3D Avatar Generation with a Single Prompt

    Authors: Weijing Tao, Biwen Lei, Kunhao Liu, Shijian Lu, Miaomiao Cui, Xuansong Xie, Chunyan Miao

    Abstract: Text-to-Avatar generation has recently made significant strides due to advancements in diffusion models. However, most existing work remains constrained by limited diversity, producing avatars with subtle differences in appearance for a given text prompt. We design DivAvatar, a novel framework that generates diverse avatars, empowering 3D creatives with a multitude of distinct and richly varied 3D…

    Submitted 27 February, 2024; originally announced February 2024.

  27. arXiv:2401.14051  [pdf, other]

    cs.GR cs.CV

    A real-time rendering method for high albedo anisotropic materials with multiple scattering

    Authors: Shun Fang, Xing Feng, Ming Cui

    Abstract: We propose a neural network-based real-time volume rendering method for realistic and efficient rendering of volumetric media. The traditional volume rendering method uses path tracing to solve the radiation transfer equation, which requires a huge amount of calculation and cannot achieve real-time rendering. Therefore, this paper uses neural networks to simulate the iterative integration process…

    Submitted 25 January, 2024; originally announced January 2024.

  28. arXiv:2401.12456  [pdf, ps, other]

    cs.CV cs.AI cs.GR

    Exploration and Improvement of Nerf-based 3D Scene Editing Techniques

    Authors: Shun Fang, Ming Cui, Xing Feng, Yanan Zhang

    Abstract: NeRF's high-quality scene synthesis capability was quickly accepted by scholars in the years after it was proposed, and significant progress has been made in 3D scene representation and synthesis. However, the high computational cost limits intuitive and efficient editing of scenes, making NeRF's development in the scene editing field facing many challenges. This paper reviews the preliminary expl…

    Submitted 22 January, 2024; originally announced January 2024.

  29. Methods and strategies for improving the novel view synthesis quality of neural radiation field

    Authors: Shun Fang, Ming Cui, Xing Feng, Yanna Lv

    Abstract: Neural Radiation Field (NeRF) technology can learn a 3D implicit model of a scene from 2D images and synthesize realistic novel view images. This technology has received widespread attention from the industry and has good application prospects. In response to the problem that the rendering quality of NeRF images needs to be improved, many researchers have proposed various methods to improve the re…

    Submitted 17 April, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    ACM Class: I.2; I.4; I.6

    Journal ref: IEEE ACCESS 12 (2024) 50548-50555

  30. arXiv:2401.04152  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

    Authors: Jiawen Kang, Lingwei Meng, Mingyu Cui, Haohan Guo, Xixin Wu, Xunying Liu, Helen Meng

    Abstract: End-to-end multi-talker speech recognition has garnered great interest as an effective approach to directly transcribe overlapped speech from multiple speakers. Current methods typically adopt either 1) single-input multiple-output (SIMO) models with a branched encoder, or 2) single-input single-output (SISO) models based on attention-based encoder-decoder architecture with serialized output train…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP2024

  31. arXiv:2401.02777  [pdf, other]

    cs.CL cs.AI

    From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models

    Authors: Na Liu, Liangyu Chen, Xiaoyu Tian, Wei Zou, Kaijiang Chen, Ming Cui

    Abstract: This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents. RAISE, an enhancement of the ReAct framework, incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations. I…

    Submitted 30 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

  32. arXiv:2401.01173  [pdf, other]

    cs.CV

    En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data

    Authors: Yifang Men, Biwen Lei, Yuan Yao, Miaomiao Cui, Zhouhui Lian, Xuansong Xie

    Abstract: We present En3D, an enhanced generative scheme for sculpting high-quality 3D human avatars. Unlike previous works that rely on scarce 3D datasets or limited 2D collections with imbalanced viewing angles and imprecise pose priors, our approach aims to develop a zero-shot 3D generative scheme capable of producing visually realistic, geometrically accurate and content-wise diverse 3D humans without r…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Project Page: https://menyifang.github.io/projects/En3D/index.html

  33. arXiv:2312.16837  [pdf, other]

    cs.CV

    DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors

    Authors: Biwen Lei, Kai Yu, Mengyang Feng, Miaomiao Cui, Xuansong Xie

    Abstract: Text-guided domain adaptation and generation of 3D-aware portraits find many applications in various fields. However, due to the lack of training data and the challenges in handling the high variety of geometry and appearance, the existing methods for these tasks suffer from issues like inflexibility, instability, and low fidelity. In this paper, we propose a novel framework DiffusionGAN3D, which…

    Submitted 12 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR2024

  34. arXiv:2312.05107  [pdf, other]

    cs.CV

    DreaMoving: A Human Video Generation Framework based on Diffusion Models

    Authors: Mengyang Feng, Jinlin Liu, Kai Yu, Yuan Yao, Zheng Hui, Xiefan Guo, Xianhui Lin, Haolan Xue, Chen Shi, Xiaowen Li, Aojie Li, Xiaoyang Kang, Biwen Lei, Miaomiao Cui, Peiran Ren, Xuansong Xie

    Abstract: In this paper, we present DreaMoving, a diffusion-based controllable video generation framework to produce high-quality customized human videos. Specifically, given target identity and posture sequences, DreaMoving can generate a video of the target identity moving or dancing anywhere driven by the posture sequences. To this end, we propose a Video ControlNet for motion-controlling and a Content G…

    Submitted 11 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: 5 pages, 5 figures, Tech. Report

  35. arXiv:2311.13617  [pdf, other]

    cs.CV

    Boosting3D: High-Fidelity Image-to-3D by Boosting 2D Diffusion Prior to 3D Prior with Progressive Learning

    Authors: Kai Yu, Jinlin Liu, Mengyang Feng, Miaomiao Cui, Xuansong Xie

    Abstract: We present Boosting3D, a multi-stage single image-to-3D generation method that can robustly generate reasonable 3D objects in different data domains. The point of this work is to solve the view consistency problem in single image-guided 3D generation by modeling a reasonable geometric structure. For this purpose, we propose to utilize better 3D prior to training the NeRF. More specifically, we tra…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: 8 pages, 7 figures, 1 table

  36. arXiv:2311.13141  [pdf, other]

    cs.CV

    Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models

    Authors: Mengyang Feng, Jinlin Liu, Miaomiao Cui, Xuansong Xie

    Abstract: This is a technical report on the 360-degree panoramic image generation task based on diffusion models. Unlike ordinary 2D images, 360-degree panoramic images capture the entire $360^\circ\times 180^\circ$ field of view. So the rightmost and the leftmost sides of the 360 panoramic image should be continued, which is the main challenge in this field. However, the current diffusion pipeline is not a…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 2 pages, 8 figures, Tech. Report

  37. arXiv:2311.11952  [pdf, other]

    quant-ph cs.CV cs.ET

    Quantum Image Segmentation Based on Grayscale Morphology

    Authors: Wenjie Liu, Lu Wang, Mengmeng Cui

    Abstract: The classical image segmentation algorithm based on grayscale morphology can effectively segment images with uneven illumination, but with the increase of the image data, the real-time problem will emerge. In order to solve this problem, a quantum image segmentation algorithm is proposed in this paper, which can use quantum mechanism to simultaneously perform morphological operations on all pixels…

    Submitted 2 October, 2023; originally announced November 2023.

    Comments: 20 pages, 12 figures

    Journal ref: IEEE Transactions on Quantum Engineering, 2022.3: p.3103012

  38. arXiv:2311.05267  [pdf, other]

    cs.DC

    Analysis and Characterization of Performance Variability for OpenMP Runtime

    Authors: Minyu Cui, Nikela Papadopoulou, Miquel Pericàs

    Abstract: In the high performance computing (HPC) domain, performance variability is a major scalability issue for parallel computing applications with heavy synchronization and communication. In this paper, we present an experimental performance analysis of OpenMP benchmarks regarding the variation of execution time, and determine the potential factors causing performance variability. Our work offers some…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: To appear at ROSS 2023 (International Workshop on Runtime and Operating Systems for Supercomputers), held in conjunction with SC23

  39. arXiv:2310.18075  [pdf, other]

    cs.CL cs.AI

    DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking

    Authors: Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, Ming Cui

    Abstract: Inspired by the dual-process theory of human cognition, we introduce DUMA, a novel conversational agent framework that embodies a dual-mind mechanism through the utilization of two generative Large Language Models (LLMs) dedicated to fast and slow thinking respectively. The fast thinking model serves as the primary interface for external interactions and initial response generation, evaluating the…

    Submitted 24 November, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

  40. arXiv:2310.17082  [pdf, ps, other]

    astro-ph.HE

    Does or did the supernova remnant Cassiopeia A operate as a PeVatron?

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

    Abstract: For decades, supernova remnants (SNRs) have been considered the prime sources of Galactic Cosmic rays (CRs). But whether SNRs can accelerate CR protons to PeV energies and thus dominate CR flux up to the knee is currently under intensive theoretical and phenomenological debate. The direct test of the ability of SNRs to operate as CR PeVatrons can be provided by ultrahigh-energy (UHE;…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 11 pages, 3 figures, Accepted by the APJL

  41. arXiv:2310.14778  [pdf, other]

    cs.MM cs.SD eess.AS

    Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

    Authors: Jinzheng Zhao, Yong Xu, Xinyuan Qian, Davide Berghi, Peipei Wu, Meng Cui, Jianyuan Sun, Philip J. B. Jackson, Wenwu Wang

    Abstract: Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic values and wide application. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visual information, the Bayesian-based filter can solve the problem of data association, audio-visual fusion and track management. In this paper, we condu…

    Submitted 17 December, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  42. Very high energy gamma-ray emission beyond 10 TeV from GRB 221009A

    Authors: Zhen Cao, F. Aharonian, Q. An, A. Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

    Abstract: The highest energy gamma-rays from gamma-ray bursts (GRBs) have important implications for their radiation mechanism. Here we report for the first time the detection of gamma-rays up to 13 TeV from the brightest GRB 221009A by the Large High Altitude Air-shower Observatory (LHAASO). The LHAASO-KM2A detector registered more than 140 gamma-rays with energies above 3 TeV during 230$-$900s after the t…

    Submitted 22 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: 49 pages, 11 figures

    Journal ref: Science Advances, 9, eadj2778 (2023) 15 November 2023

  43. arXiv:2309.05058  [pdf, other]

    cs.SD cs.MM eess.AS

    Multimodal Fish Feeding Intensity Assessment in Aquaculture

    Authors: Meng Cui, Xubo Liu, Haohe Liu, Zhuangzhuang Du, Tao Chen, Guoping Lian, Daoliang Li, Wenwu Wang

    Abstract: Fish feeding intensity assessment (FFIA) aims to evaluate fish appetite changes during feeding, which is crucial in industrial aquaculture applications. Existing FFIA methods are limited by their robustness to noise, computational complexity, and the lack of public datasets for developing the models. To address these issues, we first introduce AV-FFIA, a new dataset containing 27,000 labeled audio…

    Submitted 19 May, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

  44. arXiv:2309.04608  [pdf, other]

    cs.CV cs.MM

    Style Generation: Image Synthesis based on Coarsely Matched Texts

    Authors: Mengyao Cui, Zhe Zhu, Shao-Ping Lu, Yulu Yang

    Abstract: Previous text-to-image synthesis algorithms typically use explicit textual instructions to generate/manipulate images accurately, but they have difficulty adapting to guidance in the form of coarsely matched texts. In this work, we attempt to stylize an input image using such coarsely matched text as guidance. To tackle this new problem, we introduce a novel task called text-based style generation…

    Submitted 8 September, 2023; originally announced September 2023.

  45. arXiv:2308.04787  [pdf, other]

    cs.SE

    rCanary: Detecting Memory Leaks Across Semi-automated Memory Management Boundary in Rust

    Authors: Mohan Cui, Suran Sun, Hui Xu, Yangfan Zhou

    Abstract: Rust is an effective system programming language that guarantees memory safety via compile-time verifications. It employs a novel ownership-based resource management model to facilitate automated resource deallocation. It is anticipated that this model will eliminate memory leaks. However, we observed that user intervention driving semi-automated management is prone to introducing leaks. In contra…

    Submitted 9 August, 2023; originally announced August 2023.

  46. arXiv:2308.04785  [pdf, other]

    cs.SE

    Is unsafe an Achilles' Heel? A Comprehensive Study of Safety Requirements in Unsafe Rust Programming

    Authors: Mohan Cui, Suran Sun, Hui Xu, Yangfan Zhou

    Abstract: Rust is an emerging, strongly-typed programming language focusing on efficiency and memory safety. With increasing projects adopting Rust, knowing how to use Unsafe Rust is crucial for Rust security. We observed that the description of safety requirements needs to be unified in Unsafe Rust programming. Current unsafe API documents in the standard library exhibited variations, including inconsisten…

    Submitted 9 August, 2023; originally announced August 2023.

  47. arXiv:2307.16518  [pdf, other]

    cs.IT eess.SP

    Continuous-Time Channel Prediction Based on Tensor Neural Ordinary Differential Equation

    Authors: Mingyao Cui, Hao Jiang, Yuhao Chen, Yang Du, Linglong Dai

    Abstract: Channel prediction is critical to address the channel aging issue in mobile scenarios. Existing channel prediction techniques are mainly designed for discrete channel prediction, which can only predict the future channel in a fixed time slot per frame, while the other intra-frame channels are usually recovered by interpolation. However, these approaches suffer from a serious interpolation loss, es…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: A tensor neural ODE based method is proposed to predict continuous-time wireless channels

  48. arXiv:2307.14335  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    WavJourney: Compositional Audio Creation with Large Language Models

    Authors: Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

    Abstract: Despite breakthroughs in audio generation models, their capabilities are often confined to domain-specific conditions such as speech transcriptions and audio captions. However, real-world audio creation aims to generate harmonious audio containing various elements such as speech, music, and sound effects with controllable conditions, which is challenging to address using existing audio generation…

    Submitted 26 November, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: GitHub: https://github.com/Audio-AGI/WavJourney

  49. arXiv:2307.12629  [pdf, other]

    hep-ex astro-ph.HE physics.ins-det

    BGO quenching effect on spectral measurements of cosmic-ray nuclei in DAMPE experiment

    Authors: Zhan-Fang Chen, Chuan Yue, Wei Jiang, Ming-Yang Cui, Qiang Yuan, Ying Wang, Cong Zhao, Yi-Feng Wei

    Abstract: The Dark Matter Particle Explorer (DAMPE) is a satellite-borne detector designed to measure high energy cosmic-rays and $γ$-rays. As a key sub-detector of DAMPE, the Bismuth Germanium Oxide (BGO) imaging calorimeter is utilized to measure the particle energy with a high resolution. The nonlinear fluorescence response of BGO for large ionization energy deposition, known as the quenching effect, res…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 13 pages, 4 figures, to be published in Nuclear Inst. and Methods in Physics Research, A

  50. arXiv:2307.02909  [pdf, other]

    eess.AS cs.AI cs.SD

    Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

    Authors: Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen Meng, Xunying Liu

    Abstract: Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of visual modality to acoustic signal corruption, an audio-visual multi-channel speech separation, dereverberation and recognition approach featuring a full incorporation of visual information into all system components is pro…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing