Showing 51–100 of 1,403 results for author: Sun, M

  1. arXiv:2404.06395  [pdf, other]

    cs.CL cs.LG

    MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

    Authors: Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zheng Leng Thai, Kaihuo Zhang, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The burgeoning interest in developing Large Language Models (LLMs) with up to trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce… ▽ More

    Submitted 3 June, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: revise according to peer review

  2. arXiv:2404.02885  [pdf, other]

    cs.CV

    PoCo: Point Context Cluster for RGBD Indoor Place Recognition

    Authors: Jing Liang, Zhuo Deng, Zheming Zhou, Omid Ghasemalizadeh, Dinesh Manocha, Min Sun, Cheng-Hao Kuo, Arnie Sen

    Abstract: We present a novel end-to-end algorithm (PoCo) for the indoor RGB-D place recognition task, aimed at identifying the most likely match for a given query frame within a reference database. The task presents inherent challenges attributed to the constrained field of view and limited range of perception sensors. We propose a new network architecture, which generalizes the recent Context of Clusters (… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

  3. arXiv:2404.02078  [pdf, other]

    cs.AI cs.CL cs.LG

    Advancing LLM Reasoning Generalists with Preference Trees

    Authors: Lifan Yuan, Ganqu Cui, Hanbin Wang, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, Zhenghao Liu, Bowen Zhou, Hao Peng, Zhiyuan Liu, Maosong Sun

    Abstract: We introduce Eurus, a suite of large language models (LLMs) optimized for reasoning. Finetuned from Mistral-7B and CodeLlama-70B, Eurus models achieve state-of-the-art results among open-source models on a diverse set of benchmarks covering mathematics, code generation, and logical reasoning problems. Notably, Eurus-70B beats GPT-3.5 Turbo in reasoning through a comprehensive benchmarking across 1… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Models and data are available at https://github.com/OpenBMB/Eurus

  4. arXiv:2404.01082  [pdf, other]

    eess.IV

    The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

    Authors: Jun Lyu, Chen Qin, Shuo Wang, Fanwen Wang, Yan Li, Zi Wang, Kunyuan Guo, Cheng Ouyang, Michael Tänzer, Meng Liu, Longyu Sun, Mengting Sun, Qin Li, Zhang Shi, Sha Hua, Hao Li, Zhensen Chen, Zhenlin Zhang, Bingyu Xin, Dimitris N. Metaxas, George Yiasemis, Jonas Teuwen, Liping Zhang, Weitian Chen, Yidong Zhao, et al. (25 additional authors not shown)

    Abstract: Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation p… ▽ More

    Submitted 16 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 25 pages, 17 figures

  5. arXiv:2404.00095  [pdf, other]

    cs.CV

    GDA: Generalized Diffusion for Robust Test-time Adaptation

    Authors: Yun-Yun Tsai, Fu-Chen Chen, Albert Y. C. Chen, Junfeng Yang, Che-Chun Su, Min Sun, Cheng-Hao Kuo

    Abstract: Machine learning models struggle with generalization when encountering out-of-distribution (OOD) samples with unexpected distribution shifts. For vision tasks, recent studies have shown that test-time adaptation employing diffusion models can achieve state-of-the-art accuracy improvements on OOD samples by generating new samples that align with the model's domain without the need to modify the mod… ▽ More

    Submitted 2 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

  6. arXiv:2403.20079  [pdf, other]

    cs.CV

    SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior

    Authors: Zhongrui Yu, Haoran Wang, Jinze Yang, Hanzhang Wang, Zeke Xie, Yunfeng Cai, Jiale Cao, Zhong Ji, Mingming Sun

    Abstract: Novel View Synthesis (NVS) for street scenes play a critical role in the autonomous driving simulation. The current mainstream technique to achieve it is neural rendering, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although thrilling progress has been made, when handling street scenes, current methods struggle to maintain rendering quality at the viewpoint that deviate… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

  7. arXiv:2403.19467  [pdf, other]

    cs.CV

    Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication

    Authors: Mingze Sun, Chao Xu, Xinyu Jiang, Yang Liu, Baigui Sun, Ruqi Huang

    Abstract: In this paper, we introduce an innovative task focused on human communication, aiming to generate 3D holistic human motions for both speakers and listeners. Central to our approach is the incorporation of factorization to decouple audio features and the combination of textual semantic information, thereby facilitating the creation of more realistic and coordinated movements. We separately train VQ… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  8. Low-Complexity Estimation Algorithm and Decoupling Scheme for FRaC System

    Authors: Mengjiang Sun, Peng Chen, Zhenxin Cao, Fei Shen

    Abstract: With the leaping advances in autonomous vehicles and transportation infrastructure, dual function radar-communication (DFRC) systems have become attractive due to the size, cost and resource efficiency. A frequency modulated continuous waveform (FMCW)-based radar-communication system (FRaC) utilizing both sparse multiple-input and multiple-output (MIMO) arrays and index modulation (IM) has been pr… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Journal ref: IEEE Transactions on Intelligent Vehicles, 2024

  9. arXiv:2403.17733  [pdf, other]

    cs.CL

    Continual Few-shot Event Detection via Hierarchical Augmentation Networks

    Authors: Chenlong Zhang, Pengfei Cao, Yubo Chen, Kang Liu, Zhiqiang Zhang, Mengshu Sun, Jun Zhao

    Abstract: Traditional continual event detection relies on abundant labeled data for training, which is often impractical to obtain in real-world applications. In this paper, we introduce continual few-shot event detection (CFED), a more commonly encountered scenario when a substantial number of labeled samples are not accessible. The CFED task is challenging as it involves memorizing previous event types an… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  10. arXiv:2403.17447  [pdf, other]

    cs.LG cs.CV cs.NE

    Chain of Compression: A Systematic Approach to Combinationally Compress Convolutional Neural Networks

    Authors: Yingtao Shen, Minqing Sun, Jie Zhao, An Zou

    Abstract: Convolutional neural networks (CNNs) have achieved significant popularity, but their computational and memory intensity poses challenges for resource-constrained computing systems, particularly with the prerequisite of real-time performance. To release this burden, model compression has become an important research focus. Many approaches like quantization, pruning, early exit, and knowledge distil… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 10 pages, 15 figures

  11. arXiv:2403.17431  [pdf, other]

    cs.CL cs.LG

    Robust and Scalable Model Editing for Large Language Models

    Authors: Yingfa Chen, Zhengyan Zhang, Xu Han, Chaojun Xiao, Zhiyuan Liu, Chen Chen, Kuai Li, Tao Yang, Maosong Sun

    Abstract: Large language models (LLMs) can make predictions using parametric knowledge--knowledge encoded in the model weights--or contextual knowledge--knowledge presented in the context. In many scenarios, a desirable behavior is that LLMs give precedence to contextual knowledge when it conflicts with the parametric knowledge, and fall back to using their parametric knowledge when the context is irrelevan… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024 paper, 16 pages, 4 figures

  12. arXiv:2403.17279  [pdf, other]

    astro-ph.SR astro-ph.HE

    Stellar Spin Down in Post-Mass Transfer Binary Systems

    Authors: Meng Sun, Seth Gossage, Emily M. Leiner, Aaron M. Geller

    Abstract: Motivated by measurements of the rotation speed of accretor stars in post-mass-transfer (post-MT) systems, we investigate how magnetic braking affects the spin-down of individual stars during binary evolution with the MESAbinary module. Unlike the conventional assumption of tidal synchronization coupled with magnetic braking in binaries, we first calculate whether tides are strong enough to synchr… ▽ More

    Submitted 21 May, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: 16 pages, 10 figures, 3 tables, accepted to ApJ

  13. arXiv:2403.16473  [pdf, other]

    cs.CR eess.IV

    Plaintext-Free Deep Learning for Privacy-Preserving Medical Image Analysis via Frequency Information Embedding

    Authors: Mengyu Sun, Ziyuan Yang, Maosong Ran, Zhiwen Wang, Hui Yu, Yi Zhang

    Abstract: In the fast-evolving field of medical image analysis, Deep Learning (DL)-based methods have achieved tremendous success. However, these methods require plaintext data for training and inference stages, raising privacy concerns, especially in the sensitive area of medical data. To tackle these concerns, this paper proposes a novel framework that uses surrogate images for analysis, eliminating the n… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  14. arXiv:2403.14978  [pdf, other]

    cs.IT eess.SP

    Range-Angle Estimation for FDA-MIMO System With Frequency Offset

    Authors: Mengjiang Sun, Peng Chen, Zhenxin Cao

    Abstract: Frequency diverse array multiple-input multiple-output (FDA-MIMO) radar differs from the traditional phased array (PA) radar, and can form range-angle-dependent beampattern and differentiate between closely spaced targets sharing the same angle but occupying distinct range cells. In the FDA-MIMO radar, target range estimation is achieved by employing a subtle frequency variation between adjacent a… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Journal ref: IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, 2024

  15. arXiv:2403.14185  [pdf, other]

    eess.SP

    A LiDAR-Aided Channel Model for Vehicular Intelligent Sensing-Communication Integration

    Authors: Ziwei Huang, Lu Bai, Mingran Sun, Xiang Cheng

    Abstract: In this paper, a novel channel modeling approach, named light detection and ranging (LiDAR)-aided geometry-based stochastic modeling (LA-GBSM), is developed. Based on the developed LA-GBSM approach, a new millimeter wave (mmWave) channel model for sixth-generation (6G) vehicular intelligent sensing-communication integration is proposed, which can support the design of intelligent transportation sy… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

  16. arXiv:2403.14023  [pdf]

    cs.CR

    A system capable of verifiably and privately screening global DNA synthesis

    Authors: Carsten Baum, Jens Berlips, Walther Chen, Hongrui Cui, Ivan Damgard, Jiangbin Dong, Kevin M. Esvelt, Mingyu Gao, Dana Gretton, Leonard Foner, Martin Kysel, Kaiyi Zhang, Juanru Li, Xiang Li, Omer Paneth, Ronald L. Rivest, Francesca Sage-Ling, Adi Shamir, Yue Shen, Meicen Sun, Vinod Vaikuntanathan, Lynn Van Hauwe, Theia Vogel, Benjamin Weinstein-Raun, Yun Wang, et al. (5 additional authors not shown)

    Abstract: Printing custom DNA sequences is essential to scientific and biomedical research, but the technology can be used to manufacture plagues as well as cures. Just as ink printers recognize and reject attempts to counterfeit money, DNA synthesizers and assemblers should deny unauthorized requests to make viral DNA that could be used to ignite a pandemic. There are three complications. First, we don't n… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Main text 10 pages, 4 figures. 5 supplementary figures. Total 21 pages. Direct correspondence to: Ivan B. Damgard ([email protected]), Andrew C. Yao ([email protected]), Kevin M. Esvelt ([email protected])

  17. arXiv:2403.13600  [pdf, other]

    cs.CV

    VL-Mamba: Exploring State Space Models for Multimodal Learning

    Authors: Yanyuan Qiao, Zheng Yu, Longteng Guo, Sihan Chen, Zijia Zhao, Mingzhen Sun, Qi Wu, Jing Liu

    Abstract: Multimodal large language models (MLLMs) have attracted widespread interest and have rich applications. However, the inherent attention mechanism in its Transformer structure requires quadratic complexity and results in expensive computational overhead. Therefore, in this work, we propose VL-Mamba, a multimodal large language model based on state space models, which have been shown to have great p… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  18. arXiv:2403.12385  [pdf, other]

    cs.CV

    VideoBadminton: A Video Dataset for Badminton Action Recognition

    Authors: Qi Li, Tzu-Chen Chiu, Hsiang-Wei Huang, Min-Te Sun, Wei-Shinn Ku

    Abstract: In the dynamic and evolving field of computer vision, action recognition has become a key focus, especially with the advent of sophisticated methodologies like Convolutional Neural Networks (CNNs), Convolutional 3D, Transformer, and spatial-temporal feature fusion. These technologies have shown promising results on well-established benchmarks but face unique challenges in real-world applications,… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  19. arXiv:2403.12063  [pdf, other]

    cs.CV cs.LG

    Consistency Model is an Effective Posterior Sample Approximation for Diffusion Inverse Solvers

    Authors: Tongda Xu, Ziran Zhu, Jian Li, Dailan He, Yuanyuan Wang, Ming Sun, Ling Li, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

    Abstract: Diffusion Inverse Solvers (DIS) are designed to sample from the conditional distribution $p_θ(X_0|y)$, with a predefined diffusion model $p_θ(X_0)$, an operator $f(\cdot)$, and a measurement $y=f(x'_0)$ derived from an unknown image $x'_0$. Existing DIS estimate the conditional score function by evaluating $f(\cdot)$ with an approximated posterior sample drawn from $p_θ(X_0|X_t)$. However, most pr… ▽ More

    Submitted 1 June, 2024; v1 submitted 8 February, 2024; originally announced March 2024.

  20. arXiv:2403.11703  [pdf, other]

    cs.CV cs.AI

    LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

    Authors: Ruyi Xu, Yuan Yao, Zonghao Guo, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Maosong Sun, Gao Huang

    Abstract: Visual encoding constitutes the basis of large multimodal models (LMMs) in understanding the visual world. Conventional LMMs process images in fixed sizes and limited resolutions, while recent explorations in this direction are limited in adaptivity, efficiency, and even correctness. In this work, we first take GPT-4V and LLaVA-1.5 as representative examples and expose systematic flaws rooted in t… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Preprint

  21. arXiv:2403.11451  [pdf, other]

    cs.CV

    CasSR: Activating Image Power for Real-World Image Super-Resolution

    Authors: Haolan Chen, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, Wei Hu

    Abstract: The objective of image super-resolution is to generate clean and high-resolution images from degraded versions. Recent advancements in diffusion modeling have led to the emergence of various image super-resolution techniques that leverage pretrained text-to-image (T2I) models. Nevertheless, due to the prevalent severe degradation in low-resolution images and the inherent characteristics of diffusi… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

  22. arXiv:2403.11287  [pdf, other]

    physics.comp-ph cond-mat.mtrl-sci

    Neural-network density functional theory

    Authors: Yang Li, Zechen Tang, Zezhou Chen, Minghui Sun, Boheng Zhao, He Li, Honggeng Tao, Zilong Yuan, Wenhui Duan, Yong Xu

    Abstract: Deep-learning density functional theory (DFT) shows great promise to significantly accelerate material discovery and potentially revolutionize materials research, which demands a close combination between neural networks and DFT computation. However, current research in this field primarily relies on supervised learning, making the developments of neural networks and DFT isolated from each other.… ▽ More

    Submitted 17 March, 2024; originally announced March 2024.

  23. arXiv:2403.10362  [pdf, other]

    eess.IV cs.CV

    CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement

    Authors: Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu

    Abstract: Recently, numerous approaches have achieved notable success in compressed video quality enhancement (VQE). However, these methods usually ignore the utilization of valuable coding priors inherently embedded in compressed videos, such as motion vectors and residual frames, which carry abundant temporal and spatial information. To remedy this problem, we propose the Coding Priors-Guided Aggregation… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

  24. arXiv:2403.09347  [pdf, other]

    cs.DC cs.LG

    BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

    Authors: Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun

    Abstract: Effective attention modules have played a crucial role in the success of Transformer-based large language models (LLMs), but the quadratic time and memory complexities of these attention modules also pose a challenge when processing long sequences. One potential solution for the long sequence problem is to utilize distributed clusters to parallelize the computation of attention modules across mult… ▽ More

    Submitted 6 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 13 pages, 7 figures

  25. arXiv:2403.09274  [pdf, other]

    cs.CV

    EventRPG: Event Data Augmentation with Relevance Propagation Guidance

    Authors: Mingyuan Sun, Donghao Zhang, Zongyuan Ge, Jiaxu Wang, Jia Li, Zheng Fang, Renjing Xu

    Abstract: Event camera, a novel bio-inspired vision sensor, has drawn a lot of attention for its low latency, low power consumption, and high dynamic range. Currently, overfitting remains a critical problem in event-based classification tasks for Spiking Neural Network (SNN) due to its relatively weak spatial representation capability. Data augmentation is a simple but efficient method to alleviate overfitt… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by ICLR 2024

  26. arXiv:2403.09253  [pdf]

    physics.optics physics.chem-ph

    Broadband NIR photon upconversion generates NIR persistent luminescence for bioimaging

    Authors: Shuting Yang, Bing Qi, Mingzi Sun, Wenjing Dai, Ziyun Miao, Wei Zheng, Bolong Huang, Jie Wang

    Abstract: Upconversion persistent luminescence (UCPL) phosphors that can be directly charged by near-infrared (NIR) light have gained considerable attention due to their promising applications ranging from photonics to biomedicine. However, current lanthanide-based UCPL phosphors show small absorption cross-sections and low upconversion charging efficiency. The development of UCPL phosphors faces challenges… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  27. arXiv:2403.08281  [pdf, other]

    cs.CL cs.AI

    Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models

    Authors: Ning Ding, Yulin Chen, Ganqu Cui, Xingtai Lv, Weilin Zhao, Ruobing Xie, Bowen Zhou, Zhiyuan Liu, Maosong Sun

    Abstract: Underlying data distributions of natural language, programming code, and mathematical symbols vary vastly, presenting a complex challenge for large language models (LLMs) that strive to achieve high performance across all three domains simultaneously. Achieving a very high level of proficiency for an LLM within a specific domain often requires extensive training with relevant corpora, which is typ… ▽ More

    Submitted 26 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  28. arXiv:2403.08147  [pdf, other]

    cs.LG q-bio.BM

    Representing Molecules as Random Walks Over Interpretable Grammars

    Authors: Michael Sun, Minghao Guo, Weize Yuan, Veronika Thost, Crystal Elaine Owens, Aristotle Franklin Grosz, Sharvaa Selvan, Katelyn Zhou, Hassan Mohiuddin, Benjamin J Pedretti, Zachary P Smith, Jie Chen, Wojciech Matusik

    Abstract: Recent research in molecular discovery has primarily been devoted to small, drug-like molecules, leaving many similarly important applications in material design without adequate technology. These applications often rely on more complex molecular structures with fewer examples that are carefully designed using known substructures. We propose a data-efficient and interpretable model for representin… ▽ More

    Submitted 2 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  29. arXiv:2403.08014  [pdf, other]

    astro-ph.EP astro-ph.SR

    The intermittently-resonant coevolution of migrating planets and their pulsating stars

    Authors: Jared Bryan, Julien de Wit, Meng Sun, Zoe L. de Beurs, Richard H. D. Townsend

    Abstract: Hot Jupiters are expected to form far from their host star and move toward close-in, circular orbits via a smooth, monotonic decay due to mild and constant tidal dissipation. Yet, three systems have recently been found exhibiting planet-induced stellar pulsations suggesting unexpectedly strong tidal interactions. Here we combine stellar evolution and tide models to show that dynamical tides raised… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Revised after positive initial review report

  30. arXiv:2403.07839  [pdf, other]

    cs.CV cs.AI cs.MM

    MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

    Authors: Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei, Zhenan Sun

    Abstract: Vision-language pre-trained models have achieved impressive performance on various downstream tasks. However, their large model sizes hinder their utilization on platforms with limited computational resources. We find that directly using smaller pre-trained models and applying magnitude-based pruning on CLIP models leads to inflexibility and inferior performance. Recent efforts for VLP compression… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 18 pages, 8 figures, Published in CVPR2024

    Journal ref: In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  31. arXiv:2403.07714  [pdf, other]

    cs.CL

    StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models

    Authors: Zhicheng Guo, Sijie Cheng, Hao Wang, Shihao Liang, Yujia Qin, Peng Li, Zhiyuan Liu, Maosong Sun, Yang Liu

    Abstract: Large Language Models (LLMs) have witnessed remarkable advancements in recent years, prompting the exploration of tool learning, which integrates LLMs with external tools to address diverse real-world challenges. Assessing the capability of LLMs to utilise tools necessitates large-scale and stable benchmarks. However, previous works relied on either hand-crafted online tools with limited scale, or… ▽ More

    Submitted 19 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  32. arXiv:2403.07172  [pdf, other]

    astro-ph.HE astro-ph.SR

    To Be or not to Be: the role of rotation in modeling Galactic Be X-ray Binaries

    Authors: Kyle Akira Rocha, Vicky Kalogera, Zoheyr Doctor, Jeff J. Andrews, Meng Sun, Seth Gossage, Simone S. Bavera, Tassos Fragos, Konstantinos Kovlakas, Matthias U. Kruckow, Devina Misra, Philipp M. Srivastava, Zepei Xing, Emmanouil Zapartas

    Abstract: Be X-ray binaries (Be-XRBs) are crucial in understanding high-mass X-ray binaries, featuring a rapidly rotating Be star and a neutron star companion in an eccentric orbit, intermittently accreting material from the Be star's decretion disk. Originating from binary stellar evolution, Be-XRBs are of significant interest to binary population synthesis (BPS) studies, encapsulating the physics of super… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 20 pages, 10 figures, Submitted to ApJ

  33. arXiv:2403.06504  [pdf, other]

    cs.DC

    Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

    Authors: Changyue Liao, Mo Sun, Zihan Yang, Kaiqi Chen, Binhang Yuan, Fei Wu, Zeke Wang

    Abstract: Recent advances in large language models have brought immense value to the world, with their superior capabilities stemming from the massive number of parameters they utilize. However, even the GPUs with the highest memory capacities, currently peaking at 80GB, are far from sufficient to accommodate these vast parameters and their associated optimizer states when conducting stochastic gradient des… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  34. FARPLS: A Feature-Augmented Robot Trajectory Preference Labeling System to Assist Human Labelers' Preference Elicitation

    Authors: Hanfang Lyu, Yuanchen Bai, Xin Liang, Ujaan Das, Chuhan Shi, Leiliang Gong, Yingchi Li, Mingfei Sun, Ming Ge, Xiaojuan Ma

    Abstract: Preference-based learning aims to align robot task objectives with human values. One of the most common methods to infer human preferences is by pairwise comparisons of robot task trajectories. Traditional comparison-based preference labeling systems seldom support labelers to digest and identify critical differences between complex trajectories recorded in videos. Our formative study (N = 12) sug… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted to ACM Conference on Intelligent User Interfaces (IUI) 2024, March 18-21, 2024, Greenville, SC, USA

  35. arXiv:2403.06068  [pdf, other]

    math.ST

    Hypothesis testing for homogenous of nodes in $β$-models

    Authors: Kang Fu, Jianwei Hu, Meng Sun

    Abstract: The $β$-model has been extensively utilized to model degree heterogeneity in networks, wherein each node is assigned a unique parameter. In this article, we consider the hypothesis testing problem that two nodes $i$ and $j$ of a $β$-model have the same node parameter. We prove that the null distribution of the proposed statistic converges in distribution to the standard normal distribution. Furthe… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

  36. arXiv:2403.05427  [pdf, other]

    cs.MM

    Reply with Sticker: New Dataset and Model for Sticker Retrieval

    Authors: Bin Liang, Bingbing Wang, Zhixin Bai, Qiwei Lang, Mingwei Sun, Kaiheng Hou, Kam-Fai Wong, Ruifeng Xu

    Abstract: Using stickers in online chatting is very prevalent on social media platforms, where the stickers used in the conversation can express someone's intention/emotion/attitude in a vivid, tactful, and intuitive way. Existing sticker retrieval research typically retrieves stickers based on context and the current utterance delivered by the user. That is, the stickers serve as a supplement to the curren… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

  37. arXiv:2403.05132  [pdf, other]

    cs.CL cs.AI

    ChatUIE: Exploring Chat-based Unified Information Extraction using Large Language Models

    Authors: Jun Xu, Mengshu Sun, Zhiqiang Zhang, Jun Zhou

    Abstract: Recent advancements in large language models have shown impressive performance in general chat. However, their domain-specific capabilities, particularly in information extraction, have certain limitations. Extracting structured information from natural language that deviates from known schemas or instructions has proven challenging for previous prompt-based methods. This motivated us to explore d… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024

  38. arXiv:2403.05049  [pdf, other]

    cs.CV

    XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution

    Authors: Yunpeng Qu, Kun Yuan, Kai Zhao, Qizhi Xie, Jinhua Hao, Ming Sun, Chao Zhou

    Abstract: Diffusion-based methods, endowed with a formidable generative prior, have received increasing attention in Image Super-Resolution (ISR) recently. However, as low-resolution (LR) images often undergo severe degradation, it is challenging for ISR models to perceive the semantic and degradation information, resulting in restoration images with incorrect content or unrealistic artifacts. To address th… ▽ More

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 19 pages, 7 figures

  39. arXiv:2403.04306  [pdf, other]

    cs.CV cs.AI cs.LG

    Effectiveness Assessment of Recent Large Vision-Language Models

    Authors: Yao Jiang, Xinyu Yan, Ge-Peng Ji, Keren Fu, Meijun Sun, Huan Xiong, Deng-Ping Fan, Fahad Shahbaz Khan

    Abstract: The advent of large vision-language models (LVLMs) represents a remarkable advance in the quest for artificial general intelligence. However, the model's effectiveness in both specialized and general tasks warrants further investigation. This paper endeavors to evaluate the competency of popular LVLMs in specialized and general tasks, respectively, aiming to offer a comprehensive understanding of… ▽ More

    Submitted 11 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted by Visual Intelligence

  40. arXiv:2403.04204  [pdf, other]

    cs.AI cs.CL

    On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models

    Authors: Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie

    Abstract: Big models have achieved revolutionary breakthroughs in the field of AI, but they might also pose potential concerns. Addressing such concerns, alignment technologies were introduced to make these models conform to human preferences and values. Despite considerable advancements in the past year, various challenges lie in establishing the optimal alignment strategy, such as data cost and scalable o… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 23 pages, 7 figures

  41. arXiv:2403.03920  [pdf, other]

    cs.AI cs.CL cs.HC

    Enhancing Instructional Quality: Leveraging Computer-Assisted Textual Analysis to Generate In-Depth Insights from Educational Artifacts

    Authors: Zewei Tian, Min Sun, Alex Liu, Shawon Sarkar, Jing Liu

    Abstract: This paper explores the transformative potential of computer-assisted textual analysis in enhancing instructional quality through in-depth insights from educational artifacts. We integrate Richard Elmore's Instructional Core Framework to examine how artificial intelligence (AI) and machine learning (ML) methods, particularly natural language processing (NLP), can analyze educational content, teach… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

  42. arXiv:2403.02893  [pdf, other]

    cs.CL cs.AI

    Zero-Shot Cross-Lingual Document-Level Event Causality Identification with Heterogeneous Graph Contrastive Transfer Learning

    Authors: Zhitao He, Pengfei Cao, Zhuoran Jin, Yubo Chen, Kang Liu, Zhiqiang Zhang, Mengshu Sun, Jun Zhao

    Abstract: Event Causality Identification (ECI) refers to the detection of causal relations between events in texts. However, most existing studies focus on sentence-level ECI with high-resource languages, leaving more challenging document-level ECI (DECI) with low-resource languages under-explored. In this paper, we propose a Heterogeneous Graph Interaction Model with Multi-granularity Contrastive Transfer…

    Submitted 22 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  43. arXiv:2403.01977  [pdf, other]

    cs.RO cs.AI cs.CV

    TTA-Nav: Test-time Adaptive Reconstruction for Point-Goal Navigation under Visual Corruptions

    Authors: Maytus Piriyajitakonkij, Mingfei Sun, Mengmi Zhang, Wei Pan

    Abstract: Robot navigation under visual corruption presents a formidable challenge. To address this, we propose a Test-time Adaptation (TTA) method, named as TTA-Nav, for point-goal navigation under visual corruptions. Our "plug-and-play" method incorporates a top-down decoder to a pre-trained navigation model. Firstly, the pre-trained navigation model gets a corrupted image and extracts features. Secondly,…

    Submitted 14 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Submitted to IROS2024

  44. arXiv:2403.01901  [pdf, other]

    cs.CV

    FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

    Authors: Chao Xu, Yang Liu, Jiazheng Xing, Weida Wang, Mingze Sun, Jun Dan, Tianxin Huang, Siyuan Li, Zhi-Qi Cheng, Ying Tai, Baigui Sun

    Abstract: In this paper, we abstract the process of people hearing speech, extracting meaningful cues, and creating various dynamically audio-consistent talking faces, termed Listening and Imagining, into the task of high-fidelity diverse talking faces generation from a single audio. Specifically, it involves two critical challenges: one is to effectively decouple identity, content, and emotion from entangl…

    Submitted 31 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  45. arXiv:2403.01810  [pdf, other]

    cond-mat.soft

    Global self-similarity of dense granular flow in hopper: the role of hopper width

    Authors: Changhao Li, Xin Li, Xianggui Chen, Zaixin Wang, Min Sun, Decai Huang

    Abstract: The influence of hopper width on dense granular flow in a two-dimensional hopper is investigated through experiments and simulations. Though the flow rate remains stable for larger hopper widths, a slight reduction in hopper width results in a significant increase in flow rate for smaller hopper widths. Both Beverloo's and Janda's formula accurately capture the relationship between the f…

    Submitted 20 April, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 17 pages, 9 figures

  46. arXiv:2403.01691  [pdf, other]

    astro-ph.HE

    How long will the quasar UV/optical flickering be damped?

    Authors: Shuying Zhou, Mouyuan Sun, Zhen-Yi Cai, Guowei Ren, Jun-Xian Wang, Yongquan Xue

    Abstract: The UV/optical light curves of Active Galactic Nuclei (AGNs) are commonly described by the Damped Random Walk (DRW) model. However, the physical interpretation of the damping timescale, a key parameter in the DRW model, remains unclear. Particularly, recent observations indicate a weak dependence of the damping timescale upon both wavelength and accretion rate, clearly being inconsistent with the…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 19 pages, 16 figures, accepted to ApJ

  47. arXiv:2403.01542  [pdf, other]

    cs.RO cs.HC

    Human Robot Pacing Mismatch

    Authors: Muchen Sun, Peter Trautman, Todd Murphey

    Abstract: A widely accepted explanation for robots planning overcautious or overaggressive trajectories alongside human is that the crowd density exceeds a threshold such that all feasible trajectories are considered unsafe -- the freezing robot problem. However, even with low crowd density, the robot's navigation performance could still drop drastically when in close proximity to human. In this work, we ar…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: Accepted to 2022 Robotics: Science and Systems (RSS) Workshop in Close Proximity Human-Robot Collaboration

  48. arXiv:2403.01537  [pdf, other]

    cs.RO cs.GT cs.LG

    Mixed Strategy Nash Equilibrium for Crowd Navigation

    Authors: Muchen Sun, Francesca Baldini, Katie Hughes, Peter Trautman, Todd Murphey

    Abstract: Robots navigating in crowded areas should negotiate free space with humans rather than fully controlling collision avoidance, as this can lead to freezing behavior. Game theory provides a framework for the robot to reason about potential cooperation from humans for collision avoidance during path planning. In particular, the mixed strategy Nash equilibrium captures the negotiation behavior under u…

    Submitted 17 June, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  49. arXiv:2403.01536  [pdf, other]

    cs.RO cs.LG

    Fast Ergodic Search with Kernel Functions

    Authors: Muchen Sun, Ayush Gaggar, Peter Trautman, Todd Murphey

    Abstract: Ergodic search enables optimal exploration of an information distribution while guaranteeing the asymptotic coverage of the search space. However, current methods typically have exponential computation complexity in the search space dimension and are restricted to Euclidean space. We introduce a computationally efficient ergodic search method. Our contributions are two-fold. First, we develop a ke…

    Submitted 3 March, 2024; originally announced March 2024.

  50. arXiv:2403.01509  [pdf, other]

    cs.CL

    Fantastic Semantics and Where to Find Them: Investigating Which Layers of Generative LLMs Reflect Lexical Semantics

    Authors: Zhu Liu, Cunliang Kong, Ying Liu, Maosong Sun

    Abstract: Large language models have achieved remarkable success in general language understanding tasks. However, as a family of generative methods with the objective of next token prediction, the semantic evolution with the depth of these models are not fully explored, unlike their predecessors, such as BERT-like architectures. In this paper, we specifically investigate the bottom-up evolution of lexical…

    Submitted 9 June, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: Accepted to Findings of ACL 2024