Skip to main content

Showing 1–50 of 800 results for author: Sun, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2406.19949  [pdf, other

    cs.CL

    Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring

    Authors: Jiazheng Li, Hainiu Xu, Zhaoyue Sun, Yuxiang Zhou, David West, Cesare Aloisi, Yulan He

    Abstract: Generating rationales that justify scoring decisions has been a promising way to facilitate explainability in automated scoring systems. However, existing methods do not match the accuracy of classifier-based methods. Plus, the generated rationales often contain hallucinated information. To address these issues, we propose a novel framework capable of generating more faithful rationales and, more… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.19756  [pdf, other

    cs.CV cs.AI

    Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train

    Authors: Haojun Jiang, Meng Li, Zhenguo Sun, Ning Jia, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

    Abstract: The complex structure of the heart leads to significant challenges in echocardiography, especially in acquisition cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on the two-dimensional plane and the spatial relationships between planes in three-dimensional space. In this paper, we innovatively propose a large-scale self-supervised pre-trai… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Technical report

  3. arXiv:2406.18995  [pdf, other

    cs.LG cs.AI

    FedMLP: Federated Multi-Label Medical Image Classification under Task Heterogeneity

    Authors: Zhaobin Sun, Nannan Wu, Junjie Shi, Li Yu, Xin Yang, Kwang-Ting Cheng, Zengqiang Yan

    Abstract: Cross-silo federated learning (FL) enables decentralized organizations to collaboratively train models while preserving data privacy and has made significant progress in medical image classification. One common assumption is task homogeneity where each client has access to all classes during training. However, in clinical practice, given a multi-label classification task, constrained by the level… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Early accepted by MICCAI 2024

  4. Artificial Immune System of Secure Face Recognition Against Adversarial Attacks

    Authors: Min Ren, Yunlong Wang, Yuhao Zhu, Yongzhen Huang, Zhenan Sun, Qi Li, Tieniu Tan

    Abstract: Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be by targeted improvement of traits of interest through selective breeding, an approach which has so far been underexplored… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Journal ref: International Journal of Computer Vision (IJCV), 2024

  5. arXiv:2406.17841  [pdf, other

    quant-ph cs.AI

    Probing many-body Bell correlation depth with superconducting qubits

    Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong **, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, **feng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang , et al. (10 additional authors not shown)

    Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages,6 figures + 14 pages, 6 figures

  6. arXiv:2406.17328  [pdf, other

    cs.CL cs.AI

    Dual-Space Knowledge Distillation for Large Language Models

    Authors: Songming Zhang, Xue Zhang, Zengkui Sun, Yufeng Chen, **an Xu

    Abstract: Knowledge distillation (KD) is known as a promising solution to compress large language models (LLMs) via transferring their knowledge to smaller models. During this process, white-box KD methods usually minimize the distance between the output distributions of the two models so that more knowledge can be transferred. However, in the current white-box KD framework, the output distributions are fro… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 17 pages, 11 figures, code available at: https://github.com/songmzhang/DSKD

  7. arXiv:2406.13236  [pdf, other

    cs.CL cs.AI

    Data Contamination Can Cross Language Barriers

    Authors: Feng Yao, Yufan Zhuang, Zihao Sun, Sunan Xu, Animesh Kumar, **gbo Shang

    Abstract: The opacity in develo** large language models (LLMs) is raising growing concerns about the potential contamination of public benchmarks in the pre-training data. Existing contamination detection methods are typically based on the text overlap between training and evaluation data, which can be too superficial to reflect deeper forms of contamination. In this paper, we first present a cross-lingua… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

  8. arXiv:2406.13165  [pdf, other

    eess.IV cs.AI cs.CV cs.RO

    Cardiac Copilot: Automatic Probe Guidance for Echocardiography with World Model

    Authors: Haojun Jiang, Zhenguo Sun, Ning Jia, Meng Li, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

    Abstract: Echocardiography is the only technique capable of real-time imaging of the heart and is vital for diagnosing the majority of cardiac diseases. However, there is a severe shortage of experienced cardiac sonographers, due to the heart's complex structure and significant operational challenges. To mitigate this situation, we present a Cardiac Copilot system capable of providing real-time probe moveme… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Early Accepted by MICCAI 2024

  9. arXiv:2406.12754  [pdf, other

    cs.CL cs.AI

    Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

    Authors: Ruiqi He, Yushu He, Longju Bai, Jiarui Liu, Zhenjie Sun, Zenghao Tang, He Wang, Hanchen Xia, Naihao Deng

    Abstract: Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evalua… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  10. arXiv:2406.11739  [pdf, other

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou , et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.11175  [pdf, other

    cs.SD eess.AS

    SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression

    Authors: Zhihang Sun, Andong Li, Rilin Chen, Hao Zhang, Meng Yu, Yi Zhou, Dong Yu

    Abstract: The proliferation of deep neural networks has spawned the rapid development of acoustic echo cancellation and noise suppression, and plenty of prior arts have been proposed, which yield promising performance. Nevertheless, they rarely consider the deployment generality in different processing scenarios, such as edge devices, and cloud processing. To this end, this paper proposes a general model, t… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  12. arXiv:2406.10710  [pdf, other

    cs.AI cs.CL

    SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task

    Authors: Ziije Zhong, Linqing Zhong, Zhaoze Sun, Qingyun **, Zengchang Qin, Xiaofan Zhang

    Abstract: Integrating Large Language Models (LLMs) with existing Knowledge Graph (KG) databases presents a promising avenue for enhancing LLMs' efficacy and mitigating their "hallucinations". Given that most KGs reside in graph databases accessible solely through specialized query languages (e.g., Cypher), there exists a critical need to bridge the divide between LLMs and KG databases by automating the tran… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 19 pages, 15 figures, 8 tables

  13. arXiv:2406.10158  [pdf, other

    cs.DB cs.DC

    Harnessing GPU Power for Enhanced OLTP: A Study in Concurrency Control Schemes

    Authors: Zihan Sun, Yong Zhang, Chao Li, Chunxiao Xing

    Abstract: GPUs, whose performance has gone through a huge leap over the past decade, have proved their ability to accelerate Online Analytical Processing (OLAP) operations. On the other hand, there is still a huge gap in the field of GPU-accelerated Online Transaction Processing (OLTP) operations since it was generally believed that GPUswere not suitable for OLTP in the past. However, the massive parallelis… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  14. arXiv:2406.09003  [pdf, other

    cs.CV cs.LG

    Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation

    Authors: Lincan Cai, Shuang Li, Wenxuan Ma, **gxuan Kang, Binhui Xie, Zixun Sun, Chengwei Zhu

    Abstract: Large-scale pretrained models have proven immensely valuable in handling data-intensive modalities like text and image. However, fine-tuning these models for certain specialized modalities, such as protein sequence and cosmic ray, poses challenges due to the significant modality discrepancy and scarcity of labeled data. In this paper, we propose an end-to-end method, PaRe, to enhance cross-modal f… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  15. arXiv:2406.08634  [pdf, other

    eess.IV cs.CV cs.LG

    Unveiling Incomplete Modality Brain Tumor Segmentation: Leveraging Masked Predicted Auto-Encoder and Divergence Learning

    Authors: Zhongao Sun, Jiameng Li, Yuhan Wang, Jiarong Cheng, Qing Zhou, Chun Li

    Abstract: Brain tumor segmentation remains a significant challenge, particularly in the context of multi-modal magnetic resonance imaging (MRI) where missing modality images are common in clinical settings, leading to reduced segmentation accuracy. To address this issue, we propose a novel strategy, which is called masked predicted pre-training, enabling robust feature learning from incomplete modality data… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  16. arXiv:2406.08187  [pdf, other

    cs.RO

    Learning-based Traversability Costmap for Autonomous Off-road Navigation

    Authors: Qiumin Zhu, Zhen Sun, Songpengcheng Xia, Guoqing Liu, Kehui Ma, Ling Pei, Zheng Gong

    Abstract: Traversability estimation in off-road terrains is an essential procedure for autonomous navigation. However, creating reliable labels for complex interactions between the robot and the surface is still a challenging problem in learning-based costmap generation. To address this, we propose a method that predicts traversability costmaps by leveraging both visual and geometric information of the envi… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  17. arXiv:2406.07951  [pdf, other

    cs.CV

    DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera

    Authors: Senyan Xu, Zhi**g Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha

    Abstract: Hybrid Event-Based Vision Sensor (HybridEVS) is a novel sensor integrating traditional frame-based and event-based sensors, offering substantial benefits for applications requiring low-light, high dynamic range, and low-latency environments, such as smartphones and wearable devices. Despite its potential, the lack of Image signal processing (ISP) pipeline specifically designed for HybridEVS poses… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  18. arXiv:2406.05462  [pdf, other

    cs.DB

    MatrixGate: A High-performance Data Ingestion Tool for Time-series Databases

    Authors: Shuhui Wang, Zihan Sun, Chaochen Hu, Chao Li, Yong Zhang, Yandong Yao, Hao Wang, Chunxiao Xing

    Abstract: Recent years have seen massive time-series data generated in many areas. This different scenario brings new challenges, particularly in terms of data ingestion, where existing technologies struggle to handle such massive time-series data, leading to low loading speed and poor timeliness. To address these challenges, this paper presents MatrixGate, a new and efficient data ingestion approach for ma… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  19. arXiv:2406.05404  [pdf, other

    cs.CV cs.GR

    Layered Image Vectorization via Semantic Simplification

    Authors: Zhenyu Wang, Jianxi Huang, Zhida Sun, Daniel Cohen-Or, Min Lu

    Abstract: This work presents a novel progressive image vectorization technique aimed at generating layered vectors that represent the original image from coarse to fine detail levels. Our approach introduces semantic simplification, which combines Score Distillation Sampling and semantic segmentation to iteratively simplify the input image. Subsequently, our method optimizes the vector layers for each of th… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  20. arXiv:2406.04649  [pdf, other

    cs.CV

    SMART: Scene-motion-aware human action recognition framework for mental disorder group

    Authors: Zengyuan Lai, Jiarui Yang, Songpengcheng Xia, Qi Wu, Zhen Sun, Wenxian Yu, Ling Pei

    Abstract: Patients with mental disorders often exhibit risky abnormal actions, such as climbing walls or hitting windows, necessitating intelligent video behavior monitoring for smart healthcare with the rising Internet of Things (IoT) technology. However, the development of vision-based Human Action Recognition (HAR) for these actions is hindered by the lack of specialized algorithms and datasets. In this… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  21. arXiv:2406.03762  [pdf, other

    cs.DC q-bio.NC

    CORTEX: Large-Scale Brain Simulator Utilizing Indegree Sub-Graph Decomposition on Fugaku Supercomputer

    Authors: Tianxiang Lyu, Mitsuhisa Sato, Shigeki Aoki, Ryutaro Himeno, Zhe Sun

    Abstract: We introduce CORTEX, an algorithmic framework designed for large-scale brain simulation. Leveraging the computational capacity of the Fugaku Supercomputer, CORTEX maximizes available problem size and processing performance. Our primary innovation, Indegree Sub-Graph Decomposition, along with a suite of parallel algorithms, facilitates efficient domain decomposition by segmenting the global graph s… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  22. arXiv:2406.03141  [pdf, other

    q-bio.BM cs.LG

    Floating Anchor Diffusion Model for Multi-motif Scaffolding

    Authors: Ke Liu, Weian Mao, Shuaike Shen, Xiaoran Jiao, Zheng Sun, Hao Chen, Chunhua Shen

    Abstract: Motif scaffolding seeks to design scaffold structures for constructing proteins with functions derived from the desired motif, which is crucial for the design of vaccines and enzymes. Previous works approach the problem by inpainting or conditional generation. Both of them can only scaffold motifs with fixed positions, and the conditional generation cannot guarantee the presence of motifs. However… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  23. arXiv:2406.02882  [pdf, other

    cs.CL cs.AI

    Outdated Issue Aware Decoding for Reasoning Questions on Edited Knowledge

    Authors: Zengkui Sun, Yi** Liu, Jiaan Wang, Fandong Meng, **an Xu, Yufeng Chen, Jie Zhou

    Abstract: Recently, Knowledge Editing has received increasing attention, since it could update the specific knowledge from outdated ones in pretrained models without re-training. However, as pointed out by recent studies, existing related methods tend to merely memorize the superficial word composition of the edited knowledge, rather than truly learning and absorbing it. Consequently, on the reasoning quest… ▽ More

    Submitted 16 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL2024 Findings, Codes are at https://github.com/Acerkoo/DISCO

  24. arXiv:2406.02876  [pdf, other

    cs.CL cs.AI

    LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation

    Authors: Zengkui Sun, Yi** Liu, Fandong Meng, **an Xu, Yufeng Chen, Jie Zhou

    Abstract: Multilingual neural machine translation models generally distinguish translation directions by the language tag (LT) in front of the source or target sentences. However, current LT strategies cannot indicate the desired target language as expected on zero-shot translation, i.e., the off-target issue. Our analysis reveals that the indication of the target language is sensitive to the placement of t… ▽ More

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL2024 Findings, Codes are at https://github.com/Acerkoo/LCS

  25. arXiv:2406.01721  [pdf, other

    cs.CL

    Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs

    Authors: Haokun Lin, Haobo Xu, Yichen Wu, **gzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, Ying Wei

    Abstract: Quantizing large language models (LLMs) presents significant challenges, primarily due to outlier activations that compromise the efficiency of low-bit representation. Traditional approaches mainly focus on solving Normal Outliers-activations with consistently high magnitudes across all tokens. However, these techniques falter when dealing with Massive Outliers, which are significantly higher in v… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 26 pages, 13 figures

  26. arXiv:2406.01391  [pdf, other

    astro-ph.IM cs.DL

    Knowledge Graph in Astronomical Research with Large Language Models: Quantifying Driving Forces in Interdisciplinary Scientific Discovery

    Authors: Zechang Sun, Yuan-Sen Ting, Yaobo Liang, Nan Duan, Song Huang, Zheng Cai

    Abstract: Identifying and predicting the factors that contribute to the success of interdisciplinary research is crucial for advancing scientific discovery. However, there is a lack of methods to quantify the integration of new ideas and technological advancements in astronomical research and how these new technologies drive further scientific breakthroughs. Large language models, with their ability to extr… ▽ More

    Submitted 15 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: An interactive version of the knowledge graph is made publicly available at https://astrokg.github.io/. Accepted to IJCAI 2024 AI4Research Workshop. Comments are welcome

  27. arXiv:2406.00335  [pdf, other

    cs.LG

    Benchmarking for Deep Uplift Modeling in Online Marketing

    Authors: Dugang Liu, Xing Tang, Yang Qiao, Miao Liu, Zexu Sun, Xiuqiang He, Zhong Ming

    Abstract: Online marketing is critical for many industrial platforms and business applications, aiming to increase user engagement and platform revenue by identifying corresponding delivery-sensitive groups for specific incentives, such as coupons and bonuses. As the scale and complexity of features in industrial scenarios increase, deep uplift modeling (DUM) as a promising technique has attracted increased… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  28. arXiv:2406.00093  [pdf, other

    cs.CV cs.AI cs.GR cs.LG cs.MM

    Bootstrap3D: Improving 3D Content Creation with Synthetic Data

    Authors: Zeyi Sun, Tong Wu, Pan Zhang, Yuhang Zang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

    Abstract: Recent years have witnessed remarkable progress in multi-view diffusion models for 3D content creation. However, there remains a significant gap in image quality and prompt-following ability compared to 2D diffusion models. A critical bottleneck is the scarcity of high-quality 3D assets with detailed captions. To address this challenge, we propose Bootstrap3D, a novel framework that automatically… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: Project Page: https://sunzey.github.io/Bootstrap3D/

  29. arXiv:2406.00023  [pdf, other

    cs.CL

    LocMoE+: Enhanced Router with Token Feature Awareness for Efficient LLM Pre-Training

    Authors: **g Li, Zhijie Sun, Dachao Lin, Xuan He, Yi Lin, Binfan Zheng, Li Zeng, Rongqian Zhao, Xin Chen

    Abstract: Mixture-of-Experts (MoE) architectures have recently gained increasing popularity within the domain of large language models (LLMs) due to their ability to significantly reduce training and inference overhead. However, MoE architectures face challenges, such as significant disparities in the number of tokens assigned to each expert and a tendency toward homogenization among experts, which adversel… ▽ More

    Submitted 23 May, 2024; originally announced June 2024.

  30. GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis

    Authors: Boming Zhao, Yuan Li, Ziyu Sun, Lin Zeng, Yujun Shen, Rui Ma, Yinda Zhang, Hujun Bao, Zhaopeng Cui

    Abstract: Forecasting future scenarios in dynamic environments is essential for intelligent decision-making and navigation, a challenge yet to be fully realized in computer vision and robotics. Traditional approaches like video prediction and novel-view synthesis either lack the ability to forecast from arbitrary viewpoints or to predict temporal dynamics. In this paper, we introduce GaussianPrediction, a n… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted to SIGGRAPH 2024 Conference. Project Page: https://zju3dv.github.io/gaussian-prediction/

  31. arXiv:2405.19567  [pdf, other

    cs.AI cs.CL cs.CV cs.LG

    Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding

    Authors: Shenghuan Sun, Gregory M. Goldgof, Alexander Schubert, Zhiqing Sun, Thomas Hartvigsen, Atul J. Butte, Ahmed Alaa

    Abstract: Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions to assist in diagnostic and treatment tasks. However, VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information. This challenge is particularly pronounced in the medical domain, where we do not only require VL… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Code available at: https://github.com/AlaaLab/Dr-LLaVA

  32. arXiv:2405.19257  [pdf, other

    cs.RO cs.DC

    Hybrid-Parallel: Achieving High Performance and Energy Efficient Distributed Inference on Robots

    Authors: Zekai Sun, Xiuxian Guan, Junming Wang, Haoze Song, Yuhao Qing, Tianxiang Shen, Dong Huang, Fangming Liu, Heming Cui

    Abstract: The rapid advancements in machine learning techniques have led to significant achievements in various real-world robotic tasks. These tasks heavily rely on fast and energy-efficient inference of deep neural network (DNN) models when deployed on robots. To enhance inference performance, distributed inference has emerged as a promising approach, parallelizing inference across multiple powerful GPU d… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  33. arXiv:2405.17976  [pdf

    cs.AI cs.CL

    Yuan 2.0-M32: Mixture of Experts with Attention Router

    Authors: Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun Li, Xudong Zhao, Tong Yu, Chao Wang, Yue Wang, Fei Wang, Weixu Qiao, Houbo He, Zeru Zhang, Zeyu Sun, Junxiong Mao, Chong Shen

    Abstract: Yuan 2.0-M32, with a similar base architecture as Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts of which 2 experts are active. A new router network, Attention Router, is proposed and adopted for a more efficient selection of experts, which improves the accuracy compared to the model with classical router network. Yuan 2.0-M32 is trained with 2000B tokens from scratch, and the… ▽ More

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 14 pages,3 figures, 7 tables

  34. arXiv:2405.17240  [pdf, other

    cs.CV

    Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth

    Authors: Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen, Yi Rong

    Abstract: The absence of real targets to guide the model training is one of the main problems with the makeup transfer task. Most existing methods tackle this problem by synthesizing pseudo ground truths (PGTs). However, the generated PGTs are often sub-optimal and their imprecision will eventually lead to performance degradation. To alleviate this issue, in this paper, we propose a novel Content-Style Deco… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  35. arXiv:2405.15505  [pdf, other

    cs.LG cs.AI stat.ML

    Revisiting Counterfactual Regression through the Lens of Gromov-Wasserstein Information Bottleneck

    Authors: Hao Yang, Zexu Sun, Hongteng Xu, Xu Chen

    Abstract: As a promising individualized treatment effect (ITE) estimation method, counterfactual regression (CFR) maps individuals' covariates to a latent space and predicts their counterfactual outcomes. However, the selection bias between control and treatment groups often imbalances the two groups' latent distributions and negatively impacts this method's performance. In this study, we revisit counterfac… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 19 pages

  36. arXiv:2405.15301  [pdf, other

    cs.LG

    Rankability-enhanced Revenue Uplift Modeling Framework for Online Marketing

    Authors: Bowei He, Yunpeng Weng, Xing Tang, Ziqiang Cui, Zexu Sun, Liang Chen, Xiuqiang He, Chen Ma

    Abstract: Uplift modeling has been widely employed in online marketing by predicting the response difference between the treatment and control groups, so as to identify the sensitive individuals toward interventions like coupons or discounts. Compared with traditional \textit{conversion uplift modeling}, \textit{revenue uplift modeling} exhibits higher potential due to its direct connection with the corpora… ▽ More

    Submitted 12 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  37. arXiv:2405.14677  [pdf, other

    cs.CV cs.LG

    RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

    Authors: Zhicheng Sun, Zhenhao Yang, Yang **, Haozhe Chi, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Di Zhang, Yang Song, Kun Gai, Yadong Mu

    Abstract: Customizing diffusion models to generate identity-preserving images from user-provided reference images is an intriguing new problem. The prevalent approaches typically require training on extensive domain-specific images to achieve identity preservation, which lacks flexibility across different use cases. To address this issue, we exploit classifier guidance, a training-free technique that steers… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  38. arXiv:2405.12872  [pdf, other

    eess.IV cs.CV

    Spatial-aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image

    Authors: Zerui Zhang, Zhichao Sun, Zelong Liu, Bo Du, Rui Yu, Zhou Zhao, Yongchao Xu

    Abstract: Medical anomaly detection is a critical research area aimed at recognizing abnormal images to aid in diagnosis.Most existing methods adopt synthetic anomalies and image restoration on normal samples to detect anomaly. The unlabeled data consisting of both normal and abnormal data is not well explored. We introduce a novel Spatial-aware Attention Generative Adversarial Network (SAGAN) for one-class… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Early Accept by MICCAI 2024

  39. arXiv:2405.11976  [pdf, other

    cs.CV

    Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays

    Authors: Zhichao Sun, Yuliang Gu, Yepeng Liu, Zerui Zhang, Zhou Zhao, Yongchao Xu

    Abstract: Anomaly detection in chest X-rays is a critical task. Most methods mainly model the distribution of normal images, and then regard significant deviation from normal distribution as anomaly. Recently, CLIP-based methods, pre-trained on a large number of medical images, have shown impressive performance on zero/few-shot downstream tasks. In this paper, we aim to explore the potential of CLIP-based m… ▽ More

    Submitted 19 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: MICCAI 2024 Early Accept

  40. arXiv:2405.11465  [pdf, other

    cs.CL

    Effective In-Context Example Selection through Data Compression

    Authors: Zhongxiang Sun, Kepu Zhang, Haoyu Wang, Xiao Zhang, Jun Xu

    Abstract: In-context learning has been extensively validated in large language models. However, the mechanism and selection strategy for in-context example selection, which is a crucial ingredient in this approach, lacks systematic and in-depth research. In this paper, we propose a data compression approach to the selection of in-context examples. We introduce a two-stage method that can effectively choose… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024 finding

  41. arXiv:2405.11265  [pdf, other

    cs.CL cs.AI

    EnviroExam: Benchmarking Environmental Science Knowledge of Large Language Models

    Authors: Yu Huang, Liang Guo, Wanqian Guo, Zhe Tao, Yang Lv, Zhihao Sun, Dongfang Zhao

    Abstract: In the field of environmental science, it is crucial to have robust evaluation metrics for large language models to ensure their efficacy and accuracy. We propose EnviroExam, a comprehensive evaluation method designed to assess the knowledge of large language models in the field of environmental science. EnviroExam is based on the curricula of top international universities, covering undergraduate… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

  42. arXiv:2405.09783  [pdf, other

    cs.LG cs.AI cs.CE

    LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery

    Authors: **chuan Ma, Tsun-Hsuan Wang, Minghao Guo, Zhiqing Sun, Joshua B. Tenenbaum, Daniela Rus, Chuang Gan, Wojciech Matusik

    Abstract: Large Language Models have recently gained significant attention in scientific discovery for their extensive knowledge and advanced reasoning capabilities. However, they encounter challenges in effectively simulating observational feedback and grounding it with language to propel advancements in physical scientific discovery. Conversely, human scientists undertake scientific discovery by formulati… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  43. arXiv:2405.08299  [pdf, other

    cs.CR cs.LG

    Differentially Private Federated Learning: A Systematic Review

    Authors: Jie Fu, Yuan Hong, Xinpeng Ling, Leixia Wang, Xun Ran, Zhiyu Sun, Wendy Hui Wang, Zhili Chen, Yang Cao

    Abstract: In recent years, privacy and security concerns in machine learning have promoted trusted federated learning to the forefront of research. Differential privacy has emerged as the de facto standard for privacy protection in federated learning due to its rigorous mathematical foundation and provable guarantee. Despite extensive research on algorithms that incorporate differential privacy within feder… ▽ More

    Submitted 19 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 36pages

  44. arXiv:2405.06214  [pdf, other

    cs.CV

    Aerial-NeRF: Adaptive Spatial Partitioning and Sampling for Large-Scale Aerial Rendering

    Authors: Xiaohan Zhang, Yukui Qiu, Zhenyu Sun, Qi Liu

    Abstract: Recent progress in large-scale scene rendering has yielded Neural Radiance Fields (NeRF)-based models with an impressive ability to synthesize scenes across small objects and indoor scenes. Nevertheless, extending this idea to large-scale aerial rendering poses two critical problems. Firstly, a single NeRF cannot render the entire scene with high-precision for complex large-scale aerial datasets s… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  45. arXiv:2405.05957  [pdf, other

    cs.CL

    OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

    Authors: Dan Qiao, Yi Su, Pinzheng Wang, **g Ye, Wen**g Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, **fu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

    Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities.However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  46. arXiv:2405.05514  [pdf, other

    cs.RO

    HPPS: A Hierarchical Progressive Perception System for Luggage Trolley Detection and Localization at Airports

    Authors: Zhirui Sun, Zhe Zhang, Jieting Zhao, Han**g Ye, Jiankun Wang

    Abstract: The robotic autonomous luggage trolley collection system employs robots to gather and transport scattered luggage trolleys at airports. However, existing methods for detecting and locating these luggage trolleys often fail when they are not fully visible. To address this, we introduce the Hierarchical Progressive Perception System (HPPS), which enhances the detection and localization of luggage tr… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  47. arXiv:2405.04902  [pdf, other

    eess.IV cs.CV

    HAGAN: Hybrid Augmented Generative Adversarial Network for Medical Image Synthesis

    Authors: Zhihan Ju, Wanting Zhou, Longteng Kong, Yu Chen, Yi Li, Zhenan Sun, Caifeng Shan

    Abstract: Medical Image Synthesis (MIS) plays an important role in the intelligent medical field, which greatly saves the economic and time costs of medical diagnosis. However, due to the complexity of medical images and similar characteristics of different tissue cells, existing methods face great challenges in meeting their biological consistency. To this end, we propose the Hybrid Augmented Generative Ad… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  48. arXiv:2405.04867  [pdf, other

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhi**g Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Hai** Zeng, Kai Feng , et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  49. arXiv:2405.03809  [pdf, other

    cs.AI

    SocialFormer: Social Interaction Modeling with Edge-enhanced Heterogeneous Graph Transformers for Trajectory Prediction

    Authors: Zixu Wang, Zhigang Sun, Juergen Luettin, Lavdim Halilaj

    Abstract: Accurate trajectory prediction is crucial for ensuring safe and efficient autonomous driving. However, most existing methods overlook complex interactions between traffic participants that often govern their future trajectories. In this paper, we propose SocialFormer, an agent interaction-aware trajectory prediction method that leverages the semantic relationship between the target vehicle and sur… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  50. arXiv:2405.01327  [pdf, other

    cs.LG

    Constrained Reinforcement Learning Under Model Mismatch

    Authors: Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou

    Abstract: Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied during training because there might be model mismatch between the training and real environments. To address the above challenge, we formulate the problem as constr… ▽ More

    Submitted 3 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: ICML 2024