Showing 1–50 of 300 results for author: Luo, C

Searching in archive cs.
  1. arXiv:2407.02034  [pdf, other]

    cs.CV

    TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation

    Authors: Chaofan Luo, Donglin Di, Yongjia Ma, Zhou Xue, Chen Wei, Xun Yang, Yebin Liu

    Abstract: Despite significant strides in the field of 3D scene editing, current methods encounter substantial challenges, particularly in preserving 3D consistency in the multi-view editing process. To tackle this challenge, we propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS) with a dual-branch editing mechanism. Specifically, TAS facilitates a…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.01903  [pdf, other]

    cs.LG cs.AI cs.CV

    Text-Aware Diffusion for Policy Learning

    Authors: Calvin Luo, Mandy He, Zilai Zeng, Chen Sun

    Abstract: Training an agent to achieve particular goals or perform desired behaviors is often accomplished through reinforcement learning, especially in the absence of expert demonstrations. However, supporting novel goals or behaviors through reinforcement learning requires the ad-hoc design of appropriate reward functions, which quickly becomes intractable. To address this challenge, we propose Text-Aware…

    Submitted 1 July, 2024; originally announced July 2024.

  3. Multi-agent Cooperative Games Using Belief Map Assisted Training

    Authors: Qinwei Huang, Chen Luo, Alex B. Wu, Simon Khan, Hai Li, Qinru Qiu

    Abstract: In a multi-agent system, agents share their local observations to gain global situational awareness for decision making and collaboration using a message passing system. When to send a message, how to encode a message, and how to leverage the received messages directly affect the effectiveness of the collaboration among agents. When training a multi-agent cooperative game using reinforcement learn…

    Submitted 27 June, 2024; originally announced June 2024.

    Journal ref: ECAI 2023. IOS Press, 2023: 1617-1624

  4. arXiv:2406.19065  [pdf, other]

    cs.CL

    STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis

    Authors: Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang **g, Haining Tan, **g** Bi

    Abstract: The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address thi…

    Submitted 27 June, 2024; originally announced June 2024.

  5. arXiv:2406.18394  [pdf, other]

    q-fin.CP cs.AI

    AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors

    Authors: Hao Shi, Cuicui Luo, Weili Song, Xinting Zhang, Xiang Ao

    Abstract: The variability and low signal-to-noise ratio in financial data, combined with the necessity for interpretability, make the alpha factor mining workflow a crucial component of quantitative investment. Following the transition from early manual extraction to genetic programming, the most advanced approach in this domain currently employs reinforcement learning to mine a set of combination factors with fixed w…

    Submitted 26 June, 2024; originally announced June 2024.

  6. CAT: Interpretable Concept-based Taylor Additive Models

    Authors: Viet Duong, Qiong Wu, Zhengyi Zhou, Hongjue Zhao, Chenxiang Luo, Eric Zavesky, Huaxiu Yao, Huajie Shao

    Abstract: As an emerging interpretable technique, Generalized Additive Models (GAMs) adopt neural networks to individually learn non-linear functions for each feature, which are then combined through a linear model for final predictions. Although GAMs can explain deep neural networks (DNNs) at the feature level, they require large numbers of model parameters and are prone to overfitting, making them hard to…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  7. arXiv:2406.17517  [pdf, other]

    cs.LG cs.AI

    Preserving Node Distinctness in Graph Autoencoders via Similarity Distillation

    Authors: Ge Chen, Yulan Hu, Sheng Ouyang, Yong Liu, Cuicui Luo

    Abstract: Graph autoencoders (GAEs), as a kind of generative self-supervised learning approach, have shown great potential in recent years. GAEs typically rely on distance-based criteria, such as mean-square-error (MSE), to reconstruct the input graph. However, relying solely on a single reconstruction criterion may lead to a loss of distinctiveness in the reconstructed graph, causing nodes to collapse into…

    Submitted 25 June, 2024; originally announced June 2024.

  8. arXiv:2406.09397  [pdf, other]

    cs.CV cs.AI

    Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

    Authors: Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, Ji Li, Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo

    Abstract: Modern vision models are trained on very large noisy datasets. While these models acquire strong capabilities, they may not follow the user's intent to output the desired results in certain aspects, e.g., visual aesthetic, preferred style, and responsibility. In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system.…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 28 pages, 26 figures, under review

  9. arXiv:2406.03464  [pdf, other]

    cs.LG

    Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach

    Authors: Haoyu Han, Juanhui Li, Wei Huang, Xianfeng Tang, Hanqing Lu, Chen Luo, Hui Liu, Jiliang Tang

    Abstract: Graph Neural Networks (GNNs) have proven to be highly effective for node classification tasks across diverse graph structural patterns. Traditionally, GNNs employ a uniform global filter, typically a low-pass filter for homophilic graphs and a high-pass filter for heterophilic graphs. However, real-world graphs often exhibit a complex mix of homophilic and heterophilic patterns, rendering a single…

    Submitted 5 June, 2024; originally announced June 2024.

  10. arXiv:2406.01047  [pdf, other]

    cs.DC cs.AI cs.LG

    An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing

    Authors: Hang Dong, Liwen Zhu, Zhao Shan, Bo Qiao, Fangkai Yang, Si Qin, Chuan Luo, Qingwei Lin, Yuwen Yang, Gurpreet Virdi, Saravan Rajmohan, Dongmei Zhang, Thomas Moscibroda

    Abstract: Efficient resource utilization and perfect user experience usually conflict with each other in cloud computing platforms. Great efforts have been invested in increasing resource utilization on cloud computing platforms while trying not to affect users' experience. In order to better utilize the remaining pieces of computing resources spread over the whole platform, deferrable jobs are provided with…

    Submitted 3 June, 2024; originally announced June 2024.

  11. arXiv:2405.20700  [pdf, other]

    cs.AI

    Self-degraded contrastive domain adaptation for industrial fault diagnosis with bi-imbalanced data

    Authors: Gecheng Chen, Zeyu Yang, Chengwen Luo, Jianqiang Li

    Abstract: Modern industrial fault diagnosis tasks often face the combined challenge of distribution discrepancy and bi-imbalance. Existing domain adaptation approaches pay little attention to the prevailing bi-imbalance, leading to poor domain adaptation performance or even negative transfer. In this work, we propose a self-degraded contrastive domain adaptation (Sd-CDA) diagnosis framework to handle the do…

    Submitted 31 May, 2024; originally announced May 2024.

  12. arXiv:2405.20325  [pdf, other]

    cs.CV

    MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion

    Authors: Shuyuan Tu, Qi Dai, Zihao Zhang, Sicheng Xie, Zhi-Qi Cheng, Chong Luo, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Despite impressive advancements in diffusion-based video editing models in altering video attributes, there has been limited exploration into modifying motion information while preserving the original protagonist's appearance and background. In this paper, we propose MotionFollower, a lightweight score-guided diffusion model for video motion editing. To introduce conditional controls to the denois…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 23 pages, 18 figures. Project page at https://francis-rings.github.io/MotionFollower/

    MSC Class: 68T45; 68T10

  13. arXiv:2405.19491  [pdf, other]

    cs.CE

    Calibration and Validation of a Phase-Field Model of Brittle Fracture within the Damage Mechanics Challenge

    Authors: Jonas Heinzmann, Pietro Carrara, Chenyi Luo, Manav Manav, Akanksha Mishra, Sindhu Nagaraja, Hamza Oudich, Francesco Vicentini, Laura De Lorenzis

    Abstract: In the context of the Damage Mechanics Challenge, we adopt a phase-field model of brittle fracture to blindly predict the behavior up to failure of a notched three-point-bending specimen loaded under mixed-mode conditions. The beam is additively manufactured using a geo-architected gypsum based on the combination of bassanite and a water-based binder. The calibration of the material parameters inv…

    Submitted 29 May, 2024; originally announced May 2024.

  14. arXiv:2405.14206  [pdf, other]

    cs.CV

    LG-VQ: Language-Guided Codebook Learning

    Authors: Guotao Liang, Baoquan Zhang, Yaowei Wang, Xutao Li, Yunming Ye, Huaibin Wang, Chuyao Luo, Kola Ye, Linfeng Luo

    Abstract: Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image synthesis, which aims to learn a codebook to encode an image with a sequence of discrete codes and then generate an image in an autoregressive manner. Although existing methods have shown superior performance, most methods prefer to learn a single-modal codebook (e.g., image), resulting in suboptimal per…

    Submitted 23 May, 2024; originally announced May 2024.

  15. arXiv:2405.05691  [pdf, other]

    cs.CV cs.MM

    StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework

    Authors: Yiheng Huang, Hui Yang, Chuanchen Luo, Yuxi Wang, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng

    Abstract: Thanks to the powerful generative capacity of diffusion models, recent years have witnessed rapid progress in human motion generation. Existing diffusion-based methods employ disparate network architectures and training strategies. The effect of the design of each component is still unclear. In addition, the iterative denoising process consumes considerable computational overhead, which is prohibi…

    Submitted 9 May, 2024; originally announced May 2024.

  16. arXiv:2405.04756  [pdf, other]

    cs.CL cs.LG

    BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models

    Authors: Chu Fei Luo, Ahmad Ghawanmeh, Xiaodan Zhu, Faiza Khan Khattak

    Abstract: Modern large language models (LLMs) have a significant amount of world knowledge, which enables strong performance in commonsense reasoning and knowledge-intensive tasks when harnessed properly. The language model can also learn social biases, which have significant potential for societal harm. There have been many mitigation strategies proposed for LLM safety, but it is unclear how effective the…

    Submitted 7 May, 2024; originally announced May 2024.

  17. arXiv:2405.02586  [pdf, other]

    cs.CV

    Generalizing CLIP to Unseen Domain via Text-Guided Diverse Novel Feature Synthesis

    Authors: Siyuan Yan, Cheng Luo, Zhen Yu, Zongyuan Ge

    Abstract: Vision-language foundation models like CLIP have shown impressive zero-shot generalization, but finetuning on downstream datasets can cause overfitting and loss of its generalization ability on unseen domains. Although collecting additional data from new domains of interest is possible, this method is often impractical due to the challenges in obtaining annotated data. To address this, we propose…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 24 pages

  18. arXiv:2404.18459  [pdf, other]

    cs.CV

    Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild

    Authors: Donggyun Kim, Seongwoong Cho, Semin Kim, Chong Luo, Seunghoon Hong

    Abstract: Large language models have evolved into data-efficient generalists, benefiting from the universal language interface and large-scale pre-training. However, constructing a data-efficient generalist for dense visual prediction presents a distinct challenge due to the variation in label structures across different tasks. Consequently, generalization to unseen dense prediction tasks in the low-data regime…

    Submitted 29 April, 2024; originally announced April 2024.

  19. arXiv:2404.17186  [pdf, other]

    cs.CV cs.AI cs.LG

    MCSDNet: Mesoscale Convective System Detection Network via Multi-scale Spatiotemporal Information

    Authors: Jiajun Liang, Baoquan Zhang, Yunming Ye, Xutao Li, Chuyao Luo, Xukai Fu

    Abstract: The accurate detection of Mesoscale Convective Systems (MCS) is crucial for meteorological monitoring due to their potential to cause significant destruction through severe weather phenomena such as hail, thunderstorms, and heavy rainfall. However, existing methods for MCS detection mostly target single-frame detection, which considers only the static characteristics and ignores the tempor…

    Submitted 26 April, 2024; originally announced April 2024.

  20. arXiv:2404.16650  [pdf]

    cs.CE

    Design optimization of advanced tow-steered composites with manufacturing constraints

    Authors: Chuan Luo, Federico Ferrari, James K. Guest

    Abstract: Tow steering technologies, such as automated fiber placement, enable the fabrication of composite laminates with curvilinear fiber, tow, or tape paths. Designers may therefore tailor tow orientations locally according to the expected local stress state within a structure, such that strong and stiff orientations of the tow are (for example) optimized to provide maximal mechanical benefit. Tow path…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 29 pages, 16 figures

  21. arXiv:2404.14600  [pdf, other]

    cs.IR cs.CL

    Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding

    Authors: Hansi Zeng, Chen Luo, Hamed Zamani

    Abstract: This paper introduces PAG, a novel optimization and decoding approach that guides autoregressive generation of document identifiers in generative retrieval models through simultaneous decoding. To this aim, PAG constructs a set-based and sequential identifier for each document. Motivated by the bag-of-words assumption in information retrieval, the set-based identifier is built on lexical tokens. Th…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to SIGIR 2024

  22. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  23. arXiv:2404.13923  [pdf, other]

    cs.CV

    MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets

    Authors: Zeyu Li, Ruitong Gan, Chuanchen Luo, Yuxi Wang, Jiaheng Liu, Ziwei Zhu, Man Zhang, Qing Li, Xucheng Yin, Zhaoxiang Zhang, Junran Peng

    Abstract: Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting 2D generative prior to the 3D space. However, such a 2D generative image prior bakes the effect of illumination and shadow into the texture.…

    Submitted 16 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  24. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Ya**g Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  25. arXiv:2404.06443  [pdf, other]

    cs.CV

    Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition

    Authors: Zihan Wang, Siyang Song, Cheng Luo, Songhe Deng, Weicheng Xie, Linlin Shen

    Abstract: Human facial action units (AUs) are mutually related in a hierarchical manner: not only are they associated with each other in both spatial and temporal domains, but AUs located in the same/close facial regions also show stronger relationships than those of different facial regions. While none of the existing approaches thoroughly models such hierarchical inter-dependencies among AUs, this paper propos…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR2024

  26. arXiv:2404.05225  [pdf, other]

    cs.CV cs.CL

    LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

    Authors: Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao

    Abstract: Recently, leveraging large language models (LLMs) or multimodal large language models (MLLMs) for document understanding has proven very promising. However, previous works that employ LLMs/MLLMs for document understanding have not fully explored and utilized the document layout information, which is vital for precise document understanding. In this paper, we propose LayoutLLM, an LLM/MLLM bas…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  27. arXiv:2404.04292  [pdf, other]

    cs.CL cs.AI

    Conversational Disease Diagnosis via External Planner-Controlled Large Language Models

    Authors: Zhoujian Sun, Cheng Luo, Ziyi Liu, Zhengxing Huang

    Abstract: The development of large language models (LLMs) has brought unprecedented possibilities for artificial intelligence (AI) based medical diagnosis. However, the application perspective of LLMs in real diagnostic scenarios is still unclear because they are not adept at collecting patient data proactively. This study presents an LLM-based diagnostic system that enhances planning capabilities by emulati…

    Submitted 19 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Work in Progress

  28. arXiv:2404.01133  [pdf, other]

    cs.CV

    CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

    Authors: Yang Liu, He Guan, Chuanchen Luo, Lue Fan, Junran Peng, Zhaoxiang Zhang

    Abstract: The advancement of real-time 3D scene reconstruction and novel view synthesis has been significantly propelled by 3D Gaussian Splatting (3DGS). However, effectively training large-scale 3DGS and rendering it in real-time across various scales remains challenging. This paper introduces CityGaussian (CityGS), which employs a novel divide-and-conquer training approach and Level-of-Detail (LoD) strate…

    Submitted 7 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Project Page: https://dekuliutesla.github.io/citygs/

  29. arXiv:2404.00021  [pdf, other]

    cs.HC cs.CE cs.CY cs.PF

    Evaluatology: The Science and Engineering of Evaluation

    Authors: Jianfeng Zhan, Lei Wang, Wanling Gao, Hongxiao Li, Chenxi Wang, Yunyou Huang, Yatao Li, Zhengxin Yang, Guoxin Kang, Chunjie Luo, Hainan Ye, Shaopeng Dai, Zhifei Zhang

    Abstract: Evaluation is a crucial aspect of human existence and plays a vital role in various fields. However, it is often approached in an empirical and ad-hoc manner, lacking consensus on universal concepts, terminologies, theories, and methodologies. This lack of agreement has significant repercussions. This article aims to formally introduce the discipline of evaluatology, which encompasses the science…

    Submitted 19 March, 2024; originally announced April 2024.

    Comments: 29 pages, 16 figures, and 2 tables

  30. arXiv:2403.18341  [pdf, other]

    cs.CL

    IterAlign: Iterative Constitutional Alignment of Large Language Models

    Authors: Xiusi Chen, Hongzhi Wen, Sreyashi Nag, Chen Luo, Qingyu Yin, Ruirui Li, Zheng Li, Wei Wang

    Abstract: With the rapid development of large language models (LLMs), aligning LLMs with human values and societal norms to ensure their reliability and safety has become crucial. Reinforcement learning with human feedback (RLHF) and Constitutional AI (CAI) have been proposed for LLM alignment. However, these methods require either heavy human annotations or explicitly pre-defined constitutions, which are l…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: NAACL 2024

  31. arXiv:2403.17935  [pdf, other]

    cs.CV

    OmniVid: A Generative Framework for Universal Video Understanding

    Authors: Junke Wang, Dongdong Chen, Chong Luo, Bo He, Lu Yuan, Zuxuan Wu, Yu-Gang Jiang

    Abstract: The core of video understanding tasks, such as recognition, captioning, and tracking, is to automatically detect objects or actions in a video and analyze their temporal evolution. Despite sharing a common goal, different tasks often rely on distinct model architectures and annotation formats. In contrast, natural language processing benefits from a unified output space, i.e., text sequences, whic…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  32. arXiv:2403.15698  [pdf, other]

    cs.CV cs.AI

    SceneX: Procedural Controllable Large-scale Scene Generation via Large-language Models

    Authors: Mengqi Zhou, Jun Hou, Chuanchen Luo, Yuxi Wang, Zhaoxiang Zhang, Junran Peng

    Abstract: Due to its great application potential, large-scale scene generation has drawn extensive attention in academia and industry. Recent research employs powerful generative models to create desired scenes and achieves promising results. However, most of these methods represent the scene using 3D primitives (e.g. point cloud or radiance field) incompatible with the industrial pipeline, which leads to a…

    Submitted 22 March, 2024; originally announced March 2024.

  33. arXiv:2403.13512  [pdf, other]

    cs.CV cs.AI

    Scale Decoupled Distillation

    Authors: Shicai Wei, Chunbo Luo, Yang Luo

    Abstract: Logit knowledge distillation attracts increasing attention due to its practicality in recent studies. However, it often suffers from inferior performance compared to feature knowledge distillation. In this paper, we argue that existing logit-based methods may be sub-optimal since they only leverage the global logit output that couples multiple semantic knowledge. This may transfer ambiguous knowled…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. 10 pages, 6 figures

  34. arXiv:2403.09622  [pdf, other]

    cs.CV

    Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

    Authors: Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan

    Abstract: Visual text rendering poses a fundamental challenge for contemporary text-to-image generation models, with the core problem lying in text encoder deficiencies. To achieve accurate text rendering, we identify two crucial requirements for text encoders: character awareness and alignment with glyphs. Our solution involves crafting a series of customized text encoders, Glyph-ByT5, by fine-tuning the ch…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: technical report, 18 pages, 19 figures

  35. arXiv:2403.09236  [pdf, other]

    cs.CV

    Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph

    Authors: Donglin Di, Jiahui Yang, Chaofan Luo, Zhou Xue, Wei Chen, Xun Yang, Yue Gao

    Abstract: Text-to-3D generation represents an exciting field that has seen rapid advancements, facilitating the transformation of textual descriptions into detailed 3D models. However, current progress often neglects the intricate high-order correlation of geometry and texture within 3D objects, leading to challenges such as over-smoothness, over-saturation and the Janus problem. In this work, we propose a…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 27 pages, 14 figures

  36. arXiv:2403.06568  [pdf, other]

    cs.AI

    Better Understandings and Configurations in MaxSAT Local Search Solvers via Anytime Performance Analysis

    Authors: Furong Ye, Chuan Luo, Shaowei Cai

    Abstract: Though numerous solvers have been proposed for the MaxSAT problem, and the benchmark environment such as MaxSAT Evaluations provides a platform for the comparison of the state-of-the-art solvers, existing assessments usually evaluated solvers based on the quality, e.g., fitness, of the best-found solutions obtained within a given running time budget. However, concerning solely the final obtained solu…

    Submitted 11 March, 2024; originally announced March 2024.

  37. arXiv:2403.05268  [pdf, ps, other]

    cs.CL cs.LG

    Deep Prompt Multi-task Network for Abuse Language Detection

    Authors: Jian Zhu, Yu** Ruan, **gfei Chang, Wenhui Sun, Hui Wan, Jian Long, Cheng Luo

    Abstract: The detection of abusive language remains a long-standing challenge with the extensive use of social networks. The detection task of abusive language suffers from limited accuracy. We argue that the existing detection methods utilize the fine-tuning technique of the pre-trained language models (PLMs) to handle downstream tasks. Hence, these methods fail to stimulate the general knowledge of the PL…

    Submitted 24 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted by the International Conference on Pattern Recognition (ICPR) 2024

  38. arXiv:2403.03681  [pdf, other]

    cs.RO cs.CV

    3D Object Visibility Prediction in Autonomous Driving

    Authors: Chuanyu Luo, Nuo Cheng, Ren Zhong, Haipeng Jiang, Wenyu Chen, Aoli Wang, Pu Li

    Abstract: With the rapid advancement of hardware and software technologies, research in autonomous driving has seen significant growth. The prevailing framework for multi-sensor autonomous driving encompasses sensor installation, perception, path planning, decision-making, and motion control. At the perception phase, a common approach involves utilizing neural networks to infer 3D bounding box (Bbox) attrib…

    Submitted 6 March, 2024; originally announced March 2024.

  39. arXiv:2402.19350  [pdf, other]

    cs.CL

    Prompting Explicit and Implicit Knowledge for Multi-hop Question Answering Based on Human Reading Process

    Authors: Guangming Huang, Yunfei Long, Cun** Luo, Jiaxing Shen, Xia Sun

    Abstract: Pre-trained language models (PLMs) leverage chains-of-thought (CoT) to simulate human reasoning and inference processes, achieving proficient performance in multi-hop QA. However, a gap persists between PLMs' reasoning abilities and those of humans when tackling complex problems. Psychological studies suggest a vital connection between explicit information in passages and human prior knowledge dur…

    Submitted 27 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted at COLING 2024

  40. arXiv:2402.18094  [pdf, other]

    cs.IT

    On the Existence of Cyclic Lattice Codes

    Authors: Chengpin Luo, Brian M. Kurkoski

    Abstract: A coding lattice $\Lambda_c$ and a shaping lattice $\Lambda_s$ form a nested lattice code $\mathcal{C}$ if $\Lambda_s \subseteq \Lambda_c$. Under some conditions, $\mathcal{C}$ is a finite cyclic group formed by rectangular encoding. This paper presents the conditions for the existence of such $\mathcal{C}$ and provides some designs. These designs correspond to solutions to linear Diophantine equations so that a cyclic…

    Submitted 9 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 5 pages, ISIT 2024 conference

  41. arXiv:2402.16299  [pdf, other]

    cs.IR cs.LG

    Against Filter Bubbles: Diversified Music Recommendation via Weighted Hypergraph Embedding Learning

    Authors: Chaoguang Luo, Liuying Wen, Yong Qin, Liangwei Yang, Zhineng Hu, Philip S. Yu

    Abstract: Recommender systems serve a dual purpose for users: sifting out inappropriate or mismatched information while accurately identifying items that align with their preferences. Numerous recommendation algorithms are designed to provide users with a personalized array of information tailored to their preferences. Nevertheless, excessive personalization can confine users within a "filter bubble". Conse…

    Submitted 25 February, 2024; originally announced February 2024.

  42. arXiv:2402.07710  [pdf, other]

    cs.LG cs.CV

    Optimizing Sparse Convolution on GPUs with CUDA for 3D Point Cloud Processing in Embedded Systems

    Authors: Chester Luo, Kevin Lai

    Abstract: In recent years, there has been a significant increase in the utilization of deep learning methods, particularly convolutional neural networks (CNNs), which have emerged as the dominant approach in various domains that involve structured grid data, such as picture analysis and processing. Nevertheless, the exponential growth in the utilization of LiDAR and 3D sensors across many domains has result…

    Submitted 6 April, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: 9 pages

  43. arXiv:2402.07087

    cs.LG cs.AI cs.CV stat.ML

    Self-Correcting Self-Consuming Loops for Generative Model Training

    Authors: Nate Gillman, Michael Freeman, Daksh Aggarwal, Chia-Hong Hsu, Calvin Luo, Yonglong Tian, Chen Sun

    Abstract: As synthetic data becomes higher quality and proliferates on the internet, machine learning models are increasingly trained on a mix of human- and machine-generated data. Despite the success stories of using synthetic data for representation learning, using synthetic data for generative model training creates "self-consuming loops" which may lead to training instability or even collapse, unless…

    Submitted 10 June, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

    Comments: Camera ready version (ICML 2024). Code at https://nategillman.com/sc-sc.html

  44. arXiv:2402.03951

    cs.CV cs.AI

    Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping

    Authors: Qinliang Lin, Cheng Luo, Zenghao Niu, Xilin He, Weicheng Xie, Yuanbo Hou, Linlin Shen, Siyang Song

    Abstract: Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems. To address this problem, many transferability enhancement approaches (e.g., input transformation and model augmentation) have been proposed. However, they perform poorly when attacking systems whose model genus differs from that of the surrogate model. In this paper, we propos…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: AAAI 2024

  45. arXiv:2402.03379

    cs.IR cs.AI cs.LG

    Entire Chain Uplift Modeling with Context-Enhanced Learning for Intelligent Marketing

    Authors: Yinqiu Huang, Shuli Wang, Min Gao, Xue Wei, Changhao Li, Chuan Luo, Yinhua Zhu, Xiong Xiao, Yi Luo

    Abstract: Uplift modeling, vital in online marketing, seeks to accurately measure the impact of various strategies, such as coupons or discounts, on different users by predicting the Individual Treatment Effect (ITE). In an e-commerce setting, user behavior follows a defined sequential chain, including impression, click, and conversion. Marketing strategies exert varied uplift effects at each stage within t…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: Accepted by WWW2024

  46. arXiv:2401.08690

    cs.LG

    Contrastive Learning with Negative Sampling Correction

    Authors: Lu Wang, Chao Du, Pu Zhao, Chuan Luo, Zhangchi Zhu, Bo Qiao, Wei Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: As one of the most effective self-supervised representation learning methods, contrastive learning (CL) relies on multiple negative pairs to contrast against each positive pair. In the standard practice of contrastive learning, data augmentation methods are utilized to generate both positive and negative pairs. While existing works have been focusing on improving the positive sampling, the negativ…

    Submitted 13 January, 2024; originally announced January 2024.

    Comments: 9 pages, 3 figures

  47. arXiv:2401.06614

    cs.CV

    Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking

    Authors: Wei Cao, Chang Luo, Biao Zhang, Matthias Nießner, Jiapeng Tang

    Abstract: We introduce Motion2VecSets, a 4D diffusion model for dynamic surface reconstruction from point cloud sequences. While existing state-of-the-art methods have demonstrated success in reconstructing non-rigid objects using neural field representations, conventional feed-forward networks encounter challenges with ambiguous observations from noisy, partial, or sparse point clouds. To address these cha…

    Submitted 13 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  48. arXiv:2401.05166

    cs.CV

    REACT 2024: the Second Multiple Appropriate Facial Reaction Generation Challenge

    Authors: Siyang Song, Micol Spitale, Cheng Luo, Cristina Palmero, German Barquero, Hengde Zhu, Sergio Escalera, Michel Valstar, Tobias Baur, Fabien Ringeval, Elisabeth Andre, Hatice Gunes

    Abstract: In dyadic interactions, humans communicate their intentions and state of mind using verbal and non-verbal cues, where multiple different facial reactions might be appropriate in response to a specific speaker behaviour. This raises the question of how to develop a machine learning (ML) model that can automatically generate multiple appropriate, diverse, realistic and synchronised human facial reactions from a previous…

    Submitted 10 January, 2024; originally announced January 2024.

    MSC Class: 68T40

  49. arXiv:2401.03470

    cs.CV cs.AI

    FurniScene: A Large-scale 3D Room Dataset with Intricate Furnishing Scenes

    Authors: Genghao Zhang, Yuxi Wang, Chuanchen Luo, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng

    Abstract: Indoor scene generation has attracted significant attention recently as it is crucial for applications in gaming, virtual reality, and interior design. Current indoor scene generation methods can produce reasonable room layouts but often lack diversity and realism. This is primarily due to the limited coverage of existing datasets, including only large furniture without tiny furnishings in daily l…

    Submitted 6 May, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

  50. arXiv:2401.01651

    cs.CV cs.AI

    AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI

    Authors: Fanda Fan, Chunjie Luo, Wanling Gao, Jianfeng Zhan

    Abstract: The burgeoning field of Artificial Intelligence Generated Content (AIGC) is witnessing rapid advancements, particularly in video generation. This paper introduces AIGCBench, a pioneering comprehensive and scalable benchmark designed to evaluate a variety of video generation tasks, with a primary focus on Image-to-Video (I2V) generation. AIGCBench tackles the limitations of existing benchmarks, whi…

    Submitted 23 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted to BenchCouncil Transactions on Benchmarks, Standards and Evaluations (TBench)