
Showing 1–50 of 634 results for author: Yang, T

Searching in archive cs.
  1. arXiv:2407.01445 [pdf, other]

    cs.LG cs.CV

    FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources

    Authors: Xiyuan Wei, Fanjiang Ye, Ori Yonay, Xingyu Chen, Baixi Sun, Dingwen Tao, Tianbao Yang

    Abstract: Existing studies of training state-of-the-art Contrastive Language-Image Pretraining (CLIP) models on large-scale data involve hundreds or even thousands of GPUs due to the requirement of a large batch size. However, such a large amount of resources is not accessible to most people. While advanced compositional optimization techniques for optimizing global contrastive losses have been demonstra…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 23 pages

  2. arXiv:2406.18832 [pdf, other]

    cs.CL

    OutlierTune: Efficient Channel-Wise Quantization for Large Language Models

    Authors: **guang Wang, Yuexi Yin, Haifeng Sun, Qi Qi, **gyu Wang, Zirui Zhuang, Tingting Yang, Jianxin Liao

    Abstract: Quantizing the activations of large language models (LLMs) has been a significant challenge due to the presence of structured outliers. Most existing methods focus on the per-token or per-tensor quantization of activations, making it difficult to achieve both accuracy and hardware efficiency. To address this problem, we propose OutlierTune, an efficient per-channel post-training quantization (PTQ)…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.17894 [pdf, other]

    cs.LG

    Efficient and Effective Implicit Dynamic Graph Neural Network

    Authors: Yongjian Zhong, Hieu Vu, Tianbao Yang, Bijaya Adhikari

    Abstract: Implicit graph neural networks have gained popularity in recent years as they capture long-range dependencies while improving predictive performance in static graphs. Despite the tussle between performance degradation due to the oversmoothing of learned embeddings and long-range dependency being more pronounced in dynamic graphs, as features are aggregated both across neighborhood and time, no pri…

    Submitted 25 June, 2024; originally announced June 2024.

  4. arXiv:2406.16933 [pdf, other]

    eess.SP cs.AI

    SGSM: A Foundation-model-like Semi-generalist Sensing Model

    Authors: Tianjian Yang, Hao Zhou, Shuo Liu, Kaiwen Guo, Yiwen Hou, Haohua Du, Zhi Liu, Xiang-Yang Li

    Abstract: The significance of intelligent sensing systems is growing in the realm of smart services. These systems extract relevant signal features and generate informative representations for particular tasks. However, building the feature extraction component for such systems requires extensive domain-specific expertise or data. The exceptionally rapid development of foundation models is likely to usher i…

    Submitted 15 June, 2024; originally announced June 2024.

  5. arXiv:2406.16007 [pdf, other]

    cs.CL

    Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning

    Authors: Bowen Zheng, Ming Ma, Zhongqiao Lin, Tianming Yang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities, one of the most important being In-Context Learning (ICL). With ICL, LLMs can derive the underlying rule from a few demonstrations and provide answers that comply with the rule. Previous work hypothesized that the network creates a "task vector" in specific positions during ICL. Patching the "task vector" allows LLMs to achieve z…

    Submitted 23 June, 2024; originally announced June 2024.

  6. arXiv:2406.12577 [pdf, other]

    cs.CV

    Cephalometric Landmark Detection across Ages with Prototypical Network

    Authors: Han Wu, Chong Wang, Lanzhuju Mei, Tong Yang, Min Zhu, Dingggang Shen, Zhiming Cui

    Abstract: Automated cephalometric landmark detection is crucial in real-world orthodontic diagnosis. Current studies mainly focus on only adult subjects, neglecting the clinically crucial scenario presented by adolescents whose landmarks often exhibit significantly different appearances compared to adults. Hence, an open question arises about how to develop a unified and effective detection algorithm across…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  7. arXiv:2406.11802 [pdf, other]

    cs.CV

    PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

    Authors: Fanqing Meng, Wenqi Shao, Lixin Luo, Yahong Wang, Yiran Chen, Quanfeng Lu, Yue Yang, Tianshuo Yang, Kaipeng Zhang, Yu Qiao, ** Luo

    Abstract: Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulation and everyday tasks. Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal know…

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.11389 [pdf, other]

    cs.LG

    SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

    Authors: Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

    Abstract: Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods s…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  9. arXiv:2406.06737 [pdf, other]

    cs.CR cs.CL

    Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications

    Authors: Junlin Wang, Tianyi Yang, Roy Xie, Bhuwan Dhingra

    Abstract: With the proliferation of LLM-integrated applications such as GPT-s, millions are deployed, offering valuable services through proprietary instruction prompts. These systems, however, are prone to prompt extraction attacks through meticulously designed queries. To help mitigate this problem, we introduce the Raccoon benchmark which comprehensively evaluates a model's susceptibility to prompt extra…

    Submitted 10 June, 2024; originally announced June 2024.

  10. arXiv:2406.06039 [pdf, other]

    cs.CV

    Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset

    Authors: Shijie Lian, Ziyi Zhang, Hua Li, Wenjie Li, Laurence Tianruo Yang, Sam Kwong, Runmin Cong

    Abstract: With the breakthrough of large models, Segment Anything Model (SAM) and its extensions have been applied to diverse tasks in computer vision. Underwater salient instance segmentation is a foundational and vital step for various underwater vision tasks, which often suffer from low segmentation accuracy due to the complex underwater circumstances and the adaptive ability of models. Moreov…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024, Code released at: https://github.com/LiamLian0727/USIS10K

  11. arXiv:2406.05977 [pdf, other]

    cs.IR

    Weighted KL-Divergence for Document Ranking Model Refinement

    Authors: Yingrui Yang, Yifan Qiao, Shanxiu He, Tao Yang

    Abstract: Transformer-based retrieval and reranking models for text document search are often refined through knowledge distillation together with contrastive learning. A tight distribution matching between the teacher and student models can be hard as over-calibration may degrade training effectiveness when a teacher does not perform well. This paper contrastively reweights KL divergence terms to prioritiz…

    Submitted 9 June, 2024; originally announced June 2024.

  12. arXiv:2406.05686 [pdf, other]

    cs.LG cs.CV cs.CY

    Provable Optimization for Adversarial Fair Self-supervised Contrastive Learning

    Authors: Qi Qi, Quanqi Hu, Qihang Lin, Tianbao Yang

    Abstract: This paper studies learning fair encoders in a self-supervised learning (SSL) setting, in which all data are unlabeled and only a small portion of them are annotated with the sensitive attribute. Adversarial fair representation learning is well suited for this scenario by minimizing a contrastive loss over unlabeled data while maximizing an adversarial loss of predicting the sensitive attribute over…

    Submitted 9 June, 2024; originally announced June 2024.

  13. arXiv:2406.02884 [pdf, other]

    cs.CV

    PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

    Authors: Tao Yang, Yingmin Luo, Zhongang Qi, Yang Wu, Ying Shan, Chang Wen Chen

    Abstract: Layout generation is the keystone in achieving automated graphic design, requiring arranging the position and size of various multi-modal design elements in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic…

    Submitted 1 July, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 10 pages; typos corrected, appendix added

  14. arXiv:2406.02764 [pdf, other]

    cs.LG cs.AI

    Adaptive Preference Scaling for Reinforcement Learning with Human Feedback

    Authors: Ilgee Hong, Zichong Li, Alexander Bukharin, Yixiao Li, Haoming Jiang, Tianbao Yang, Tuo Zhao

    Abstract: Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values by learning rewards from human preference data. Due to various reasons, however, such data typically takes the form of rankings over pairs of trajectory segments, which fails to capture the varying strengths of preferences across different pairs. In this paper, we propose a novel adaptiv…

    Submitted 4 June, 2024; originally announced June 2024.

  15. arXiv:2406.00376 [pdf, other]

    cs.DS cs.DB

    Approaching 100% Confidence in Stream Summary through ReliableSketch

    Authors: Yuhan Wu, Hanbo Wu, Xilai Liu, Yikai Zhao, Tong Yang, Kaicheng Yang, Sha Wang, Lihua Miao, Gaogang Xie

    Abstract: To approximate sums of values in key-value data streams, sketches are widely used in databases and networking systems. They offer high-confidence approximations for any given key while ensuring low time and space overhead. While existing sketches are proficient in estimating individual keys, they struggle to maintain this high confidence across all keys collectively, an objective that is criticall…

    Submitted 1 June, 2024; originally announced June 2024.

  16. arXiv:2405.20750 [pdf, other]

    cs.CV

    Diffusion Models Are Innate One-Step Generators

    Authors: Bowen Zheng, Tianming Yang

    Abstract: Diffusion Models (DMs) have achieved great success in image generation and other fields. By fine sampling through the trajectory defined by the SDE/ODE solver based on a well-trained score model, DMs can generate remarkably high-quality results. However, this precise sampling often requires multiple steps and is computationally demanding. To address this problem, instance-based distillation method…

    Submitted 6 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures and 4 tables on the main contents

  17. arXiv:2405.19711 [pdf]

    cs.DS

    SimiSketch: Efficiently Estimating Similarity of streaming Multisets

    Authors: Fenghao Dong, Yang He, Yutong Liang, Zirui Liu, Yuhan Wu, Peiqing Chen, Tong Yang

    Abstract: The challenge of estimating similarity between sets has been a significant concern in data science, finding diverse applications across various domains. However, previous approaches, such as MinHash, have predominantly centered around hashing techniques, which are well-suited for sets but less naturally adaptable to multisets, a common occurrence in scenarios like network streams and text data. Mo…

    Submitted 30 May, 2024; originally announced May 2024.

  18. arXiv:2405.19320 [pdf, other]

    cs.LG cs.AI stat.ML

    Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

    Authors: Shicong Cen, **cheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai

    Abstract: Reinforcement learning from human feedback (RLHF) has demonstrated great promise in aligning large language models (LLMs) with human preference. Depending on the availability of preference data, both online and offline RLHF are active areas of investigation. A key bottleneck is understanding how to incorporate uncertainty estimation in the reward function learned from the preference data for RLHF,…

    Submitted 4 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  19. arXiv:2405.18577 [pdf, other]

    math.OC cs.LG stat.ML

    Single-loop Stochastic Algorithms for Difference of Max-Structured Weakly Convex Functions

    Authors: Quanqi Hu, Qi Qi, Zhaosong Lu, Tianbao Yang

    Abstract: In this paper, we study a class of non-smooth non-convex problems in the form of $\min_{x}[\max_{y\in Y}\phi(x, y) - \max_{z\in Z}\psi(x, z)]$, where both $\Phi(x) = \max_{y\in Y}\phi(x, y)$ and $\Psi(x) = \max_{z\in Z}\psi(x, z)$ are weakly convex functions, and $\phi(x, y), \psi(x, z)$ are strongly concave functions in terms of $y$ and $z$, respectively. It covers two families of problems that have been studied but are m…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  20. arXiv:2405.17470 [pdf, other]

    cs.LG cs.AI cs.CL

    Athena: Efficient Block-Wise Post-Training Quantization for Large Language Models Using Second-Order Matrix Derivative Information

    Authors: Yanshu Wang, Wenyang He, Tong Yang

    Abstract: Large Language Models (LLMs) have significantly advanced natural language processing tasks such as machine translation, text generation, and sentiment analysis. However, their large size, often consisting of billions of parameters, poses challenges for storage, computation, and deployment, particularly in resource-constrained environments like mobile devices and edge computing platforms. Effective…

    Submitted 23 May, 2024; originally announced May 2024.

  21. arXiv:2405.16577 [pdf, other]

    stat.ML cs.LG

    Reflected Flow Matching

    Authors: Tianyu Xie, Yu Zhu, Longlin Yu, Tong Yang, Ziheng Cheng, Shiyue Zhang, Xiangyu Zhang, Cheng Zhang

    Abstract: Continuous normalizing flows (CNFs) learn an ordinary differential equation to transform prior samples into data. Flow matching (FM) has recently emerged as a simulation-free approach for training CNFs by regressing a velocity model towards the conditional velocity field. However, on constrained domains, the learned velocity model may lead to undesirable flows that result in highly unnatural sampl…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: ICML 2024 camera-ready

  22. arXiv:2405.16534 [pdf, other]

    cs.CV

    Pruning for Robust Concept Erasing in Diffusion Models

    Authors: Tianyun Yang, Juan Cao, Chang Xu

    Abstract: Despite the impressive capabilities of generating images, text-to-image diffusion models are susceptible to producing undesirable outputs such as NSFW content and copyrighted artworks. To address this issue, recent studies have focused on fine-tuning model parameters to erase problematic concepts. However, existing methods exhibit a major flaw in robustness, as fine-tuned models often reproduce th…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Under review

  23. arXiv:2405.15193 [pdf, other]

    cs.DB cs.DS

    CuckooGraph: A Scalable and Space-Time Efficient Data Structure for Large-Scale Dynamic Graphs

    Authors: Zhuochen Fan, Yalun Cai, Zirui Liu, Jiarui Guo, Xin Fan, Tong Yang, Bin Cui

    Abstract: Graphs play an increasingly important role in various big data applications. However, existing graph data structures cannot simultaneously address the performance bottlenecks caused by the dynamic updates, large scale, and high query complexity of current graphs. This paper proposes a novel data structure for large-scale dynamic graphs called CuckooGraph. It does not need to know the amount of gra…

    Submitted 23 May, 2024; originally announced May 2024.

  24. arXiv:2405.14280 [pdf, other]

    cs.IR

    ASI++: Towards Distributionally Balanced End-to-End Generative Retrieval

    Authors: Yuxuan Liu, Tianchi Yang, Zihan Zhang, Minghui Song, Haizhen Huang, Weiwei Deng, Feng Sun, Qi Zhang

    Abstract: Generative retrieval, a promising new paradigm in information retrieval, employs a seq2seq model to encode document features into parameters and decode relevant document identifiers (IDs) based on search queries. Existing generative retrieval solutions typically rely on a preprocessing stage to pre-define document IDs, which can suffer from a semantic gap between these IDs and the retrieval task.…

    Submitted 23 May, 2024; originally announced May 2024.

  25. arXiv:2405.13800 [pdf, other]

    cs.CV cs.AI

    Dense Connector for MLLMs

    Authors: Huan** Yao, Wenhao Wu, Taojiannan Yang, YuXin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, **gdong Wang

    Abstract: Do we fully leverage the potential of visual encoder in Multimodal Large Language Models (MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has garnered broad attention from both academia and industry. In the current MLLM rat race, the focus seems to be predominantly on the linguistic side. We witness the rise of larger and higher-quality instruction datasets, as well…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Technical report. 25 pages

  26. arXiv:2405.13560 [pdf, other]

    cs.HC cs.AI

    Navigating User Experience of ChatGPT-based Conversational Recommender Systems: The Effects of Prompt Guidance and Recommendation Domain

    Authors: Yizhe Zhang, Yucheng **, Li Chen, Ting Yang

    Abstract: Conversational recommender systems (CRS) enable users to articulate their preferences and provide feedback through natural language. With the advent of large language models (LLMs), the potential to enhance user engagement with CRS and augment the recommendation process with LLM-generated content has received increasing attention. However, the efficacy of LLM-powered CRS is contingent upon the use…

    Submitted 22 May, 2024; originally announced May 2024.

  27. arXiv:2405.06948 [pdf, other]

    cs.CV

    Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation

    Authors: Shengyuan Liu, Bo Wang, Ye Ma, Te Yang, Xipeng Cao, Quan Chen, Han Li, Di Dong, Peng Jiang

    Abstract: Existing subject-driven text-to-image generation models suffer from tedious fine-tuning steps and struggle to maintain both text-image alignment and subject fidelity. For generating compositional subjects, it often encounters problems such as object missing and attribute mixing, where some subjects in the input prompt are not generated or their attributes are incorrectly combined. To address these…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: 26 pages, 13 figures

  28. arXiv:2405.05945 [pdf, other]

    cs.CV

    Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

    Authors: Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, **gwen He, Yu Qiao, Hongsheng Li

    Abstract: Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified f…

    Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Technical Report; Code at: https://github.com/Alpha-VLLM/Lumina-T2X

  29. arXiv:2405.03875 [pdf, other]

    cs.LG stat.ML

    Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

    Authors: Jiachen T. Wang, Tianji Yang, James Zou, Yongchan Kwon, Ruoxi Jia

    Abstract: Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research. Data selection is considered a standard application of Data Shapley. However, its data selection performance has been shown to be inconsistent across settings in the literature. This study aims to deepen our understanding of this phenomenon. We introduce a hypothesis te…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  30. arXiv:2404.17801 [pdf]

    cs.LG

    Dynamical Mode Recognition of Coupled Flame Oscillators by Supervised and Unsupervised Learning Approaches

    Authors: Weiming Xu, Tao Yang, Peng Zhang

    Abstract: Combustion instability in gas turbines and rocket engines, as one of the most challenging problems in combustion research, arises from the complex interactions among flames, which are also influenced by chemical reactions, heat and mass transfer, and acoustics. Identifying and understanding combustion instability is essential to ensure the safe and reliable operation of many combustion systems, wh…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: research paper (21 pages, 15 figures)

  31. arXiv:2404.16914 [pdf, other]

    cs.LG cs.AI cs.CL

    Prediction Is All MoE Needs: Expert Load Distribution Goes from Fluctuating to Stabilizing

    Authors: Peizhuang Cong, Aomufei Yuan, Shimao Chen, Yuxuan Tian, Bowen Ye, Tong Yang

    Abstract: MoE facilitates the development of large models by making the computational complexity of the model no longer scale linearly with increasing parameters. The learning sparse gating network selects a set of experts for each token to be processed; however, this may lead to differences in the number of tokens processed by each expert over several successive iterations, i.e., the expert load fluctuatio…

    Submitted 25 April, 2024; originally announced April 2024.

  32. arXiv:2404.16233 [pdf, other]

    cs.LG cs.AI

    AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

    Authors: Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis

    Abstract: AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite…

    Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted at AutoML 2024 Conference

  33. arXiv:2404.13061 [pdf, other]

    cs.AR cs.AI cs.LG

    FPGA Divide-and-Conquer Placement using Deep Reinforcement Learning

    Authors: Shang Wang, Deepak Ranganatha Sastry Mamillapalli, Tianpei Yang, Matthew E. Taylor

    Abstract: This paper introduces the problem of learning to place logic blocks in Field-Programmable Gate Arrays (FPGAs) and a learning-based method. In contrast to previous search-based placement algorithms, we instead employ Reinforcement Learning (RL) with the goal of minimizing wirelength. In addition to our preliminary learning results, we also evaluated a novel decomposition to address the nature of la…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: accepted by ISEDA2024

  34. arXiv:2404.10413 [pdf, other]

    cs.DB cs.LG cs.PF

    VDTuner: Automated Performance Tuning for Vector Data Management Systems

    Authors: Tiannuo Yang, Wen Hu, Wangqi Peng, Yusen Li, Jianguo Li, Gang Wang, Xiaoguang Liu

    Abstract: Vector data management systems (VDMSs) have become an indispensable cornerstone in large-scale information retrieval and machine learning systems like large language models. To enhance the efficiency and flexibility of similarity search, VDMS exposes many tunable index parameters and system parameters for users to specify. However, due to the inherent characteristics of VDMS, automatic performance…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by ICDE 2024

  35. arXiv:2404.08896 [pdf, other]

    cs.IR

    Approximate Cluster-Based Sparse Document Retrieval with Segmented Maximum Term Weights

    Authors: Yifan Qiao, Shanxiu He, Yingrui Yang, Parker Carlson, Tao Yang

    Abstract: This paper revisits cluster-based retrieval that partitions the inverted index into multiple groups and skips the index partially at cluster and document levels during online inference using a learned sparse representation. It proposes an approximate search scheme with two parameters to control the rank-safeness competitiveness of pruning with segmented maximum term weights within each cluster. Cl…

    Submitted 13 April, 2024; originally announced April 2024.

  36. arXiv:2404.08201 [pdf, other]

    eess.IV cs.CV

    A Mutual Inclusion Mechanism for Precise Boundary Segmentation in Medical Images

    Authors: Yizhi Pan, Junyi Xin, Tianhua Yang, Teeradaj Racharak, Le-Minh Nguyen, Guanqun Sun

    Abstract: In medical imaging, accurate image segmentation is crucial for quantifying diseases, assessing prognosis, and evaluating treatment outcomes. However, existing methods lack an in-depth integration of global and local features, failing to pay special attention to abnormal regions and boundary details in medical images. To this end, we present a novel deep learning-based approach, MIPC-Net, for preci…

    Submitted 11 April, 2024; originally announced April 2024.

  37. arXiv:2404.07987 [pdf, other]

    cs.CV cs.AI cs.LG

    ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

    Authors: Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen

    Abstract: To enhance the controllability of text-to-image diffusion models, existing efforts like ControlNet incorporated image-based conditional controls. In this paper, we reveal that existing methods still face significant challenges in generating images that align with the image conditional controls. To this end, we propose ControlNet++, a novel approach that improves controllable generation by explicit…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Project Page: https://liming-ai.github.io/ControlNet_Plus_Plus

  38. arXiv:2404.04575 [pdf, other]

    cs.LG cs.AI math.OC

    To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO

    Authors: Zi-Hao Qiu, Siqi Guo, Mao Xu, Tuo Zhao, Lijun Zhang, Tianbao Yang

    Abstract: The temperature parameter plays a profound role during training and/or inference with large foundation models (LFMs) such as large language models (LLMs) and CLIP models. Particularly, it adjusts the logits in the softmax function in LLMs, which is crucial for next token generation, and it scales the similarities in the contrastive loss for training CLIP models. A significant question remains: Is…

    Submitted 16 June, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: 41 pages, 10 figures, accepted by ICML2024

  39. arXiv:2404.01359 [pdf]

    quant-ph cs.AI cs.NE

    Parallel Proportional Fusion of Spiking Quantum Neural Network for Optimizing Image Classification

    Authors: Zuyu Xu, Kang Shen, Pengnian Cai, Tao Yang, Yuanming Hu, Shixian Chen, Yunlai Zhu, Zuheng Wu, Yuehua Dai, Jun Wang, Fei Yang

    Abstract: The recent emergence of the hybrid quantum-classical neural network (HQCNN) architecture has garnered considerable attention due to the potential advantages associated with integrating quantum principles to enhance various facets of machine learning algorithms and computations. However, the currently investigated serial structure of HQCNN, wherein information sequentially passes from one network to…

    Submitted 1 April, 2024; originally announced April 2024.

  40. arXiv:2403.19930 [pdf, other]

    cs.CL

    Are LLMs Effective Backbones for Fine-tuning? An Experimental Investigation of Supervised LLMs on Chinese Short Text Matching

    Authors: Shulin Liu, Chengcheng Xu, Hao Liu, Tinghao Yu, Tao Yang

    Abstract: The recent success of Large Language Models (LLMs) has garnered significant attention in both academia and industry. Prior research on LLMs has primarily focused on enhancing or leveraging their generalization capabilities in zero- and few-shot settings. However, there has been limited investigation into effectively fine-tuning LLMs for a specific natural language understanding task in supervised…

    Submitted 28 March, 2024; originally announced March 2024.

  41. arXiv:2403.17574 [pdf, other]

    cs.SE cs.DC

    SPES: Towards Optimizing Performance-Resource Trade-Off for Serverless Functions

    Authors: Cheryl Lee, Zhouruixin Zhu, Tianyi Yang, Yintong Huo, Yuxin Su, Pinjia He, Michael R. Lyu

    Abstract: As an emerging cloud computing deployment paradigm, serverless computing is gaining traction due to its efficiency and ability to harness on-demand cloud resources. However, a significant hurdle remains in the form of the cold start problem, causing latency when launching new function instances from scratch. Existing solutions tend to use over-simplistic strategies for function pre-loading/unloadi…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 12 pages, accepted by ICDE 2024 (40th IEEE International Conference on Data Engineering)

  42. arXiv:2403.17567 [pdf, other]

    cs.PL

    Piecewise Linear Expectation Analysis via $k$-Induction for Probabilistic Programs

    Authors: Tengshun Yang, Hongfei Fu, **gyu Ke, Naijun Zhan, Shiyang Wu

    Abstract: Quantitative analysis of probabilistic programs aims at deriving tight numerical bounds for probabilistic properties such as expectation and assertion probability, and plays a crucial role in the verification of probabilistic programs. Along this line of research, most existing works consider numerical bounds over the whole state space monolithically and do not consider piecewise bounds. Clearly,…

    Submitted 26 March, 2024; originally announced March 2024.

  43. arXiv:2403.17431 [pdf, other]

    cs.CL cs.LG

    Robust and Scalable Model Editing for Large Language Models

    Authors: Yingfa Chen, Zhengyan Zhang, Xu Han, Chaojun Xiao, Zhiyuan Liu, Chen Chen, Kuai Li, Tao Yang, Maosong Sun

    Abstract: Large language models (LLMs) can make predictions using parametric knowledge--knowledge encoded in the model weights--or contextual knowledge--knowledge presented in the context. In many scenarios, a desirable behavior is that LLMs give precedence to contextual knowledge when it conflicts with the parametric knowledge, and fall back to using their parametric knowledge when the context is irrelevan…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024 paper, 16 pages, 4 figures

  44. arXiv:2403.16209  [pdf]

    cs.CV cs.AI

    Image Captioning in news report scenario

    Authors: Tianrui Liu, Qi Cai, Changxin Xu, Bo Hong, Jize Xiong, Yuxin Qiao, Tsungwei Yang

    Abstract: Image captioning strives to generate pertinent captions for specified images, situating itself at the crossroads of Computer Vision (CV) and Natural Language Processing (NLP). This endeavor is of paramount importance with far-reaching applications in recommendation systems, news outlets, social media, and beyond. Particularly within the realm of news reporting, captions are expected to encompass d…

    Submitted 1 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: 10 pages, 4 figures

  45. arXiv:2403.16206  [pdf]

    cs.AI

    Rumor Detection with a novel graph neural network approach

    Authors: Tianrui Liu, Qi Cai, Changxin Xu, Bo Hong, Fanghao Ni, Yuxin Qiao, Tsungwei Yang

    Abstract: The wide spread of rumors on social media has caused a negative impact on people's daily life, leading to potential panic, fear, and mental health problems for the public. How to debunk rumors as early as possible remains a challenging problem. Existing studies mainly leverage information propagation structure to detect rumors, while very few works focus on correlation among users that they may co…

    Submitted 1 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: 10 pages, 5 figures

  46. arXiv:2403.13524  [pdf, other]

    cs.CV cs.AI

    Compress3D: a Compressed Latent Space for 3D Generation from a Single Image

    Authors: Bowen Zhang, Tianyu Yang, Yu Li, Lei Zhang, Xi Zhao

    Abstract: 3D generation has witnessed significant advancements, yet efficiently producing high-quality 3D assets from a single image remains challenging. In this paper, we present a triplane autoencoder, which encodes 3D models into a compact triplane latent space to effectively compress both the 3D geometry and texture information. Within the autoencoder framework, we introduce a 3D-aware cross-attention m…

    Submitted 20 March, 2024; originally announced March 2024.

  47. arXiv:2403.13064  [pdf, other]

    cs.CV

    SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model

    Authors: Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, Vasileios Balntas

    Abstract: We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our proposed scene representation is inspired by recent successes in transformers & LLMs, and departs from more traditional methods which commonly describe scenes as meshes, voxel grids, point clouds or radiance fields. Our method…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: see project page, https://projectaria.com/scenescript

  48. arXiv:2403.11459  [pdf, other]

    cs.RO

    ALDM-Grasping: Diffusion-aided Zero-Shot Sim-to-Real Transfer for Robot Grasping

    Authors: Yiwei Li, Zihao Wu, Huaqin Zhao, Tianze Yang, Zhengliang Liu, Peng Shu, ** Sun, Ramviyas Parasuraman, Tianming Liu

    Abstract: To tackle the "reality gap" encountered in Sim-to-Real transfer, this study proposes a diffusion-based framework that minimizes inconsistencies in grasping actions between the simulation settings and realistic environments. The process begins by training an adversarial supervision layout-to-image diffusion model (ALDM). Then, leverage the ALDM approach to enhance the simulation environment, renderi…

    Submitted 18 March, 2024; originally announced March 2024.

  49. arXiv:2403.10983  [pdf, other]

    cs.CV

    OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

    Authors: Zhe Kong, Yong Zhang, Tianyu Yang, Tao Wang, Kaihao Zhang, Bizhu Wu, Guanying Chen, Wei Liu, Wenhan Luo

    Abstract: Personalization is an important topic in text-to-image generation, especially the challenging multi-concept personalization. Current multi-concept methods are struggling with identity preservation, occlusion, and the harmony between foreground and background. In this work, we propose OMG, an occlusion-friendly personalized generation framework designed to seamlessly integrate multiple concepts wit…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Homepage: https://kongzhecn.github.io/omg-project/ Github: https://github.com/kongzhecn/OMG/

  50. arXiv:2403.10037  [pdf, other]

    cs.CV

    Knowledge Condensation and Reasoning for Knowledge-based VQA

    Authors: Dongze Hao, Jian Jia, Longteng Guo, Qunbo Wang, Te Yang, Yan Li, Yanhua Cheng, Bo Wang, Quan Chen, Han Li, **g Liu

    Abstract: Knowledge-based visual question answering (KB-VQA) is a challenging task, which requires the model to leverage external knowledge for comprehending and answering questions grounded in visual content. Recent studies retrieve the knowledge passages from external knowledge bases and then use them to answer questions. However, these retrieved knowledge passages often contain irrelevant or noisy inform…

    Submitted 15 March, 2024; originally announced March 2024.