Skip to main content

Showing 1–50 of 2,982 results for author: Huang, H

.
  1. arXiv:2407.00935  [pdf, other

    cs.LG cs.CL

    Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining

    Authors: Qi Zhang, Tianqi Du, Haotian Huang, Yifei Wang, Yisen Wang

    Abstract: In recent years, the rise of generative self-supervised learning (SSL) paradigms has exhibited impressive performance across visual, language, and multi-modal domains. While the varied designs of generative SSL objectives lead to distinct properties in downstream tasks, a theoretical understanding of these differences remains largely unexplored. In this paper, we establish the first theoretical co… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2407.00560  [pdf, other

    q-bio.BM math.OC

    DCI: An Accurate Quality Assessment Criteria for Protein Complex Structure Models

    Authors: Wenda Wang, Jiaqi Zhai, He Huang, Xinqi Gong

    Abstract: The structure of proteins is the basis for studying protein function and drug design. The emergence of AlphaFold 2 has greatly promoted the prediction of protein 3D structures, and it is of great significance to give an overall and accurate evaluation of the predicted models, especially the complex models. Among the existing methods for evaluating multimer structures, DockQ is the most commonly us… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2407.00125  [pdf, other

    cs.SE cs.AI cs.DC

    A Survey on Failure Analysis and Fault Injection in AI Systems

    Authors: Guangba Yu, Gou Tan, Haojia Huang, Zhenyu Zhang, Pengfei Chen, Roberto Natella, Zibin Zheng

    Abstract: The rapid advancement of Artificial Intelligence (AI) has led to its integration into various areas, especially with Large Language Models (LLMs) significantly enhancing capabilities in Artificial Intelligence Generated Content (AIGC). However, the complexity of AI systems has also exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ens… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

  4. arXiv:2406.19954  [pdf, other

    cs.CL cs.HC cs.SD eess.AS

    BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5

    Authors: Zhehuai Chen, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Nithin Rao Koluguri, Piotr Żelasko, Jagadeesh Balam, Boris Ginsburg

    Abstract: Incorporating speech understanding capabilities into pretrained large-language models has become a vital research direction (SpeechLLM). The previous architectures can be categorized as: i) GPT-style, prepend speech prompts to the text prompts as a sequence of LLM inputs like a decoder-only model; ii) T5-style, introduce speech cross-attention to each layer of the pretrained LLMs. We propose BESTO… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    MSC Class: 68T10 ACM Class: I.2.7

  5. arXiv:2406.19674  [pdf, other

    cs.CL cs.LG cs.SD eess.AS

    Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

    Authors: Krishna C. Puvvada, Piotr Żelasko, He Huang, Oleksii Hrinchuk, Nithin Rao Koluguri, Kunal Dhawan, Somshubra Majumdar, Elena Rastorgueva, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg

    Abstract: Recent advances in speech recognition and translation rely on hundreds of thousands of hours of Internet speech data. We argue that state-of-the art accuracy can be reached without relying on web-scale data. Canary - multilingual ASR and speech translation model, outperforms current state-of-the-art models - Whisper, OWSM, and Seamless-M4T on English, French, Spanish, and German languages, while b… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech-2024

  6. arXiv:2406.19212  [pdf, other

    quant-ph

    JuliVQC: an Efficient Variational Quantum Circuit Simulator for Near-Term Quantum Algorithms

    Authors: Wei-You Liao, Xiang Wang, Xiao-Yue Xu, Chen Ding, Shuo Zhang, He-Liang Huang, Chu Guo

    Abstract: We introduce JuliVQC: a light-weight, yet extremely efficient variational quantum circuit simulator. JuliVQC is part of an effort for classical simulation of the \textit{Zuchongzhi} quantum processors, where it is extensively used to characterize the circuit noises, as a building block in the Schr$\ddot{\text{o}}$dinger-Feynman algorithm for classical verification and performance benchmarking, and… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 8 figures

  7. arXiv:2406.18871  [pdf, other

    eess.AS cs.CL

    DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment

    Authors: Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee

    Abstract: Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities from large language models (LLMs). In this paper, we propose a Descriptive Speech-Text Alignment approach that leverages speech captioning to bridge the gap between speech and text modalities, enabling SLMs to interpret and generate comprehensive natural language descriptions, thereby fa… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  8. arXiv:2406.18254  [pdf, other

    cs.IR cs.AI cs.MM

    Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

    Authors: Zhijie Nie, Richong Zhang, Zhangchi Feng, Hailang Huang, Xudong Liu

    Abstract: Cross-lingual Cross-modal Retrieval (CCR) is an essential task in web search, which aims to break the barriers between modality and language simultaneously and achieves image-text retrieval in the multi-lingual scenario with a single model. In recent years, excellent progress has been made based on cross-lingual cross-modal pre-training; particularly, the methods based on contrastive learning on l… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024 Research Track

  9. arXiv:2406.18146  [pdf, other

    cs.CV

    A Refer-and-Ground Multimodal Large Language Model for Biomedicine

    Authors: Xiaoshuang Huang, Haifeng Huang, Lingdong Shen, Yehui Yang, Fangxin Shang, Junwei Liu, Jia Liu

    Abstract: With the rapid development of multimodal large language models (MLLMs), especially their capabilities in visual chat through refer and ground functionalities, their significance is increasingly recognized. However, the biomedical field currently exhibits a substantial gap in this area, primarily due to the absence of a dedicated refer and ground dataset for biomedical images. To address this chall… ▽ More

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI2024

  10. arXiv:2406.18018  [pdf, other

    eess.IV

    A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

    Authors: Muwei Jian, Haoran Zhang, Mingju Shao, Hongyu Chen, Huihui Huang, Yanjie Zhong, Changlei Zhang, Bin Wang, Penghui Gao

    Abstract: Recently, intelligent analysis of lung nodules with the assistant of computer aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from one single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in im… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  11. arXiv:2406.17770  [pdf, other

    cs.CV

    MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

    Authors: Xiangyu Zhao, Xiangtai Li, Haodong Duan, Haian Huang, Yining Li, Kai Chen, Hua Yang

    Abstract: Multi-modal large language models (MLLMs) have made significant strides in various visual understanding tasks. However, the majority of these models are constrained to process low-resolution images, which limits their effectiveness in perception tasks that necessitate detailed visual information. In our study, we present MG-LLaVA, an innovative MLLM that enhances the model's visual processing capa… ▽ More

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  12. arXiv:2406.17565  [pdf, other

    cs.DC

    MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool

    Authors: Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, Tao Xie, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan

    Abstract: Large language model (LLM) serving has transformed from stateless to stateful systems, utilizing techniques like context caching and disaggregated inference. These optimizations extend the lifespan and domain of the KV cache, necessitating a new architectural approach. We present MemServe, a unified system that integrates both inter-request and intra-request optimizations. MemServe introduces MemP… ▽ More

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  13. arXiv:2406.17507  [pdf, other

    cs.IR

    ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

    Authors: Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

    Abstract: Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity between queries and candidates, generative retrieval surpasses dual-tower models in both speed and accuracy on large-scale corpora, providing new insights… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  14. arXiv:2406.16520  [pdf

    cond-mat.str-el cond-mat.mtrl-sci cond-mat.supr-con

    Gigantic-oxidative atomically layered epitaxy for designed complex oxides

    Authors: Guangdi Zhou, Haoliang Huang, Fengzhe Wang, Heng Wang, Qishuo Yang, Zihao Nie, Wei Lv, Cui Ding, Yueying Li, Danfeng Li, Yujie Sun, Junhao Lin, Guang-Ming Zhang, Qi-Kun Xue, Zhuoyu Chen

    Abstract: In designing material functionality within the intricate realm of transition metal oxides, lattice structure and d-orbital occupancy are two principal determinants of the correlated physical properties, such as superconductivity. However, the modulation of these two factors is inherently limited by the need to balance thermodynamic stability, kinetic mobility, and synthesis precision, particularly… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  15. arXiv:2406.16137  [pdf, other

    cs.CV

    MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling

    Authors: Jian Yang, Jiakun Li, Guoming Li, Zhen Shen, Huai-Yu Wu, Zhaoxin Fan, Heng Huang

    Abstract: Multi-view hand mesh reconstruction is a critical task for applications in virtual reality and human-computer interaction, but it remains a formidable challenge. Although existing multi-view hand reconstruction methods achieve remarkable accuracy, they typically come with an intensive computational burden that hinders real-time inference. To this end, we propose MLPHand, a novel method designed fo… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  16. arXiv:2406.15677  [pdf, other

    cs.RO

    Open-vocabulary Pick and Place via Patch-level Semantic Maps

    Authors: Mingxi Jia, Haojie Huang, Zhewen Zhang, Chenghao Wang, Linfeng Zhao, Dian Wang, Jason Xinyu Liu, Robin Walters, Robert Platt, Stefanie Tellex

    Abstract: Controlling robots through natural language instructions in open-vocabulary scenarios is pivotal for enhancing human-robot collaboration and complex robot behavior synthesis. However, achieving this capability poses significant challenges due to the need for a system that can generalize from limited data to a wide range of tasks and environments. Existing methods rely on large, costly datasets and… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  17. arXiv:2406.14955  [pdf, other

    cs.CL

    ICLEval: Evaluating In-Context Learning Ability of Large Language Models

    Authors: Wentong Chen, Yankai Lin, ZhenHao Zhou, HongYun Huang, Yantao Jia, Zhao Cao, Ji-Rong Wen

    Abstract: In-Context Learning (ICL) is a critical capability of Large Language Models (LLMs) as it empowers them to comprehend and reason across interconnected inputs. Evaluating the ICL ability of LLMs can enhance their utilization and deepen our understanding of how this ability is acquired at the training stage. However, existing evaluation frameworks primarily focus on language abilities and knowledge,… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  18. arXiv:2406.14909  [pdf, other

    cs.LG cs.AI cs.CL

    MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

    Authors: Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Sparse attention can effectively mitigate the significant memory and throughput demands of Large Language Models (LLMs) in long contexts. Existing methods typically employ a uniform sparse attention mask, applying the same sparse pattern across different attention heads and input lengths. However, this uniform approach fails to capture the diverse attention patterns inherent in LLMs, ignoring thei… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 10 pages

    ACM Class: I.2.7

  19. arXiv:2406.14828  [pdf, other

    cs.CL

    Word Matters: What Influences Domain Adaptation in Summarization?

    Authors: Yinghao Li, Siyu Miao, Heyan Huang, Yang Gao

    Abstract: Domain adaptation aims to enable Large Language Models (LLMs) to generalize domain datasets unseen effectively during the training phase. However, factors such as the size of the model parameters and the scale of training data are general influencers and do not reflect the nuances of domain adaptation performance. This paper investigates the fine-grained factors affecting domain adaptation perform… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  20. arXiv:2406.14307  [pdf, other

    q-bio.GN cs.CL cs.CV

    QuST-LLM: Integrating Large Language Models for Comprehensive Spatial Transcriptomics Analysis

    Authors: Chao Hui Huang

    Abstract: In this paper, we introduce QuST-LLM, an innovative extension of QuPath that utilizes the capabilities of large language models (LLMs) to analyze and interpret spatial transcriptomics (ST) data. This tool effectively simplifies the intricate and high-dimensional nature of ST data by offering a comprehensive workflow that includes data loading, region selection, gene expression analysis, and functi… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 7 figures

  21. arXiv:2406.14075  [pdf, other

    cs.CL

    EXCEEDS: Extracting Complex Events as Connecting the Dots to Graphs in Scientific Domain

    Authors: Yi-Fan Lu, Xian-Ling Mao, Bo Wang, Xiao Liu, Heyan Huang

    Abstract: It is crucial to utilize events to understand a specific domain. There are lots of research on event extraction in many domains such as news, finance and biology domain. However, scientific domain still lacks event extraction research, including comprehensive datasets and corresponding methods. Compared to other domains, scientific domain presents two characteristics: denser nuggets and more compl… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: This paper is working in process

  22. arXiv:2406.14025  [pdf

    cond-mat.mtrl-sci

    Direct Observation of Dendrites Nucleation in Li Metal Battery by Machine Learning Accelerated Molecular Simulations under Realistic Electrochemical Conditions

    Authors: Tai** Hu, Haichao Huang, Guobing Zhou, Xinyan Wang, Zheng Cheng, Fangjia Fu, Xiaoxu Wang, Fuzhi Dai, Kuang Yu, Shenzhen Xu

    Abstract: Uncontrollable dendrites growth during electrochemical cycles leads to low Coulombic efficiency and critical safety issues in Li metal batteries. Hence, a comprehensive understanding of the dendrite formation mechanism is essential for further enhancing the performance of Li metal batteries. Machine learning accelerated molecular dynamics (MD) simulations can provide atomic-scale resolution for va… ▽ More

    Submitted 28 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  23. arXiv:2406.13978  [pdf, other

    cond-mat.mes-hall cond-mat.mtrl-sci physics.comp-ph quant-ph

    Topological Solitons in Square-root Graphene Nanoribbons Controlled by Electric Fields

    Authors: Haiyue Huang, Mamun Sarker, Percy Zahl, C. Stephen Hellberg, Jeremy Levy, Ioannis Petrides, Alexander Sinitskii, Prineha Narang

    Abstract: Graphene nanoribbons (GNRs) are unique quasi-one-dimensional (1D) materials that have garnered a lot of research interest in the field of topological insulators. While the topological phases exhibited by GNRs are primarily governed by their chemical structures, the ability to externally control these phases is crucial for their potential utilization in quantum electronics and spintronics. Here we… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  24. arXiv:2406.13455  [pdf, ps, other

    math.CO math.RT

    Johnson graphs as slices of a hypercube and an algebra homomorphism from the universal Racah algebra into $U(\mathfrak{sl}_2)$

    Authors: Hau-Wen Huang, Chia-Yi Wen

    Abstract: From the viewpoint of Johnson graphs as slices of a hypercube, we derive a novel algebra homomorphism $\sharp$ from the universal Racah algebra $\Re$ into $U(\mathfrak{sl}_2)$. We use the Casimir elements of $\Re$ to describe the kernel of $\sharp$. By pulling back via $\sharp$ every $U(\mathfrak{sl}_2)$-module can be viewed as an $\Re$-module. We show that for any finite-dimensional… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    MSC Class: 05E30; 16G30; 16S30; 33D45

  25. arXiv:2406.13324  [pdf, other

    cond-mat.mes-hall

    Quantum Metric-induced Oscillations in Flat Bands

    Authors: Hui Zeng, Zijian Zhou, Wenhui Duan, Huaqing Huang

    Abstract: The transport of Bloch electrons under strong fields is traditionally understood through two mechanisms: intraband Bloch oscillations and interband Zener tunneling. Here we propose a new oscillation mechanism induced by the interband quantum metric, which would significantly affect the electron dynamics under strong fields. By considering the multiband dynamics to the second order of the density m… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  26. arXiv:2406.13167  [pdf, other

    cs.CL

    QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism

    Authors: Bo Wang, Heyan Huang, Yixin Cao, Jiahao Ying, Wei Tang, Chong Feng

    Abstract: While large language models (LLMs) have made notable advancements in natural language processing, they continue to struggle with processing extensive text. Memory mechanism offers a flexible solution for managing long contexts, utilizing techniques such as compression, summarization, and structuring to facilitate nuanced and efficient handling of large volumes of text. However, existing techniques… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  27. arXiv:2406.12847  [pdf, other

    cs.CV

    ChangeViT: Unleashing Plain Vision Transformers for Change Detection

    Authors: Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, Qimin Cheng

    Abstract: Change detection in remote sensing images is essential for tracking environmental changes on the Earth's surface. Despite the success of vision transformers (ViTs) as backbones in numerous computer vision applications, they remain underutilized in change detection, where convolutional neural networks (CNNs) continue to dominate due to their powerful feature extraction capabilities. In this paper,… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  28. arXiv:2406.12640  [pdf

    cs.LG

    Research and Implementation of Data Enhancement Techniques for Graph Neural Networks

    Authors: **gzhao Gu, Haoyang Huang

    Abstract: Data, algorithms, and arithmetic power are the three foundational conditions for deep learning to be effective in the application domain. Data is the focus for develo** deep learning algorithms. In practical engineering applications, some data are affected by the conditions under which more data cannot be obtained or the cost of obtaining data is too high, resulting in smaller data sets (general… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 7 figures, to be published in IEEE International Conference on Artificial Intelligence and Electromechanical Automation

  29. arXiv:2406.12437  [pdf, other

    math.ST math.PR

    Slow rates of approximation of U-statistics and V-statistics by quadratic forms of Gaussians

    Authors: Kevin Han Huang, Peter Orbanz

    Abstract: We construct examples of degree-two U- and V-statistics of $n$ i.i.d.~heavy-tailed random vectors in $\mathbb{R}^{d(n)}$, whose $ν$-th moments exist for ${ν> 2}$, and provide tight bounds on the error of approximating both statistics by a quadratic form of Gaussians. In the case ${ν=3}$, the error of approximation is $Θ(n^{-1/12})$. The proof adapts a result of Huang, Austern and Orbanz [12] to U-… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  30. arXiv:2406.12380  [pdf, other

    hep-ex physics.ins-det

    Search for fractionally charged particles with CUORE

    Authors: CUORE Collaboration, D. Q. Adams, C. Alduino, K. Alfonso, F. T. Avignone III, O. Azzolini, G. Bari, F. Bellini, G. Benato, M. Beretta, M. Biassoni, A. Branca, C. Brofferio, C. Bucci, J. Camilleri, A. Caminata, A. Campani, J. Cao, S. Capelli, C. Capelli, L. Cappelli, L. Cardani, P. Carniti, N. Casali, E. Celi , et al. (95 additional authors not shown)

    Abstract: The Cryogenic Underground Observatory for Rare Events (CUORE) is a detector array comprised by 988 5$\;$cm$\times$5$\;$cm$\times$5$\;$cm TeO$_2$ crystals held below 20 mK, primarily searching for neutrinoless double-beta decay in $^{130}$Te. Unprecedented in size amongst cryogenic calorimetric experiments, CUORE provides a promising setting for the study of exotic through-going particles. Using th… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures

  31. arXiv:2406.12100  [pdf, other

    cs.LG cs.RO

    Adaptive Uncertainty Quantification for Trajectory Prediction Under Distributional Shift

    Authors: Huiqun Huang, Sihong He, Fei Miao

    Abstract: Trajectory prediction models that can infer both finite future trajectories and their associated uncertainties of the target vehicles in an online setting (e.g., real-world application scenarios) is crucial for ensuring the safe and robust navigation and path planning of autonomous vehicle motion. However, the majority of existing trajectory prediction models have neither considered reducing the u… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures

  32. arXiv:2406.12042  [pdf, other

    cs.CV cs.LG

    Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models

    Authors: Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao, Heng Huang

    Abstract: Text-to-image (T2I) diffusion models have demonstrated impressive image generation capabilities. Still, their computational intensity prohibits resource-constrained organizations from deploying T2I models after fine-tuning them on their internal target data. While pruning techniques offer a potential solution to reduce the computational burden of T2I models, static pruning methods use the same pru… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  33. arXiv:2406.12018  [pdf, other

    cs.CL

    CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling

    Authors: Yu Bai, Xiyuan Zou, Heyan Huang, Sanxing Chen, Marc-Antoine Rondeau, Yang Gao, Jackie Chi Kit Cheung

    Abstract: Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed evicted) without affecting the perplexity performance in generating long sequences. However, we show that these methods, despite preserving perplexit… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Work in progress

  34. arXiv:2406.11740  [pdf, other

    cs.RO cs.AI cs.LG

    Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies

    Authors: Haojie Huang, Karl Schmeckpeper, Dian Wang, Ondrej Biza, Yaoyao Qian, Haotian Liu, Mingxi Jia, Robert Platt, Robin Walters

    Abstract: Humans can imagine goal states during planning and perform actions to match those goals. In this work, we propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick and place tasks. Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states which are then translated to actions using rigid action estimation.… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  35. arXiv:2406.11474  [pdf, other

    cs.CL cs.AI

    How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment

    Authors: Heyan Huang, Yinghao Li, Huashan Sun, Yu Bai, Yang Gao

    Abstract: Recent studies have demonstrated that In-Context Learning (ICL), through the use of specific demonstrations, can align Large Language Models (LLMs) with human preferences known as In-Context Alignment (ICA), indicating that models can comprehend human instructions without requiring parameter adjustments. However, the exploration of the mechanism and applicability of ICA remains limited. In this pa… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 22 pages, 6 figures, work in progress

  36. arXiv:2406.10740  [pdf, other

    cs.CV

    FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models

    Authors: Zhikai Zhang, Yitang Li, Haofeng Huang, Mingxian Lin, Li Yi

    Abstract: Human motion synthesis is a fundamental task in computer animation. Despite recent progress in this field utilizing deep learning and motion capture data, existing methods are always limited to specific motion categories, environments, and styles. This poor generalizability can be partially attributed to the difficulty and expense of collecting large-scale and high-quality motion data. At the same… ▽ More

    Submitted 21 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  37. arXiv:2406.10499  [pdf, other

    stat.ME stat.AP

    Functional Clustering for Longitudinal Associations between Social Determinants of Health and Stroke Mortality in the US

    Authors: Fangzhi Luo, Jianbin Tan, Donglan Zhang, Hui Huang, Ye Shen

    Abstract: Understanding longitudinally changing associations between Social determinants of health (SDOH) and stroke mortality is crucial for timely stroke management. Previous studies have revealed a significant regional disparity in the SDOH -- stroke mortality associations. However, they do not develop data-driven methods based on these longitudinal associations for regional division in stroke control. T… ▽ More

    Submitted 20 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  38. arXiv:2406.10328  [pdf, other

    cs.CV cs.CL cs.LG

    From Pixels to Prose: A Large Dataset of Dense Image Captions

    Authors: Vasu Singla, Kaiyu Yue, Sukriti Paul, Reza Shirkavand, Mayuka Jayawardhana, Alireza Ganjdanesh, Heng Huang, Abhinav Bhatele, Gowthami Somepalli, Tom Goldstein

    Abstract: Training large vision-language models requires extensive, high-quality image-text pairs. Existing web-scraped datasets, however, are noisy and lack detailed image descriptions. To bridge this gap, we introduce PixelProse, a comprehensive dataset of over 16M (million) synthetically generated captions, leveraging cutting-edge vision-language models for detailed and accurate descriptions. To ensure d… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: pixelprose 16M dataset

  39. arXiv:2406.09611  [pdf, other

    cs.HC

    Recy-ctronics: Designing Fully Recyclable Electronics With Varied Form Factors

    Authors: Tingyu Cheng, Zhihan Zhang, Han Huang, Yingting Gao, Wei Sun, Gregory D. Abowd, HyunJoo Oh, Josiah Hester

    Abstract: For today's electronics manufacturing process, the emphasis on stable functionality, durability, and fixed physical forms is designed to ensure long-term usability. However, this focus on robustness and permanence complicates the disassembly and recycling processes, leading to significant environmental repercussions. In this paper, we present three approaches that leverage easily recyclable materi… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  40. arXiv:2406.09401  [pdf, other

    cs.CV cs.AI cs.RO

    MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

    Authors: Ruiyuan Lyu, Tai Wang, **gli Lin, Shuai Yang, Xiaohan Mao, Yilun Chen, Runsen Xu, Haifeng Huang, Chenming Zhu, Dahua Lin, Jiangmiao Pang

    Abstract: With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception attracts more attention due to its connectivity to the physical world and makes rapid progress. However, limited by existing datasets, previous works mainly focus on understanding object properties or inter-object spatial relationships in a 3D scene. To tackle this problem, this paper builds the… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Follow-up of EmbodiedScan. A multi-modal 3D dataset with the most-ever comprehensive language annotations for 3D-LLMs. Project page: https://tai-wang.github.io/mmscan/

  41. arXiv:2406.08772  [pdf, other

    cs.CV cs.CL

    MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs

    Authors: Xuannan Liu, Zekun Li, Peipei Li, Shuhan Xia, Xing Cui, Linzhi Huang, Huaibo Huang, Weihong Deng, Zhaofeng He

    Abstract: Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist. The lack of a benchmark for mixed-source misinformation has hindered progress in this field. To address this, we introduce MMFakeBench, the first comprehensive benchmark for mixed-source MM… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  42. arXiv:2406.08759  [pdf, other

    cs.CV cs.MM

    Gaussian-Forest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling

    Authors: Fengyi Zhang, Tianjun Zhang, Lin Zhang, Helen Huang, Yadan Luo

    Abstract: The field of novel-view synthesis has recently witnessed the emergence of 3D Gaussian Splatting, which represents scenes in a point-based manner and renders through rasterization. This methodology, in contrast to Radiance Fields that rely on ray tracing, demonstrates superior rendering quality and speed. However, the explicit and unstructured nature of 3D Gaussians poses a significant storage chal… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  43. arXiv:2406.08698  [pdf, other

    astro-ph.HE hep-ph

    Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen , et al. (255 additional authors not shown)

    Abstract: In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, accepted by PRL

  44. arXiv:2406.08488  [pdf, other

    cs.CV cs.AI cs.LG

    ICE-G: Image Conditional Editing of 3D Gaussian Splats

    Authors: Vishnu Jaganathan, Hannah Hanyun Huang, Muhammad Zubair Irshad, Varun Jampani, Amit Raj, Zsolt Kira

    Abstract: Recently many techniques have emerged to create high quality 3D assets and scenes. When it comes to editing of these objects, however, existing approaches are either slow, compromise on quality, or do not provide enough customization. We introduce a novel approach to quickly edit a 3D model from a single reference view. Our technique first segments the edit image, and then matches semantically cor… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR AI4CC Workshop 2024. Project page: https://ice-gaussian.github.io

  45. arXiv:2406.07657  [pdf, other

    cs.LG cs.CL

    OPTune: Efficient Online Preference Tuning

    Authors: Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Tianyi Zhou, Heng Huang

    Abstract: Reinforcement learning with human feedback~(RLHF) is critical for aligning Large Language Models (LLMs) with human preference. Compared to the widely studied offline version of RLHF, \emph{e.g.} direct preference optimization (DPO), recent works have shown that the online variants achieve even better alignment. However, online alignment requires on-the-fly generation of new training data, which is… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 16 pages, 7 figures

  46. arXiv:2406.07306  [pdf, other

    physics.med-ph

    A directional total variation minimization algorithm for isotropic resolution in digital breast tomosynthesis

    Authors: Emil Y. Sidky, Xiangyi Wu, Xiaoyu Duan, Hailing Huang, Wei Zhao, Leo Y. Zhang, John Paul Phillips, Zheng Zhang, Buxin Chen, Dan Xia, Ingrid S. Reiser, Xiaochuan Pan

    Abstract: An optimization-based image reconstruction algorithm is developed for contrast enhanced digital breast tomosynthesis (DBT) using dual-energy scanning. The algorithm minimizes directional total variation (TV) with a data discrepancy and non-negativity constraints. Iodinated contrast agent (ICA) imaging is performed by reconstructing images from dual-energy DBT data followed by weighted subtraction.… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Proceedings paper for accepted contribution to the 8th International Conference on Image Formation in X-Ray Computed Tomography (https://www.ct-meeting.org)

  47. arXiv:2406.05516  [pdf, other

    cs.LG cs.AI cs.CL

    Verbalized Probabilistic Graphical Modeling with Large Language Models

    Authors: Hengguan Huang, Xing Shen, Songtao Wang, Dianbo Liu, Hao Wang

    Abstract: Faced with complex problems, the human brain demonstrates a remarkable capacity to transcend sensory input and form latent understandings of perceived world patterns. However, this cognitive capacity is not explicitly considered or encoded in current large language models (LLMs). As a result, LLMs often struggle to capture latent structures and model uncertainty in complex compositional reasoning… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  48. arXiv:2406.05261  [pdf, other

    cs.CV cs.GR

    Split-and-Fit: Learning B-Reps via Structure-Aware Voronoi Partitioning

    Authors: Yilin Liu, Jiale Chen, Shanshan Pan, Daniel Cohen-Or, Hao Zhang, Hui Huang

    Abstract: We introduce a novel method for acquiring boundary representations (B-Reps) of 3D CAD models which involves a two-step process: it first applies a spatial partitioning, referred to as the ``split``, followed by a ``fit`` operation to derive a single primitive within each partition. Specifically, our partitioning aims to produce the classical Voronoi diagram of the set of ground-truth (GT) B-Rep pr… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ACM Transactions on Graphics (SIGGRAPH 2024); Project page: https://vcc.tech/research/2024/BRepVP; Code: https://github.com/yilinliu77/NVDNet

  49. arXiv:2406.04998  [pdf, other

    cs.LG cs.AI cs.CV

    ADBA:Approximation Decision Boundary Approach for Black-Box Adversarial Attacks

    Authors: Feiyang Wang, Xingquan Zuo, Hai Huang, Gang Chen

    Abstract: Many machine learning models are susceptible to adversarial attacks, with decision-based black-box attacks representing the most critical threat in real-world applications. These attacks are extremely stealthy, generating adversarial examples using hard labels obtained from the target machine learning model. This is typically realized by optimizing perturbation directions, guided by decision bound… ▽ More

    Submitted 12 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures, conference

  50. arXiv:2406.04690  [pdf, other

    cs.LG stat.ML

    Higher-order Structure Based Anomaly Detection on Attributed Networks

    Authors: Xu Yuan, Na Zhou, Shuo Yu, Huafei Huang, Zhikui Chen, Feng Xia

    Abstract: Anomaly detection (such as telecom fraud detection and medical image detection) has attracted the increasing attention of people. The complex interaction between multiple entities widely exists in the network, which can reflect specific human behavior patterns. Such patterns can be modeled by higher-order network structures, thus benefiting anomaly detection on attributed networks. However, due to… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.