Skip to main content

Showing 1–50 of 2,219 results for author: Lin, Z

.
  1. arXiv:2407.02483  [pdf, other

    cs.CL cs.AI

    MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

    Authors: Binxu Li, Tiankai Yan, Yuanting Pan, Zhe Xu, Jie Luo, Ruiyang Ji, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang

    Abstract: Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To brid… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.01980  [pdf, other

    astro-ph.SR astro-ph.GA

    Search for Classical Cepheids in Galactic Open Clusters and Calibration of the Period Wesenheit Metallicity Relation in the Gaia Bands

    Authors: Huajian Wang, Ye Xu, Zehao Lin, Chaojie Hao, Dejian Liu, Yingjie Li

    Abstract: It is beneficial to calibrate the period Wesenheit metallicity relation (PWZR) of Delta Cephei stars (DCEPs), i.e., classical Cepheids, using accurate parallaxes of associated open clusters (OCs) from Gaia data release 3 (DR3). To this aim, we obtain a total of 43 OC-DCEPs (including 33 fundamental mode, 9 first overtone mode, and 1 multimode DCEPs.) and calibrate the PWZR as… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Journal ref: 2024 AJ 168 34

  3. arXiv:2407.01178  [pdf, other

    cs.CL cs.AI cs.LG

    $\text{Memory}^3$: Language Modeling with Explicit Memory

    Authors: Hongkang Yang, Zehao Lin, Wen** Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, **bo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

    Abstract: The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equip** LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowled… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    MSC Class: 68T50 ACM Class: I.2.7

  4. arXiv:2407.00952  [pdf, other

    cs.LG cs.CL cs.DC

    SplitLoRA: A Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

    Authors: Zheng Lin, Xuanjie Hu, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Ang Li, Praneeth Vepakomma, Yue Gao

    Abstract: The scalability of large language models (LLMs) in handling high-complexity models and large-scale datasets has led to tremendous successes in pivotal domains. While there is an urgent need to acquire more training data for LLMs, a concerning reality is the depletion of high-quality public datasets within a few years. In view of this, the federated learning (FL) LLM fine-tuning paradigm recently h… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures

  5. arXiv:2407.00945  [pdf, other

    cs.LG

    Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

    Authors: Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: The rapid advancement of large language models (LLMs) has led to architectures with billions to trillions of parameters, posing significant deployment challenges due to their substantial demands on memory, processing power, and energy consumption. Sparse Mixture-of-Experts (SMoE) architectures have emerged as a solution, activating only a subset of parameters per token, thereby achieving faster in… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  6. arXiv:2407.00264  [pdf, other

    cs.AI cs.LG

    External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling

    Authors: Rishav Bhagat, Jonathan Balloch, Zhiyu Lin, Julia Kim, Mark Riedl

    Abstract: Unlike reinforcement learning (RL) agents, humans remain capable multitaskers in changing environments. In spite of only experiencing the world through their own observations and interactions, people know how to balance focusing on tasks with learning about how changes may affect their understanding of the world. This is possible by choosing to solve tasks in ways that are interesting and generall… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

  7. arXiv:2407.00187  [pdf, other

    cs.RO cs.CV cs.GR

    SMPLOlympics: Sports Environments for Physically Simulated Humanoids

    Authors: Zhengyi Luo, Jiashun Wang, Kangni Liu, Haotian Zhang, Chen Tessler, **gbo Wang, Ye Yuan, **kun Cao, Zihui Lin, Fengyi Wang, Jessica Hodgins, Kris Kitani

    Abstract: We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports. Sports simulation offers a rich and standardized testing ground for evaluating and improving the capabilities of learning algorithms due to the diversity and physically demanding nature of athletic activities. As humans have been competing in these sports for m… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Project page: https://smplolympics.github.io/SMPLOlympics

  8. arXiv:2407.00086  [pdf, other

    physics.comp-ph nlin.PS nlin.SI

    Pseudo grid-based physics-informed convolutional-recurrent network solving the integrable nonlinear lattice equations

    Authors: Zhe Lin, Yong Chen

    Abstract: Traditional discrete learning methods involve discretizing continuous equations using difference schemes, necessitating considerations of stability and convergence. Integrable nonlinear lattice equations possess a profound mathematical structure that enables them to revert to continuous integrable equations in the continuous limit, particularly retaining integrable properties such as conservation… ▽ More

    Submitted 25 June, 2024; originally announced July 2024.

  9. arXiv:2406.20015  [pdf, other

    cs.CL cs.AI

    ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models

    Authors: Yuxiang Zhang, **g Chen, Junjie Wang, Yaxin Liu, Cheng Yang, Chufan Shi, Xinyu Zhu, Zihao Lin, Hanwen Wan, Yujiu Yang, Tetsuya Sakai, Tian Feng, Hayato Yamana

    Abstract: Tool-augmented large language models (LLMs) are rapidly being integrated into real-world applications. Due to the lack of benchmarks, the community still needs to fully understand the hallucination issues within these models. To address this challenge, we introduce a comprehensive diagnostic benchmark, ToolBH. Specifically, we assess the LLM's hallucinations through two perspectives: depth and bre… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  10. arXiv:2406.19473  [pdf, other

    math.CA

    Restricted projections and Fourier decoupling in $\mathbb{Q}_p^n$

    Authors: Ben Johnsrude, Zuo Lin

    Abstract: We prove a restricted projection theorem for Borel subsets of $\mathbb{Q}_p^n$ in the regime $p>n$. This generalizes results of Gan-Guo-Wang in the real setting. Our result is effective in the sense that explicit constants are obtained for various covering numbers. Along the way, we prove a fully explicit Fourier decoupling theorem for the moment curve in $p$-adic Cartesian space.

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 50 pages, 2 figures

    MSC Class: 43A25 (Primary); 28A80 (Secondary)

  11. arXiv:2406.16007  [pdf, other

    cs.CL

    Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning

    Authors: Bowen Zheng, Ming Ma, Zhongqiao Lin, Tianming Yang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities, one of the most important being In-Context Learning (ICL). With ICL, LLMs can derive the underlying rule from a few demonstrations and provide answers that comply with the rule. Previous work hypothesized that the network creates a "task vector" in specific positions during ICL. Patching the "task vector" allows LLMs to achieve z… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  12. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, **gyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 24 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: under peer review

  13. arXiv:2406.14973  [pdf, other

    cs.CV eess.IV

    LU2Net: A Lightweight Network for Real-time Underwater Image Enhancement

    Authors: Haodong Yang, Jisheng Xu, Zhiliang Lin, Jian** He

    Abstract: Computer vision techniques have empowered underwater robots to effectively undertake a multitude of tasks, including object tracking and path planning. However, underwater optical factors like light refraction and absorption present challenges to underwater vision, which cause degradation of underwater images. A variety of underwater image enhancement methods have been proposed to improve the effe… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  14. arXiv:2406.14643  [pdf, other

    cs.CV cs.AI cs.CL

    Holistic Evaluation for Interleaved Text-and-Image Generation

    Authors: Minqian Liu, Zhiyang Xu, Zihao Lin, Trevor Ashby, Joy Rimchala, Jiaxin Zhang, Lifu Huang

    Abstract: Interleaved text-and-image generation has been an intriguing research direction, where the models are required to generate both images and text pieces in an arbitrary order. Despite the emerging advancements in interleaved generation, the progress in its evaluation still significantly lags behind. Existing evaluation benchmarks do not support arbitrarily interleaved images and text for both inputs… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress. 13 pages, 5 figure, 6 tables

  15. arXiv:2406.14629  [pdf, other

    cs.CL cs.AI

    Can LLMs Learn by Teaching? A Preliminary Study

    Authors: Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: Teaching to improve student models (e.g., knowledge distillation) is an extensively studied methodology in LLMs. However, for humans, teaching not only improves students but also improves teachers. We ask: Can LLMs also learn by teaching (LbT)? If yes, we can potentially unlock the possibility of continuously advancing the models without solely relying on human-produced data or stronger models. In… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review

  16. arXiv:2406.14482  [pdf, other

    cs.CV

    Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

    Authors: Xinyi Ying, Chao Xiao, Ruo**g Li, Xu He, Boyang Li, Zhaoxu Li, Yingqian Wang, Mingyuan Hu, Qingyu Xu, Zai** Lin, Miao Li, Shilin Zhou, Wei An, Weidong Sheng, Li Liu

    Abstract: Small object detection (SOD) has been a longstanding yet challenging task for decades, with numerous datasets and algorithms being developed. However, they mainly focus on either visible or thermal modality, while visible-thermal (RGBT) bimodality is rarely explored. Although some RGBT datasets have been developed recently, the insufficient quantity, limited category, misaligned images and large t… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  17. arXiv:2406.14100  [pdf, other

    q-bio.NC

    Self-Attention in Transformer Networks Explains Monkeys' Gaze Pattern in Pac-Man Game

    Authors: Zhongqiao Lin, Yunwei Li, Tianming Yang

    Abstract: We proactively direct our eyes and attention to collect information during problem solving and decision making. Understanding gaze patterns is crucial for gaining insights into the computation underlying the problem-solving process. However, there is a lack of interpretable models that can account for how the brain directs the eyes to collect information and utilize it, especially in the context o… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  18. arXiv:2406.13743  [pdf, other

    cs.CV cs.AI cs.CL cs.LG cs.MM

    GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation

    Authors: Baiqi Li, Zhiqiu Lin, Deepak Pathak, Jiayao Li, Yixin Fei, Kewen Wu, Tiffany Ling, Xide Xia, Pengchuan Zhang, Graham Neubig, Deva Ramanan

    Abstract: While text-to-visual models now produce photo-realistic images and videos, they struggle with compositional text prompts involving attributes, relationships, and higher-order reasoning such as logic and comparison. In this work, we conduct an extensive human study on GenAI-Bench to evaluate the performance of leading image and video generation models in various aspects of compositional text-to-vis… ▽ More

    Submitted 21 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: We open-source our dataset, model, and code at: https://linzhiqiu.github.io/papers/genai_bench ; Project page: https://linzhiqiu.github.io/papers/genai_bench ; GenAI-Bench was first introduced in arxiv:2404.01291. This article extends it with an additional GenAI-Rank benchmark.

  19. arXiv:2406.13578  [pdf, other

    cs.CL

    Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration

    Authors: Han-Cheng Yu, Yu-An Shih, Kin-Man Law, Kai-Yu Hsieh, Yu-Chen Cheng, Hsin-Chih Ho, Zih-An Lin, Wen-Chuan Hsu, Yao-Chung Fan

    Abstract: In this paper, we tackle the task of distractor generation (DG) for multiple-choice questions. Our study introduces two key designs. First, we propose \textit{retrieval augmented pretraining}, which involves refining the language model pretraining to align it more closely with the downstream task of DG. Second, we explore the integration of knowledge graphs to enhance the performance of DG. Throug… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Findings at ACL 2024

  20. arXiv:2406.13444  [pdf, other

    cs.CL cs.CV

    VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

    Authors: Xueqing Wu, Zongyu Lin, Songyan Zhao, Te-Lin Wu, Pan Lu, Nanyun Peng, Kai-Wei Chang

    Abstract: Visual programs are executable code generated by large language models to address visual reasoning problems. They decompose complex questions into multiple reasoning steps and invoke specialized models for each step to solve the problems. However, these programs are prone to logic errors, with our preliminary evaluation showing that 58% of the total errors are caused by program logic errors. Debug… ▽ More

    Submitted 27 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: update reference

  21. arXiv:2406.12053  [pdf, other

    cs.CL

    InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States

    Authors: Mohammad Beigi, Ying Shen, Runing Yang, Zihao Lin, Qifan Wang, Ankith Mohan, Jianfeng He, Ming **, Chang-Tien Lu, Lifu Huang

    Abstract: Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention st… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages

  22. arXiv:2406.11937  [pdf, other

    physics.ins-det hep-ex physics.data-an

    Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter

    Authors: M. Aamir, B. Acar, G. Adamov, T. Adams, C. Adloff, S. Afanasiev, C. Agrawal, C. Agrawal, A. Ahmad, H. A. Ahmed, S. Akbar, N. Akchurin, B. Akgul, B. Akgun, R. O. Akpinar, E. Aktas, A. AlKadhim, V. Alexakhin, J. Alimena, J. Alison, A. Alpana, W. Alshehri, P. Alvarez Dominguez, M. Alyari, C. Amendola , et al. (550 additional authors not shown)

    Abstract: A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles readout by SiPMs and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadr… ▽ More

    Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Prepared for submission to JINST

  23. arXiv:2406.11644  [pdf, other

    astro-ph.EP astro-ph.SR

    Detecting Planetary Oblateness in the Era of JWST: A Case Study of Kepler-167e

    Authors: Quanyi Liu, Wei Zhu, Yifan Zhou, Zhecheng Hu, Zitao Lin, Fei Dai, Kento Masuda, Sharon X. Wang

    Abstract: Planets may be rotationally flattened, and their oblateness thus provide useful information on their formation and evolution. Here we develop a new algorithm that can compute the transit light curve due to an oblate planet very efficiently and use it to study the detectability of planet oblateness (and spin obliquity) with the James Webb Space Telescope (JWST). Using the Jupiter analog, Kepler-167… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures. Submitted to Astronomical Journal

  24. arXiv:2406.11441  [pdf, other

    cs.CV

    SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation

    Authors: Zhenchao Lin, Li He, Hongqiang Yang, Xiaoqun Sun, Cuo** Zhang, Weinan Chen, Yisheng Guan, Hong Zhang

    Abstract: Large-scale point cloud consists of a multitude of individual objects, thereby encompassing rich structural and underlying semantic contextual information, resulting in a challenging problem in efficiently segmenting a point cloud. Most existing researches mainly focus on capturing intricate local features without giving due consideration to global ones, thus failing to leverage semantic context.… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  25. arXiv:2406.11249  [pdf, other

    cs.AI cs.LG

    Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective

    Authors: Yang Chen, Cong Fang, Zhouchen Lin, Bing Liu

    Abstract: Foundation Models (FMs) have demonstrated remarkable insights into the relational dynamics of the world, leading to the crucial question: how do these models acquire an understanding of world hybrid relations? Traditional statistical learning, particularly for prediction problems, may overlook the rich and inherently structured information from the data, especially regarding the relationships betw… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  26. arXiv:2406.10946  [pdf, other

    hep-ph

    Extracting $α_\mathrm{S}$ at future $e^+e^{-}$ Higgs factory with energy correlators

    Authors: Zhen Lin, Manqi Ruan, Meng Xiao, Zhen Xu

    Abstract: The prospected sensitivity in $α_\mathrm{S}$ determination using an event shape observable, ratio of energy correlators at future electron-positron collider is presented. The study focuses on the collinear region which has suffered from large theoretical and hadronization uncertainty in the past. The ratio effectively reduces the impacts of the uncertainties. With the amount of data that future el… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 figures

  27. arXiv:2406.10856  [pdf, other

    cs.NI eess.SY

    LEO Satellite Networks Assisted Geo-distributed Data Processing

    Authors: Zhiyuan Zhao, Zhe Chen, Zheng Lin, Wenjun Zhu, Kun Qiu, Chaoqun You, Yue Gao

    Abstract: Nowadays, the increasing deployment of edge clouds globally provides users with low-latency services. However, connecting an edge cloud to a core cloud via optic cables in terrestrial networks poses significant barriers due to the prohibitively expensive building cost of optic cables. Fortunately, emerging Low Earth Orbit (LEO) satellite networks (e.g., Starlink) offer a more cost-effective soluti… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 6 pages, 5 figures

  28. arXiv:2406.10746  [pdf, other

    cs.CL cs.IR

    SparseCL: Sparse Contrastive Learning for Contradiction Retrieval

    Authors: Haike Xu, Zongyu Lin, Yizhou Sun, Kai-Wei Chang, Piotr Indyk

    Abstract: Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications like fact checking and data cleaning. To retrieve contradiction argument to the query from large document corpora, existing methods such as similarity search and crossencoder models exhibit significant limitations.… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  29. arXiv:2406.10285  [pdf, other

    cs.CR cs.AI

    I Don't Know You, But I Can Catch You: Real-Time Defense against Diverse Adversarial Patches for Object Detectors

    Authors: Zi** Lin, Yue Zhao, Kai Chen, **wen He

    Abstract: Deep neural networks (DNNs) have revolutionized the field of computer vision like object detection with their unparalleled performance. However, existing research has shown that DNNs are vulnerable to adversarial attacks. In the physical world, an adversary could exploit adversarial patches to implement a Hiding Attack (HA) which patches the target object to make it disappear from the detector, an… ▽ More

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  30. arXiv:2406.09750  [pdf, other

    cs.CV cs.AI

    ControlVAR: Exploring Controllable Visual Autoregressive Modeling

    Authors: Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Zhe Lin, Rita Singh, Bhiksha Raj

    Abstract: Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs), especially in tasks like control-to-image generation. However, challenges such as expensive computational cost, high inference latency, and difficulties of integration with large language models (LLMs) have necessitated exploring alternatives to DMs. This paper introduces ControlVAR, a novel… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 24 pages, 19 figures, 4 tables

  31. arXiv:2406.08100  [pdf, other

    cs.CL cs.AI

    Multimodal Table Understanding

    Authors: Mingyu Zheng, Xinwei Feng, Qingyi Si, Qiaoqiao She, Zheng Lin, Wenbin Jiang, Wei** Wang

    Abstract: Although great progress has been made by previous table understanding methods including recent approaches based on large language models (LLMs), they rely heavily on the premise that given tables must be converted into a certain text sequence (such as Markdown or HTML) to serve as model input. However, it is difficult to access such high-quality textual table representations in some real-world sce… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 23 pages, 16 figures, ACL 2024 main conference, camera-ready version

  32. arXiv:2406.07706  [pdf, other

    cs.CV

    Object-level Scene Deocclusion

    Authors: Zhengzhe Liu, Qing Liu, Chirui Chang, Jianming Zhang, Daniil Pakhomov, Haitian Zheng, Zhe Lin, Daniel Cohen-Or, Chi-Wing Fu

    Abstract: Deoccluding the hidden portions of objects in a scene is a formidable task, particularly when addressing real-world scenes. In this paper, we present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, a foundation model for object-level scene deocclusion. Leveraging the rich prior of pre-trained models, we first design the parallel variational autoencoder, which pr… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: SIGGRAPH 2024. A foundation model for category-agnostic object deocclusion

  33. arXiv:2406.06541  [pdf, other

    cs.AR

    Global and Local Attention-based Inception U-Net for Static IR Drop Estimation

    Authors: Yilu Chen, Zhijie Cai, Min Wei, Zhifeng Lin, Jianli Chen

    Abstract: Static IR drop analysis is a fundamental and critical task in chip design since the IR drop will significantly affect the design's functionality, performance, and reliability. However, the process of IR drop analysis can be time-consuming, potentially taking several hours. Furthermore, in the process of fixing violations, it is frequently imperative to do IR drop analysis iteratively, hence exacer… ▽ More

    Submitted 27 April, 2024; originally announced June 2024.

    Comments: 7 pages, 8 figures

  34. arXiv:2406.05267  [pdf, other

    nucl-th astro-ph.SR

    Characterizing the nuclear models informed by PREX and CREX: a view from Bayesian inference

    Authors: Tianqi Zhao, Zidu Lin, Bharat Kumar, Andrew W. Steiner, Madappa Prakash

    Abstract: New measurements of the weak charge density distributions of $^{48}$Ca and $^{208}$Pb challenge existing nuclear models. In the post-PREX-CREX era, it is unclear if current models can simultaneously describe weak charge distributions along with accurate measurements of binding energy and charge radii. In this letter, we explore the parameter space of relativistic and non-relativistic models to stu… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 figures in the letter, 11 pages, 8 figures in the supplemental material

  35. arXiv:2406.04758  [pdf, other

    cs.CL

    Think out Loud: Emotion Deducing Explanation in Dialogues

    Authors: Jiangnan Li, Zheng Lin, Lanrui Wang, Qingyi Si, Yanan Cao, Mo Yu, Peng Fu, Wei** Wang, Jie Zhou

    Abstract: Humans convey emotions through daily dialogues, making emotion understanding a crucial step of affective intelligence. To understand emotions in dialogues, machines are asked to recognize the emotion for an utterance (Emotion Recognition in Dialogues, ERD); based on the emotion, then find causal utterances for the emotion (Emotion Cause Extraction in Dialogues, ECED). The setting of the two tasks… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  36. arXiv:2406.04589  [pdf, other

    cs.SD eess.AS

    MUSE: Flexible Voiceprint Receptive Fields and Multi-Path Fusion Enhanced Taylor Transformer for U-Net-based Speech Enhancement

    Authors: Zizhen Lin, Xiaoting Chen, Junyu Wang

    Abstract: Achieving a balance between lightweight design and high performance remains a challenging task for speech enhancement. In this paper, we introduce Multi-path Enhanced Taylor (MET) Transformer based U-net for Speech Enhancement (MUSE), a lightweight speech enhancement network built upon the Unet architecture. Our approach incorporates a novel Multi-path Enhanced Taylor (MET) Transformer block, whic… ▽ More

    Submitted 19 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: This paper was accepted by Interspeech 2024

  37. arXiv:2406.03793  [pdf, other

    cs.LG cs.CV

    Low-Rank Similarity Mining for Multimodal Dataset Distillation

    Authors: Yue Xu, Zhilin Lin, Yusong Qiu, Cewu Lu, Yong-Lu Li

    Abstract: Though dataset distillation has witnessed rapid development in recent years, the distillation of multimodal data, e.g., image-text pairs, poses unique and under-explored challenges. Unlike unimodal data, image-text contrastive learning (ITC) data lack inherent categorization and should instead place greater emphasis on modality correspondence. In this work, we propose Low-Rank Similarity Mining (L… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024

  38. arXiv:2406.03792  [pdf, other

    cs.CL

    Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning

    Authors: Naibin Gu, Peng Fu, Xiyu Liu, Bowen Shen, Zheng Lin, Wei** Wang

    Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as the predominant technique for fine-tuning in the era of large language models. However, existing PEFT methods still have inadequate training efficiency. Firstly, the utilization of large-scale foundation models during the training process is excessively redundant for certain fine-tuning tasks. Secondly, as the model size increases, the growth i… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  39. arXiv:2406.03711  [pdf, other

    physics.flu-dyn cs.AI

    Pi-fusion: Physics-informed diffusion model for learning fluid dynamics

    Authors: **g Qiu, Jiancheng Huang, Xiangdong Zhang, Zeng Lin, Minglei Pan, Zengding Liu, Fen Miao

    Abstract: Physics-informed deep learning has been developed as a novel paradigm for learning physical dynamics recently. While general physics-informed deep learning methods have shown early promise in learning fluid dynamics, they are difficult to generalize in arbitrary time instants in real-world scenario, where the fluid motion can be considered as a time-variant trajectory involved large-scale particle… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  40. arXiv:2406.03520  [pdf, other

    cs.CV cs.AI cs.LG

    VideoPhy: Evaluating Physical Commonsense for Video Generation

    Authors: Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, Aditya Grover

    Abstract: Recent advances in internet-scale video data pretraining have led to the development of text-to-video generative models that can create high-quality videos across a broad range of visual concepts and styles. Due to their ability to synthesize realistic motions and render complex objects, these generative models have the potential to become general-purpose simulators of the physical world. However,… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 36 pages, 26 figures, 8 tables

  41. arXiv:2406.02737  [pdf, other

    cs.CR cs.SE

    CAMP: Compiler and Allocator-based Heap Memory Protection

    Authors: Zhenpeng Lin, Zheng Yu, Ziyi Guo, Simone Campanoni, Peter Dinda, Xinyu Xing

    Abstract: The heap is a critical and widely used component of many applications. Due to its dynamic nature, combined with the complexity of heap management algorithms, it is also a frequent target for security exploits. To enhance the heap's security, various heap protection techniques have been introduced, but they either introduce significant runtime overhead or have limited protection. We present CAMP,… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  42. arXiv:2406.02624  [pdf, other

    cs.CR cs.SE

    Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation

    Authors: Ziyi Guo, Dang K Le, Zhenpeng Lin, Kyle Zeng, Ruoyu Wang, Tiffany Bao, Yan Shoshitaishvili, Adam Doupé, Xinyu Xing

    Abstract: Recently, a novel method known as Page Spray emerges, focusing on page-level exploitation for kernel vulnerabilities. Despite the advantages it offers in terms of exploitability, stability, and compatibility, comprehensive research on Page Spray remains scarce. Questions regarding its root causes, exploitation model, comparative benefits over other exploitation techniques, and possible mitigation… ▽ More

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  43. arXiv:2406.02540  [pdf, other

    cs.CV

    ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

    Authors: Tianchen Zhao, Tongcheng Fang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

    Abstract: Diffusion transformers (DiTs) have exhibited remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions. However, larger model sizes and multi-frame processing for video generation lead to increased computational and memory costs, posing challenges for practical deployment on edge devices. Post-Training Quantization (PTQ) is an ef… ▽ More

    Submitted 30 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Project Page: https://a-suozhang.xyz/viditq.github.io/

  44. arXiv:2406.02239  [pdf, other

    cs.NI

    Decentralized Physical Infrastructure Network (DePIN): Challenges and Opportunities

    Authors: Zhibin Lin, Taotao Wang, Long Shi, Shengli Zhang, Bin Cao

    Abstract: The widespread use of the Internet has posed challenges to existing centralized physical infrastructure networks. Issues such as data privacy risks, service disruptions, and substantial expansion costs have emerged. To address these challenges, an innovative network architecture called Decentralized Physical Infrastructure Network (DePIN) has emerged. DePIN leverages blockchain technology to decen… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  45. arXiv:2406.01960  [pdf, other

    cs.LG cs.AI

    Certifiably Byzantine-Robust Federated Conformal Prediction

    Authors: Mintong Kang, Zhen Lin, Jimeng Sun, Cao Xiao, Bo Li

    Abstract: Conformal prediction has shown impressive capacity in constructing statistically rigorous prediction sets for machine learning models with exchangeable data samples. The siloed datasets, coupled with the escalating privacy concerns related to local data sharing, have inspired recent innovations extending conformal prediction into federated environments with distributed data samples. However, this… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  46. arXiv:2406.01806  [pdf, other

    cs.CL cs.AI

    Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation

    Authors: Zhen Lin, Shubhendu Trivedi, Jimeng Sun

    Abstract: The advent of large language models (LLMs) has dramatically advanced the state-of-the-art in numerous natural language generation tasks. For LLMs to be applied reliably, it is essential to have an accurate measure of their confidence. Currently, the most commonly used confidence score function is the likelihood of the generated sequence, which, however, conflates semantic and syntactic components.… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  47. arXiv:2406.01154  [pdf, other

    cs.CV

    UniUSNet: A Promptable Framework for Universal Ultrasound Disease Prediction and Tissue Segmentation

    Authors: Zehui Lin, Zhuoneng Zhang, Xindi Hu, Zhifan Gao, Xin Yang, Yue Sun, Dong Ni, Tao Tan

    Abstract: Ultrasound is a widely used imaging modality in clinical practice due to its low cost, portability, and safety. Current research in general AI for healthcare focuses on large language models and general segmentation models, with insufficient attention to solutions addressing both disease prediction and tissue segmentation. In this study, we propose a novel universal framework for ultrasound, namel… ▽ More

    Submitted 20 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  48. arXiv:2406.00701  [pdf, other

    math.ST stat.ME

    Profiled Transfer Learning for High Dimensional Linear Model

    Authors: Ziqian Lin, Junlong Zhao, Fang Wang, Hansheng Wang

    Abstract: We develop here a novel transfer learning methodology called Profiled Transfer Learning (PTL). The method is based on the \textit{approximate-linear} assumption between the source and target parameters. Compared with the commonly assumed \textit{vanishing-difference} assumption and \textit{low-rank} assumption in the literature, the \textit{approximate-linear} assumption is more flexible and less… ▽ More

    Submitted 5 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  49. arXiv:2406.00364  [pdf, other

    cs.RO

    Cognitive Manipulation: Semi-supervised Visual Representation and Classroom-to-real Reinforcement Learning for Assembly in Semi-structured Environments

    Authors: Chuang Wang, Lie Yang, Ze Lin, Yizhi Liao, Gang Chen, Longhan Xie

    Abstract: Assembling a slave object into a fixture-free master object represents a critical challenge in flexible manufacturing. Existing deep reinforcement learning-based methods, while benefiting from visual or operational priors, often struggle with small-batch precise assembly tasks due to their reliance on insufficient priors and high-costed model development. To address these limitations, this paper i… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 15 pages, 14 figures

  50. arXiv:2406.00281  [pdf, other

    cs.LG cs.AI

    Cross-Table Pretraining towards a Universal Function Space for Heterogeneous Tabular Data

    Authors: **tai Chen, Zhen Lin, Qiyuan Chen, Jimeng Sun

    Abstract: Tabular data from different tables exhibit significant diversity due to varied definitions and types of features, as well as complex inter-feature and feature-target relationships. Cross-dataset pretraining, which learns reusable patterns from upstream data to support downstream tasks, have shown notable success in various fields. Yet, when applied to tabular data prediction, this paradigm faces c… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.