Showing 1–50 of 391 results for author: Huang, B

Searching in archive cs.
  1. arXiv:2407.01050  [pdf, other]

    cs.RO cs.AI

    Evolutionary Morphology Towards Overconstrained Locomotion via Large-Scale, Multi-Terrain Deep Reinforcement Learning

    Authors: Yenan Chen, Chuye Zhang, Pengxi Gu, Jianuo Qiu, Jiayi Yin, Nuofan Qiu, Guo**g Huang, Bangchao Huang, Zishang Zhang, Hui Deng, Wei Zhang, Fang Wan, Chaoyang Song

    Abstract: While the animals' Fin-to-Limb evolution has been well-researched in biology, such morphological transformation remains under-adopted in the modern design of advanced robotic limbs. This paper investigates a novel class of overconstrained locomotion from a design and learning perspective inspired by evolutionary morphology, aiming to integrate the concept of 'intelligent design under constraints'…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 13 pages, 5 figures, Accepted and Presented at ReMAR2024

  2. arXiv:2407.00944  [pdf]

    cs.CV

    Diffusion Transformer Model With Compact Prior for Low-dose PET Reconstruction

    Authors: Bin Huang, Xubiao Liu, Lei Fang, Qiegen Liu, Bingxuan Li

    Abstract: Positron emission tomography (PET) is an advanced medical imaging technique that plays a crucial role in non-invasive clinical diagnosis. However, while reducing radiation exposure through low-dose PET scans is beneficial for patient safety, it often results in insufficient statistical data. This scarcity of data poses significant challenges for accurately reconstructing high-quality images, which…

    Submitted 30 June, 2024; originally announced July 2024.

  3. arXiv:2406.19617  [pdf, ps, other]

    cs.LG cs.IT math.OC

    Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity

    Authors: Qian Yu, Yining Wang, Baihe Huang, Qi Lei, Jason D. Lee

    Abstract: Optimization of convex functions under stochastic zeroth-order feedback has been a major and challenging question in online learning. In this work, we consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries. We provide the first tight characterization for the rate of the mi…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.15334  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

    Authors: Brandon Huang, Chancharik Mitra, Assaf Arbelle, Leonid Karlinsky, Trevor Darrell, Roei Herzig

    Abstract: The recent success of interleaved Large Multimodal Models (LMMs) in few-shot learning suggests that in-context learning (ICL) with many examples can be promising for learning new tasks. However, this many-shot multimodal ICL setting has one crucial problem: it is fundamentally limited by the model's context length set at pretraining. The problem is especially prominent in the multimodal domain, wh…

    Submitted 21 June, 2024; originally announced June 2024.

  5. arXiv:2406.14103  [pdf, other]

    cs.AI

    Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation

    Authors: Yanwei Zheng, Shaopu Feng, Bowen Huang, Changrui Li, Xiao Zhang, Dongxiao Yu

    Abstract: The task that requires an agent to navigate to a given object through only visual observation is called visual object navigation (VON). The main bottlenecks of VON are strategies exploration and prior knowledge exploitation. Traditional strategies exploration ignores the differences of searching and navigating stages, using the same reward in two stages, which reduces navigation performance and tr…

    Submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.13317  [pdf, other]

    cs.CV

    M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere

    Authors: Mengqiu Xu, Ming Wu, Kaixin Chen, Yixiang Huang, Mingrui Xu, Yujia Yang, Yiqing Feng, Yiying Guo, Bin Huang, Dongliang Chang, Zhenwei Shi, Chuang Zhang, Zhanyu Ma, Jun Guo

    Abstract: Marine fog poses a significant hazard to global shipping, necessitating effective detection and forecasting to reduce economic losses. In recent years, several machine learning (ML) methods have demonstrated superior detection accuracy compared to traditional meteorological methods. However, most of these works are developed on proprietary datasets, and the few publicly accessible datasets are oft…

    Submitted 19 June, 2024; originally announced June 2024.

  7. arXiv:2406.13219  [pdf, other]

    cs.CV cs.CL

    MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency

    Authors: Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Baizhou Huang, Xu Zhang, Xinyu Hu, Xiaojun Wan

    Abstract: Multimodal large language models (MLLMs) are prone to non-factual or outdated knowledge issues, which can manifest as misreading and misrecognition errors due to the complexity of multimodal knowledge. Previous benchmarks have not systematically analyzed the performance of editing methods in correcting these two error types. To better represent and correct these errors, we decompose multimodal kno…

    Submitted 19 June, 2024; originally announced June 2024.

  8. arXiv:2406.09489  [pdf, other]

    cs.CV

    Language-driven Grasp Detection

    Authors: An Dinh Vuong, Minh Nhat Vu, Baoru Huang, Nghia Nguyen, Hieu Le, Thieu Vo, Anh Nguyen

    Abstract: Grasp detection is a persistent and intricate challenge with various industrial applications. Recently, many methods and datasets have been proposed to tackle the grasp detection problem. However, most of them do not consider using natural language as a condition to detect the grasp poses. In this paper, we introduce Grasp-Anything++, a new language-driven grasp detection dataset featuring 1M samp…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 19 pages. Accepted to CVPR24

  9. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2406.07558  [pdf, other]

    cs.CY cs.AI cs.CV

    A Large Medical Model based on Visual Physiological Monitoring for Public Health

    Authors: Bin Huang, Changchen Zhao, Zimeng Liu, Shenda Hong, Baochang Zhang, Wen** Wang, Hui Liu

    Abstract: The widespread outbreak of the COVID-19 pandemic has sounded a warning about the globalization challenges in public health. In this context, the establishment of large-scale public health datasets, of medical models, and of decision-making systems with a human-centric approach holds strategic significance. Recently, groundbreaking advancements have emerged in AI methods for physiological signal mo…

    Submitted 21 April, 2024; originally announced June 2024.

    Comments: 17 pages, 7 figures

  11. arXiv:2406.07275  [pdf, other]

    cs.AI

    DCA-Bench: A Benchmark for Dataset Curation Agents

    Authors: Benhao Huang, Yingzhuo Yu, ** Huang, Xingjian Zhang, Jiaqi Ma

    Abstract: The quality of datasets plays an increasingly crucial role in the research and development of modern artificial intelligence (AI). Despite the proliferation of open dataset platforms nowadays, data quality issues, such as insufficient documentation, inaccurate annotations, and ethical concerns, remain common in datasets widely used in AI. Furthermore, these issues are often subtle and difficult to…

    Submitted 11 June, 2024; originally announced June 2024.

  12. arXiv:2406.06050  [pdf, other]

    cs.CV

    Generalizable Human Gaussians from Single-View Image

    Authors: **nan Chen, Chen Li, Jianfeng Zhang, Hanlin Chen, Buzhen Huang, Gim Hee Lee

    Abstract: In this work, we tackle the task of learning generalizable 3D human Gaussians from a single image. The main challenge for this task is to recover detailed geometry and appearance, especially for the unobserved regions. To this end, we propose single-view generalizable Human Gaussian model (HGM), a diffusion-guided framework for 3D human modeling from a single image. We design a diffusion-based coa…

    Submitted 10 June, 2024; originally announced June 2024.

  13. arXiv:2406.03836  [pdf, other]

    cs.CR cs.AI

    Proactive Detection of Physical Inter-rule Vulnerabilities in IoT Services Using a Deep Learning Approach

    Authors: Bing Huang, Chen Chen, Kwok-Yan Lam, Fuqun Huang

    Abstract: Emerging Internet of Things (IoT) platforms provide sophisticated capabilities to automate IoT services by enabling occupants to create trigger-action rules. Multiple trigger-action rules can physically interact with each other via shared environment channels, such as temperature, humidity, and illumination. We refer to inter-rule interactions via shared environment channels as a physical inter-ru…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE ICWS 2024 Workshop

  14. arXiv:2406.00519  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Discrete Concepts in Latent Hierarchical Models

    Authors: Ling**g Kong, Guangyi Chen, Biwei Huang, Eric P. Xing, Yuejie Chi, Kun Zhang

    Abstract: Learning concepts from natural high-dimensional data (e.g., images) holds potential in building human-aligned and interpretable machine learning models. Despite its encouraging prospect, formalization and theoretical insights into this crucial task are still lacking. In this work, we formalize concepts as discrete latent causal variables that are related via a hierarchical causal model that encode…

    Submitted 1 June, 2024; originally announced June 2024.

  15. arXiv:2406.00492  [pdf, other]

    eess.IV cs.CV cs.LG

    SAM-VMNet: Deep Neural Networks For Coronary Angiography Vessel Segmentation

    Authors: Xueying Zeng, Baixiang Huang, Yu Luo, Guangyu Wei, Songyan He, Yushuang Shao

    Abstract: Coronary artery disease (CAD) is one of the most prevalent diseases in the cardiovascular field and one of the major contributors to death worldwide. Computed Tomography Angiography (CTA) images are regarded as the authoritative standard for the diagnosis of coronary artery disease, and by performing vessel segmentation and stenosis detection on CTA images, physicians are able to diagnose coronary…

    Submitted 1 June, 2024; originally announced June 2024.

  16. arXiv:2405.17167  [pdf]

    eess.IV cs.CV

    Partitioned Hankel-based Diffusion Models for Few-shot Low-dose CT Reconstruction

    Authors: Wenhao Zhang, Bin Huang, Shuyue Chen, Xiaoling Xu, Weiwen Wu, Qiegen Liu

    Abstract: Low-dose computed tomography (LDCT) plays a vital role in clinical applications by mitigating radiation risks. Nevertheless, reducing radiation doses significantly degrades image quality. Concurrently, common deep learning methods demand extensive data, posing concerns about privacy, cost, and time constraints. Consequently, we propose a few-shot low-dose CT reconstruction method using Partitioned…

    Submitted 27 May, 2024; originally announced May 2024.

  17. arXiv:2405.17137  [pdf, other]

    cs.CV

    Jump-teaching: Ultra Efficient and Robust Learning with Noisy Label

    Authors: Kangye Ji, Fei Cheng, Zeqing Wang, Bohu Huang

    Abstract: Sample selection is the most straightforward technique to combat label noise, aiming to distinguish mislabeled samples during training and avoid the degradation of the robustness of the model. In the workflow, $\textit{selecting possibly clean data}$ and $\textit{model update}$ are iterative. However, their interplay and intrinsic characteristics hinder the robustness and efficiency of learning wi…

    Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  18. arXiv:2405.17132  [pdf, other]

    cs.LG

    Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

    Authors: Chun**g Gan, Binbin Hu, Bo Huang, Ziqi Liu, Jian Ma, Zhiqiang Zhang, Wenliang Zhong, Jun Zhou

    Abstract: Online service platforms offering a wide range of services through miniapps have become crucial for users who visit these platforms with clear intentions to find services they are interested in. Aiming at effective content delivery, cross-domain recommendation are introduced to learn high-quality representations by transferring behaviors from data-rich scenarios. However, these methods overlook th…

    Submitted 27 May, 2024; originally announced May 2024.

  19. arXiv:2405.15165  [pdf, other]

    cs.CL cs.AI cs.SE

    A Solution-based LLM API-using Methodology for Academic Information Seeking

    Authors: Yuanchun Wang, Jifan Yu, Zijun Yao, **g Zhang, Yuyang Xie, Shangqing Tu, Yiyang Fu, Youhe Feng, **kai Zhang, **gyao Zhang, Bowen Huang, Yuanyao Li, Huihui Yuan, Lei Hou, Juanzi Li, Jie Tang

    Abstract: Applying large language models (LLMs) for academic API usage shows promise in reducing researchers' academic information seeking efforts. However, current LLM API-using methods struggle with complex API coupling commonly encountered in academic queries. To address this, we introduce SoAy, a solution-based LLM API-using methodology for academic information seeking. It uses code with a solution as t…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures

  20. arXiv:2405.13517  [pdf, other]

    cs.CR cs.CL

    WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness

    Authors: Baizhou Huang, Xiaojun Wan

    Abstract: With the increasing use of large language models (LLMs) in daily life, concerns have emerged regarding their potential misuse and societal impact. Watermarking is proposed to trace the usage of specific models by injecting patterns into their generated texts. An ideal watermark should produce outputs that are nearly indistinguishable from those of the original LLM (imperceptibility), while ensurin…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 9 pages

  21. arXiv:2405.12217  [pdf, other]

    cs.CV cs.AI cs.LG

    Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning

    Authors: Guanglin Zhou, Zhongyi Han, Shiming Chen, Biwei Huang, Liming Zhu, Salman Khan, Xin Gao, Lina Yao

    Abstract: Recent studies indicate that large multimodal models (LMMs) are highly robust against natural distribution shifts, often surpassing previous baselines. Despite this, domain-specific adaptation is still necessary, particularly in specialized areas like healthcare. Due to the impracticality of fine-tuning LMMs given their vast parameter space, this work investigates in-context learning (ICL) as an e…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 17 pages, 7 figures, 7 tables

  22. Jacobi Stability Analysis for Systems of ODEs Using Symbolic Computation

    Authors: Bo Huang, Dongming Wang, **g Yang

    Abstract: The classical theory of Kosambi-Cartan-Chern (KCC) developed in differential geometry provides a powerful method for analyzing the behaviors of dynamical systems. In the KCC theory, the properties of a dynamical system are described in terms of five geometrical invariants, of which the second corresponds to the so-called Jacobi stability of the system. Different from that of the Lyapunov stability…

    Submitted 16 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Proceedings of the 2024 International Symposium on Symbolic and Algebraic Computation (ISSAC 2024)

    MSC Class: 34C07; 68W30

  23. arXiv:2405.09132  [pdf, other]

    cs.SE

    EFACT: an External Function Auto-Completion Tool to Strengthen Static Binary Lifting

    Authors: Yilei Zhang, Haoyu Liao, Zekun Wang, Bo Huang, Jianmei Guo

    Abstract: Static binary lifting is essential in binary rewriting frameworks. Existing tools overlook the impact of External Function Completion (EXFC) in static binary lifting. EXFC recovers the prototypes of External Functions (EXFs, functions defined in standard shared libraries) using only the function symbols available. Incorrect EXFC can misinterpret the source binary, or cause memory overflows in stat…

    Submitted 15 May, 2024; originally announced May 2024.

  24. arXiv:2405.05814  [pdf]

    eess.IV cs.CV

    MSDiff: Multi-Scale Diffusion Model for Ultra-Sparse View CT Reconstruction

    Authors: Pinhuang Tan, Mengxiao Geng, **gya Lu, Liu Shi, Bin Huang, Qiegen Liu

    Abstract: Computed Tomography (CT) technology reduces radiation hazards to the human body through sparse sampling, but fewer sampling angles pose challenges for image reconstruction. Score-based generative models are widely used in sparse-view CT reconstruction, but performance diminishes significantly with a sharp reduction in projection angles. Therefore, we propose an ultra-sparse view CT reconstruction me…

    Submitted 9 May, 2024; originally announced May 2024.

  25. arXiv:2405.05573  [pdf, other]

    cs.CV cs.CR

    Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers

    Authors: Binxiao Huang, Jason Chun Lok, Chang Liu, Ngai Wong

    Abstract: Poisoning-based backdoor attacks expose vulnerabilities in the data preparation stage of deep neural network (DNN) training. The DNNs trained on the poisoned dataset will be embedded with a backdoor, making them behave well on clean data while outputting malicious predictions whenever a trigger is applied. To exploit the abundant information contained in the input data to output label mapping, our…

    Submitted 9 May, 2024; originally announced May 2024.

  26. arXiv:2405.04669  [pdf, other]

    cs.LG cs.CL

    Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics

    Authors: Hanlin Zhu, Baihe Huang, Shaolun Zhang, Michael Jordan, Jiantao Jiao, Yuandong Tian, Stuart Russell

    Abstract: Auto-regressive large language models (LLMs) show impressive capacities to solve many complex reasoning tasks while struggling with some simple logical reasoning tasks such as inverse search: when trained on "A is B", LLM fails to directly conclude "B is A" during inference, which is known as the "reversal curse" (Berglund et al., 2023). In this paper, we theoretically analyze the reversal c…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 40 pages, 15 figures

  27. arXiv:2405.02301  [pdf, other]

    cs.CV

    TFCounter:Polishing Gems for Training-Free Object Counting

    Authors: Pan Ting, Jianfeng Lin, Wenhao Yu, Wenlong Zhang, Xiaoying Chen, **lu Zhang, Binqiang Huang

    Abstract: Object counting is a challenging task with broad application prospects in security surveillance, traffic management, and disease diagnosis. Existing object counting methods face a tri-fold challenge: achieving superior performance, maintaining high generalizability, and minimizing annotation costs. We develop a novel training-free class-agnostic object counter, TFCounter, which is prompt-context-a…

    Submitted 12 March, 2024; originally announced May 2024.

    Comments: 14 pages, 11 figures

    MSC Class: 68

  28. arXiv:2404.19154  [pdf, other]

    cs.CL

    RTF: Region-based Table Filling Method for Relational Triple Extraction

    Authors: Ning An, Lei Hei, Yong Jiang, Wei** Meng, **g**g Hu, Boran Huang, Feiliang Ren

    Abstract: Relational triple extraction is crucial work for the automatic construction of knowledge graphs. Existing methods only construct shallow representations from a token or token pair-level. However, previous works ignore local spatial dependencies of relational triples, resulting in a weakness of entity pair boundary detection. To tackle this problem, we propose a novel Region-based Table Filling met…

    Submitted 13 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: Rejected by EMNLP 2023

  29. arXiv:2404.18612  [pdf]

    cs.RO

    Enhancing Prosthetic Safety and Environmental Adaptability: A Visual-Inertial Prosthesis Motion Estimation Approach on Uneven Terrains

    Authors: Chuheng Chen, Xinxing Chen, Shucong Yin, Yuxuan Wang, Binxin Huang, Yuquan Leng, Chenglong Fu

    Abstract: Environment awareness is crucial for enhancing walking safety and stability of amputee wearing powered prosthesis when crossing uneven terrains such as stairs and obstacles. However, existing environmental perception systems for prosthesis only provide terrain types and corresponding parameters, which fails to prevent potential collisions when crossing uneven terrains and may lead to falls and oth…

    Submitted 29 April, 2024; originally announced April 2024.

  30. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  31. arXiv:2404.14757  [pdf, other]

    cs.LG cs.AI

    Integrating Mamba and Transformer for Long-Short Range Time Series Forecasting

    Authors: Xiongxiao Xu, Yueqing Liang, Baixiang Huang, Zhiling Lan, Kai Shu

    Abstract: Time series forecasting is an important problem and plays a key role in a variety of applications including weather forecasting, stock market, and scientific simulations. Although transformers have proven to be effective in capturing dependency, its quadratic complexity of attention mechanism prevents its further adoption in long-range time series forecasting, thus limiting them attend to short-ra…

    Submitted 23 April, 2024; originally announced April 2024.

  32. arXiv:2404.13701  [pdf, other]

    cs.CV cs.LG

    Semantic-Rearrangement-Based Multi-Level Alignment for Domain Generalized Segmentation

    Authors: Guanlong Jiao, Chenyangguang Zhang, Haonan Yin, Yu Mo, Biqing Huang, Hui Pan, Yi Luo, **gxian Liu

    Abstract: Domain generalized semantic segmentation is an essential computer vision task, for which models only leverage source data to learn the capability of generalized semantic segmentation towards the unseen target domains. Previous works typically address this challenge by global style randomization or feature regularization. In this paper, we argue that given the observation that different local seman…

    Submitted 21 April, 2024; originally announced April 2024.

  33. arXiv:2404.13437  [pdf, other]

    cs.CV

    High-fidelity Endoscopic Image Synthesis by Utilizing Depth-guided Neural Surfaces

    Authors: Baoru Huang, Yida Wang, Anh Nguyen, Daniel Elson, Francisco Vasconcelos, Danail Stoyanov

    Abstract: In surgical oncology, screening colonoscopy plays a pivotal role in providing diagnostic assistance, such as biopsy, and facilitating surgical navigation, particularly in polyp detection. Computer-assisted endoscopic surgery has recently gained attention and amalgamated various 3D computer vision techniques, including camera localization, depth estimation, surface reconstruction, etc. Neural Radia…

    Submitted 20 April, 2024; originally announced April 2024.

  34. arXiv:2404.11291  [pdf, other]

    cs.CV

    Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption

    Authors: Buzhen Huang, Chen Li, Chongyang Xu, Liang Pan, Yangang Wang, Gim Hee Lee

    Abstract: Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration, but overlook the modeling of close interactions. In this work, we tackle the task of reconstructing closely interactive humans from a monocular video. The main challenge of this task comes from insufficient visual information caused by depth ambiguity and severe inter-person occ…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  35. arXiv:2404.11249  [pdf, other]

    cs.CV

    A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene

    Authors: Wenbo Zhang, Yifan Zhang, Jianfeng Lin, Binqiang Huang, **lu Zhang, Wenhao Yu

    Abstract: Pre-trained vision-language (V-L) models such as CLIP have shown excellent performance in many downstream cross-modal tasks. However, most of them are only applicable to the English context. Subsequent research has focused on this problem and proposed improved models, such as CN-CLIP and AltCLIP, to facilitate their applicability to Chinese and even other languages. Nevertheless, these models suff…

    Submitted 17 April, 2024; originally announced April 2024.

  36. arXiv:2404.08001  [pdf, other]

    hep-ph cs.AI cs.CL cs.LG hep-ex physics.comp-ph

    Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics

    Authors: Zhengde Zhang, Yiyu Zhang, Haodong Yao, Jianwen Luo, Rui Zhao, Bo Huang, Jiameng Zhao, Yipu Liao, Ke Li, Lina Zhao, Jun Cao, Fazhi Qi, Changzheng Yuan

    Abstract: Large Language Models (LLMs) are undergoing a period of rapid updates and changes, with state-of-the-art (SOTA) model frequently being replaced. When applying LLMs to a specific scientific field, it's challenging to acquire unique domain knowledge while keeping the model itself advanced. To address this challenge, a sophisticated large language model system named as Xiwu has been developed, allowi…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures

    ACM Class: I.2.7

  37. arXiv:2404.06758  [pdf, other]

    cs.RO

    Toward Holistic Planning and Control Optimization for Dual-Arm Rearrangement

    Authors: Kai Gao, Zihe Ye, Duo Zhang, Baichuan Huang, **g** Yu

    Abstract: Long-horizon task and motion planning (TAMP) is notoriously difficult to solve, let alone optimally, due to the tight coupling between the interleaved (discrete) task and (continuous) motion planning phases, where each phase on its own is frequently an NP-hard or even PSPACE-hard computational challenge. In this study, we tackle the even more challenging goal of jointly optimizing task and motion…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: First three authors made equal contributions to this study

  38. arXiv:2404.06349  [pdf, other]

    cs.LG

    CausalBench: A Comprehensive Benchmark for Causal Learning Capability of Large Language Models

    Authors: Yu Zhou, Xingyu Wu, Beicheng Huang, Jibin Wu, Liang Feng, Kay Chen Tan

    Abstract: Causality reveals fundamental principles behind data distributions in real-world scenarios, and the capability of large language models (LLMs) to understand causality directly impacts their efficacy across explaining outputs, adapting to new evidence, and generating counterfactuals. With the proliferation of LLMs, the evaluation of this capacity is increasingly garnering attention. However, the ab…

    Submitted 9 April, 2024; originally announced April 2024.

  39. arXiv:2404.06290  [pdf, other]

    cs.NE

    Exploring the True Potential: Evaluating the Black-box Optimization Capability of Large Language Models

    Authors: Beichen Huang, Xingyu Wu, Yu Zhou, Jibin Wu, Liang Feng, Ran Cheng, Kay Chen Tan

    Abstract: Large language models (LLMs) have gained widespread popularity and demonstrated exceptional performance not only in natural language processing (NLP) tasks but also in non-linguistic domains. Their potential as artificial general intelligence extends beyond NLP, showcasing promising capabilities in diverse optimization scenarios. Despite this rising trend, whether the integration of LLMs into thes…

    Submitted 9 April, 2024; originally announced April 2024.

  40. arXiv:2404.05241  [pdf, other]

    cs.LG

    Lightweight Inference for Forward-Forward Algorithm

    Authors: Amin Aminifar, Baichuan Huang, Azra Abtahi, Amir Aminifar

    Abstract: The human brain performs tasks with an outstanding energy-efficiency, i.e., with approximately 20 Watts. The state-of-the-art Artificial/Deep Neural Networks (ANN/DNN), on the other hand, have recently been shown to consume massive amounts of energy. The training of these ANNs/DNNs is done almost exclusively based on the back-propagation algorithm, which is known to be biologically implausible. Th…

    Submitted 14 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  41. arXiv:2404.00576  [pdf]

    cs.LG cs.AI cs.CV

    Automated Bi-Fold Weighted Ensemble Algorithms and its Application to Brain Tumor Detection and Classification

    Authors: PoTsang B. Huang, Muhammad Rizwan, Mehboob Ali

    Abstract: The uncontrolled and unstructured growth of brain cells is known as brain tumor, which has one of the highest mortality rates among diseases from all types of cancers. Due to limited diagnostic and treatment capabilities, they pose significant challenges, especially in third-world countries. Early diagnosis plays a vital role in effectively managing brain tumors and reducing mortality rates. Howev…

    Submitted 31 March, 2024; originally announced April 2024.

  42. arXiv:2404.00247  [pdf, ps, other]

    eess.SY cs.AI cs.LG

    Facilitating Reinforcement Learning for Process Control Using Transfer Learning: Perspectives

    Authors: Runze Lin, Junghui Chen, Lei Xie, Hongye Su, Biao Huang

    Abstract: This paper provides insights into deep reinforcement learning (DRL) for process control from the perspective of transfer learning. We analyze the challenges of applying DRL in the field of process industries and the necessity of introducing transfer learning. Furthermore, recommendations and prospects are provided for future research directions on how transfer learning can be integrated with DRL t…

    Submitted 1 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Final Version of Asian Control Conference (ASCC 2024)

  43. arXiv:2403.19438  [pdf, other]

    cs.CV cs.RO

    SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control

    Authors: Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang

    Abstract: Autonomous driving progress relies on large-scale annotated datasets. In this work, we explore the potential of generative models to produce vast quantities of freely-labeled data for autonomous driving applications and present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications. We investigate the impact…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Project page: https://subjectdrive.github.io/

  44. arXiv:2403.19238  [pdf, other]

    cs.CV cs.AI eess.IV

    Taming Lookup Tables for Efficient Image Retouching

    Authors: Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang

    Abstract: The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To th…

    Submitted 28 March, 2024; originally announced March 2024.

  45. arXiv:2403.19101  [pdf, other]

    cs.CV cs.AI

    AAPMT: AGI Assessment Through Prompt and Metric Transformer

    Authors: Benhao Huang

    Abstract: The emergence of text-to-image models marks a significant milestone in the evolution of AI-generated images (AGIs), expanding their use in diverse domains like design, entertainment, and more. Despite these breakthroughs, the quality of AGIs often remains suboptimal, highlighting the need for effective evaluation methods. These methods are crucial for assessing the quality of images relative to th…

    Submitted 27 March, 2024; originally announced March 2024.

  46. 2D Gaussian Splatting for Geometrically Accurate Radiance Fields

    Authors: Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao

    Abstract: 3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking. However, 3DGS fails to accurately represent surfaces due to the multi-view inconsistent nature of 3D Gaussians. We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance…

    Submitted 9 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 13 pages, 12 figures

  47. arXiv:2403.15711  [pdf, other]

    cs.LG stat.ME stat.ML

    Identifiable Latent Neural Causal Models

    Authors: Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

    Abstract: Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data. It is particularly good at predictions under unseen distribution shifts, because these shifts can generally be interpreted as consequences of interventions. Hence leveraging seen distribution shifts becomes a natural strategy to help identifying causal representations, which in…

    Submitted 23 March, 2024; originally announced March 2024.

  48. arXiv:2403.15691  [pdf, other]

    cs.CV

    Temporal-Spatial Object Relations Modeling for Vision-and-Language Navigation

    Authors: Bowen Huang, Yanwei Zheng, Chuanlin Lan, Xinpeng Zhao, Yifei Zou, Dongxiao Yu

    Abstract: Vision-and-Language Navigation (VLN) is a challenging task where an agent is required to navigate to a natural language described location via vision observations. The navigation abilities of the agent can be enhanced by the relations between objects, which are usually learned using internal objects or external datasets. The relationships between internal objects are modeled employing graph convol…

    Submitted 16 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  49. arXiv:2403.13893  [pdf, other]

    cs.LG

    Data Acquisition via Experimental Design for Decentralized Data Markets

    Authors: Charles Lu, Baihe Huang, Sai Praneeth Karimireddy, Praneeth Vepakomma, Michael Jordan, Ramesh Raskar

    Abstract: Acquiring high-quality training data is essential for current machine learning models. Data markets provide a way to increase the supply of data, particularly in data-scarce domains such as healthcare, by incentivizing potential data sellers to join the market. A major challenge for a data buyer in such a market is selecting the most valuable data points from a data seller. Unlike prior work in da…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 26 pages, 20 figures

  50. arXiv:2403.12170  [pdf, other]

    cs.RO

    Sim2Real Manipulation on Unknown Objects with Tactile-based Reinforcement Learning

    Authors: Entong Su, Chengzhe Jia, Yuzhe Qin, Wenxuan Zhou, Annabella Macaluso, Binghao Huang, Xiaolong Wang

    Abstract: Using tactile sensors for manipulation remains one of the most challenging problems in robotics. At the heart of these challenges is generalization: How can we train a tactile-based policy that can manipulate unseen and diverse objects? In this paper, we propose to perform Reinforcement Learning with only visual tactile sensing inputs on diverse objects in a physical simulator. By training with di…

    Submitted 18 March, 2024; originally announced March 2024.