Showing 1–50 of 237 results for author: Tan, H

Searching in archive cs.
  1. arXiv:2407.00826  [pdf, other]

    cs.CL cs.SD eess.AS

    NAIST Simultaneous Speech Translation System for IWSLT 2024

    Authors: Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Tomoya Yanagita, Kosuke Doi, Mana Makinae, Haotian Tan, Makoto Sakai, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura

    Abstract: This paper describes NAIST's submission to the simultaneous track of the IWSLT 2024 Evaluation Campaign: English-to-{German, Japanese, Chinese} speech-to-text translation and English-to-Japanese speech-to-speech translation. We develop a multilingual end-to-end speech-to-text translation model combining two pre-trained language models, HuBERT and mBART. We trained this model with two decoding poli…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: IWSLT 2024 system paper

  2. arXiv:2406.19065  [pdf, other]

    cs.CL

    STBench: Assessing the Ability of Large Language Models in Spatio-Temporal Analysis

    Authors: Wenbin Li, Di Yao, Ruibo Zhao, Wenjie Chen, Zijie Xu, Chengxue Luo, Chang Gong, Quanliang Jing, Haining Tan, Jingping Bi

    Abstract: The rapid evolution of large language models (LLMs) holds promise for reforming the methodology of spatio-temporal data mining. However, current works for evaluating the spatio-temporal understanding capability of LLMs are somewhat limited and biased. These works either fail to incorporate the latest language models or only focus on assessing the memorized spatio-temporal knowledge. To address thi…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.18541  [pdf, other]

    cs.CV cs.GR

    Refining 3D Point Cloud Normal Estimation via Sample Selection

    Authors: Jun Zhou, Yaoshun Li, Hongchen Tan, Mingjie Wang, Nannan Li, Xiuping Liu

    Abstract: In recent years, point cloud normal estimation, as a classical and foundational algorithm, has garnered extensive attention in the field of 3D geometric processing. Despite the remarkable performance achieved by current Neural Network-based methods, their robustness is still influenced by the quality of training data and the models' performance. In this study, we designed a fundamental framework f…

    Submitted 19 May, 2024; originally announced June 2024.

  4. arXiv:2406.16852  [pdf, other]

    cs.CV

    Long Context Transfer from Language to Vision

    Authors: Peiyuan Zhang, Kaichen Zhang, Bo Li, Guangtao Zeng, Jingkang Yang, Yuanhan Zhang, Ziyue Wang, Haoran Tan, Chunyuan Li, Ziwei Liu

    Abstract: Video sequences offer valuable temporal information, but existing large multimodal models (LMMs) fall short in understanding extremely long videos. Many works address this by reducing the number of visual tokens using visual resamplers. Alternatively, in this paper, we approach this problem from the perspective of the language model. By simply extrapolating the context length of the language backb…

    Submitted 30 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Code, demo, and models are available at https://github.com/EvolvingLMMs-Lab/LongVA

  5. arXiv:2406.13975  [pdf, other]

    cs.CL cs.AI

    MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

    Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

    Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we pr…

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.13941  [pdf, other]

    cs.IR cs.AI

    UpDLRM: Accelerating Personalized Recommendation using Real-World PIM Architecture

    Authors: Sitian Chen, Haobin Tan, Amelie Chi Zhou, Yusen Li, Pavan Balaji

    Abstract: Deep Learning Recommendation Models (DLRMs) have gained popularity in recommendation systems due to their effectiveness in handling large-scale recommendation tasks. The embedding layers of DLRMs have become the performance bottleneck due to their intensive needs on memory capacity and memory bandwidth. In this paper, we propose UpDLRM, which utilizes real-world processing-in-memory (PIM) hardware,…

    Submitted 19 June, 2024; originally announced June 2024.

  7. arXiv:2406.12164  [pdf, other]

    cs.SD cs.AI eess.AS

    A Mel Spectrogram Enhancement Paradigm Based on CWT in Speech Synthesis

    Authors: Guoqiang Hu, Huaning Tan, Ruilai Li

    Abstract: Acoustic features play an important role in improving the quality of the synthesised speech. Currently, the Mel spectrogram is a widely employed acoustic feature in most acoustic models. However, due to the fine-grained loss caused by its Fourier transform process, the clarity of speech synthesised by Mel spectrogram is compromised in mutant signals. In order to obtain a more detailed Mel spectrog…

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.09371  [pdf, other]

    cs.CV cs.LG

    LRM-Zero: Training Large Reconstruction Models with Synthesized Data

    Authors: Desai Xie, Sai Bi, Zhixin Shu, Kai Zhang, Zexiang Xu, Yi Zhou, Sören Pirk, Arie Kaufman, Xin Sun, Hao Tan

    Abstract: We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse, which is automatically synthesized from simple primitive shapes with random texturing and augmentations (e.g., height fields, boolean differences, and wireframes). Unlike previous 3D data…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 23 pages, 8 figures. Our code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/

  9. arXiv:2406.08980  [pdf, other]

    q-bio.BM cs.LG

    From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics

    Authors: Bowen Gao, Haichuan Tan, Yanwen Huang, Minsi Ren, Xiao Huang, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

    Abstract: Recent advancements in structure-based drug design (SBDD) have significantly enhanced the efficiency and precision of drug discovery by generating molecules tailored to bind specific protein pockets. Despite these technological strides, their practical application in real-world drug development remains challenging due to the complexities of synthesizing and testing these molecules. The reliability…

    Submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2405.19885  [pdf, other]

    cs.LG cs.RO

    Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning

    Authors: Hengkai Tan, Songming Liu, Kai Ma, Chengyang Ying, Xingxing Zhang, Hang Su, Jun Zhu

    Abstract: Transformer has shown promise in reinforcement learning to model time-varying features for obtaining generalized low-level robot policies on diverse robotics datasets in embodied learning. However, it still suffers from the issues of low data efficiency and high inference latency. In this paper, we propose to investigate the task from a new perspective of the frequency domain. We first observe tha…

    Submitted 5 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  11. arXiv:2405.16479  [pdf, other]

    cs.CV

    Differentiable Proximal Graph Matching

    Authors: Haoru Tan, Chuang Wang, Xu-Yao Zhang, Cheng-Lin Liu

    Abstract: Graph matching is a fundamental tool in computer vision and pattern recognition. In this paper, we introduce an algorithm for graph matching based on the proximal operator, referred to as differentiable proximal graph matching (DPGM). Specifically, we relax and decompose the quadratic assignment problem for the graph matching into a sequence of convex optimization problems. The whole algorithm can…

    Submitted 26 May, 2024; originally announced May 2024.

  12. arXiv:2405.11430  [pdf, other]

    cs.CL

    MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation

    Authors: Jianbo Dai, Jianqiao Lu, Yunlong Feng, Rongju Ruan, Ming Cheng, Haochen Tan, Zhijiang Guo

    Abstract: Recent advancements in large language models (LLMs) have greatly improved code generation, specifically at the function level. For instance, GPT-4 has achieved an 88.4% pass rate on HumanEval. However, this draws into question the adequacy of existing benchmarks in thoroughly assessing function-level code generation capabilities. Our study analyzed two common benchmarks, HumanEval and MBPP, and fo…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 39 pages, dataset and code are available at https://github.com/SparksofAGI/MHPP

  13. arXiv:2405.07530  [pdf, other]

    cs.SE

    Prompt-based Code Completion via Multi-Retrieval Augmented Generation

    Authors: Hanzhuo Tan, Qi Luo, Ling Jiang, Zizheng Zhan, Jing Li, Haotian Zhang, Yuqun Zhang

    Abstract: Automated code completion, aiming at generating subsequent tokens from unfinished code, has been significantly benefited from recent progress in pre-trained Large Language Models (LLMs). However, these models often suffer from coherence issues and hallucinations when dealing with complex code logic or extrapolating beyond their training data. Existing Retrieval Augmented Generation (RAG) technique…

    Submitted 13 May, 2024; originally announced May 2024.

  14. arXiv:2405.05355  [pdf, other]

    cs.CV cs.RO

    Geometry-Informed Distance Candidate Selection for Adaptive Lightweight Omnidirectional Stereo Vision with Fisheye Images

    Authors: Conner Pulling, Je Hon Tan, Yaoyu Hu, Sebastian Scherer

    Abstract: Multi-view stereo omnidirectional distance estimation usually needs to build a cost volume with many hypothetical distance candidates. The cost volume building process is often computationally heavy considering the limited resources a mobile robot has. We propose a new geometry-informed way of distance candidates selection method which enables the use of a very small number of candidates and reduc…

    Submitted 8 May, 2024; originally announced May 2024.

  15. arXiv:2405.02213  [pdf, other]

    cs.SE cs.AI cs.LG

    Automatic Programming: Large Language Models and Beyond

    Authors: Michael R. Lyu, Baishakhi Ray, Abhik Roychoudhury, Shin Hwei Tan, Patanamon Thongtanunam

    Abstract: Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and study the concerns around code quality, security and related is…

    Submitted 15 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  16. arXiv:2404.19702  [pdf, other]

    cs.CV

    GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting

    Authors: Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu

    Abstract: We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on single A100 GPU. Our model features a very simple transformer-based architecture; we patchify input posed images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode final per-pixel Gaussian para…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Project webpage: https://sai-bi.github.io/project/gs-lrm/

  17. arXiv:2404.19664  [pdf, other]

    cs.RO cs.LG

    Towards Generalist Robot Learning from Internet Video: A Survey

    Authors: Robert McCarthy, Daniel C. H. Tan, Dominik Schmidt, Fernando Acero, Nathan Herr, Yilun Du, Thomas G. Thuruthel, Zhibin Li

    Abstract: This survey presents an overview of methods for learning from video (LfV) in the context of reinforcement learning (RL) and robotics. We focus on methods capable of scaling to large internet video datasets and, in the process, extracting foundational knowledge about the world's dynamics and physical human behaviour. Such methods hold great promise for developing general-purpose robots. We open w…

    Submitted 7 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: Updated formatting. Reduced paper length and made other minor improvements

  18. arXiv:2404.18760  [pdf, other]

    cs.CV

    Flow AM: Generating Point Cloud Global Explanations by Latent Alignment

    Authors: Hanxiao Tan

    Abstract: Although point cloud models have gained significant improvements in prediction accuracy over recent years, their trustworthiness is still not sufficiently investigated. In terms of global explainability, Activation Maximization (AM) techniques in the image domain are not directly transplantable due to the special structure of the point cloud models. Existing studies exploit generative models to yi…

    Submitted 29 April, 2024; originally announced April 2024.

  19. arXiv:2404.17357  [pdf, other]

    eess.IV cs.CV

    Simultaneous Tri-Modal Medical Image Fusion and Super-Resolution using Conditional Diffusion Model

    Authors: Yushen Xu, Xiaosong Li, Yuchan Jie, Haishu Tan

    Abstract: In clinical practice, tri-modal medical image fusion, compared to the existing dual-modal technique, can provide a more comprehensive view of the lesions, aiding physicians in evaluating the disease's shape, location, and biological activity. However, due to the limitations of imaging equipment and considerations for patient safety, the quality of medical images is usually limited, leading to sub-…

    Submitted 13 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  20. arXiv:2404.17126  [pdf, other]

    cs.LG cs.AI eess.IV physics.med-ph

    Deep Evidential Learning for Dose Prediction

    Authors: Hai Siong Tan, Kuancheng Wang, Rafe Mcbeth

    Abstract: In this work, we present a novel application of an uncertainty-quantification framework called Deep Evidential Learning in the domain of radiotherapy dose prediction. Using medical images of the Open Knowledge-Based Planning Challenge dataset, we found that this model can be effectively harnessed to yield uncertainty estimates that inherited correlations with prediction errors upon completion of n…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 24 pages, 8 figures

  21. arXiv:2404.12652  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Pre-trained Vision-Language Models Learn Discoverable Visual Concepts

    Authors: Yuan Zang, Tian Yun, Hao Tan, Trung Bui, Chen Sun

    Abstract: Do vision-language models (VLMs) pre-trained to caption an image of a "durian" learn visual concepts such as "brown" (color) and "spiky" (texture) at the same time? We aim to answer this question as visual concepts learned "for free" would enable wide applications such as neuro-symbolic reasoning or human-interpretable object classification. We assume that the visual concepts, if captured by pre-t…

    Submitted 19 April, 2024; originally announced April 2024.

  22. arXiv:2404.12386  [pdf, other]

    cs.CV cs.LG

    SOHES: Self-supervised Open-world Hierarchical Entity Segmentation

    Authors: Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang

    Abstract: Open-world entity segmentation, as an emerging computer vision task, aims at segmenting entities in images without being restricted by pre-defined classes, offering impressive generalization capabilities on unseen images and concepts. Despite its promise, existing entity segmentation methods like Segment Anything Model (SAM) rely heavily on costly expert annotators. This work presents Self-supervi…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  23. arXiv:2404.12385  [pdf, other]

    cs.CV cs.GR

    MeshLRM: Large Reconstruction Model for High-Quality Mesh

    Authors: Xinyue Wei, Kai Zhang, Sai Bi, Hao Tan, Fujun Luan, Valentin Deschaintre, Kalyan Sunkavalli, Hao Su, Zexiang Xu

    Abstract: We propose MeshLRM, a novel LRM-based approach that can reconstruct a high-quality mesh from merely four input images in less than one second. Different from previous large reconstruction models (LRMs) that focus on NeRF-based reconstruction, MeshLRM incorporates differentiable mesh extraction and rendering within the LRM framework. This allows for end-to-end mesh reconstruction by fine-tuning a p…

    Submitted 18 April, 2024; originally announced April 2024.

  24. arXiv:2404.08877  [pdf, other]

    cs.SE cs.CL cs.LG

    Aligning LLMs for FL-free Program Repair

    Authors: Junjielong Xu, Ying Fu, Shin Hwei Tan, Pinjia He

    Abstract: Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next token prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction objective of current infilling-style methods, which impedes LLMs from fully leveraging pre-trained knowledge for program repair. In addition, while some LLMs are capable of…

    Submitted 12 April, 2024; originally announced April 2024.

  25. arXiv:2404.08506  [pdf, other]

    cs.CV

    LaSagnA: Language-based Segmentation Assistant for Complex Queries

    Authors: Cong Wei, Haoxian Tan, Yujie Zhong, Yujiu Yang, Lin Ma

    Abstract: Recent advancements have empowered Large Language Models for Vision (vLLMs) to generate detailed perceptual outcomes, including bounding boxes and masks. Nonetheless, there are two constraints that restrict the further application of these vLLMs: the incapability of handling multiple targets per query and the failure to identify the absence of query objects in the image. In this study, we acknowle…

    Submitted 12 April, 2024; originally announced April 2024.

  26. arXiv:2404.05445  [pdf, other]

    stat.ME cs.LG stat.CO

    Unsupervised Training of Convex Regularizers using Maximum Likelihood Estimation

    Authors: Hong Ye Tan, Ziruo Cai, Marcelo Pereyra, Subhadip Mukherjee, Junqi Tang, Carola-Bibiane Schönlieb

    Abstract: Unsupervised learning is a training approach in the situation where ground truth data is unavailable, such as inverse imaging problems. We present an unsupervised Bayesian training approach to learning convex neural network regularizers using a fixed noisy dataset, based on a dual Markov chain estimation method. Compared to classical supervised adversarial regularization methods, where there is ac…

    Submitted 8 April, 2024; originally announced April 2024.

    MSC Class: 62C12; 62F15; 65C40; 65J22

  27. arXiv:2403.17770  [pdf, other]

    eess.IV cs.CV

    CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

    Authors: Yongrui Yu, Hanyu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, Xiaofan Zhang

    Abstract: Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle in the computer-aided diagnosis of abdominal lymph nodes due to the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node…

    Submitted 26 March, 2024; originally announced March 2024.

  28. arXiv:2403.10024  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage

    Authors: Hao Hao Tan, Kin Wai Cheuk, Taemin Cho, Wei-Hsiang Liao, Yuki Mitsufuji

    Abstract: This paper presents enhancements to the MT3 model, a state-of-the-art (SOTA) token-based multi-instrument automatic music transcription (AMT) model. Despite SOTA performance, MT3 has the issue of instrument leakage, where transcriptions are fragmented across different instruments. To mitigate this, we propose MR-MT3, with enhancements including a memory retention mechanism, prior token sampling, a…

    Submitted 15 March, 2024; originally announced March 2024.

  29. arXiv:2403.06457  [pdf, other]

    cs.CV

    Ensemble Quadratic Assignment Network for Graph Matching

    Authors: Haoru Tan, Chuang Wang, Sitong Wu, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu

    Abstract: Graph matching is a commonly used technique in computer vision and pattern recognition. Recent data-driven approaches have improved the graph matching accuracy remarkably, whereas some traditional algorithm-based methods are more robust to feature noises, outlier nodes, and global transformation (e.g., rotation). In this paper, we propose a graph neural network (GNN) based approach to combine the a…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by IJCV in 2024

  30. arXiv:2403.05286  [pdf, other]

    cs.PL cs.CL

    LLM4Decompile: Decompiling Binary Code with Large Language Models

    Authors: Hanzhuo Tan, Qi Luo, Jing Li, Yuqun Zhang

    Abstract: Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in Large Language Models (LLMs), we propose LLM4Decompile, the first and largest open-source LLM series (1.3B to 33B) trained to decompile binary code. We optimize the LLM training process and introduce th…

    Submitted 18 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  31. arXiv:2403.02231  [pdf, other]

    cs.IR

    CODE-ACCORD: A Corpus of Building Regulatory Data for Rule Generation towards Automatic Compliance Checking

    Authors: Hansi Hettiarachchi, Amna Dridi, Mohamed Medhat Gaber, Pouyan Parsafard, Nicoleta Bocaneala, Katja Breitenfelder, Gonçal Costa, Maria Hedblom, Mihaela Juganaru-Mathieu, Thamer Mecharnia, Sumee Park, He Tan, Abdel-Rahman H. Tawil, Edlira Vakaj

    Abstract: Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector necessitates automating the interpretation of building regulations to achieve its full potential. However, extracting information from textual rules to convert them to a machine-readable format has been a challenge due to the complexities associated with natural language and the limited resource…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: This is a preprint of an article submitted to the Data in Brief Journal, Elsevier

  32. arXiv:2402.17904  [pdf]

    cs.RO

    4CNet: A Confidence-Aware, Contrastive, Conditional, Consistency Model for Robot Map Prediction in Multi-Robot Environments

    Authors: Aaron Hao Tan, Siddarth Narasimhan, Goldie Nejat

    Abstract: Mobile robots in unknown cluttered environments with irregularly shaped obstacles often face sensing, energy, and communication challenges which directly affect their ability to explore these environments. In this paper, we introduce a novel deep learning method, Confidence-Aware Contrastive Conditional Consistency Model (4CNet), for mobile robot map prediction during resource-limited exploration…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 14 pages, 10 figures

  33. arXiv:2402.14577  [pdf, other]

    cs.CV

    Debiasing Text-to-Image Diffusion Models

    Authors: Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi

    Abstract: Learning-based Text-to-Image (TTI) models like Stable Diffusion have revolutionized the way visual content is generated in various domains. However, recent research has shown that nonnegligible social bias exists in current state-of-the-art TTI systems, which raises important concerns. In this work, we target resolving the social bias in TTI diffusion models. We begin by formalizing the problem se…

    Submitted 22 February, 2024; originally announced February 2024.

  34. arXiv:2402.14366  [pdf, other]

    cs.SE

    Understanding and Detecting Annotation-Induced Faults of Static Analyzers

    Authors: Huaien Zhang, Yu Pei, Shuyun Liang, Shin Hwei Tan

    Abstract: Static analyzers can reason about the properties and behaviors of programs and detect various issues without executing them. Hence, they should extract the necessary information to understand the analyzed program well. Annotation has been a widely used feature for different purposes in Java since the introduction of Java 5. Annotations can change program structures and convey semantics information…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 23 pages, 16 figures

  35. arXiv:2402.09288  [pdf, other]

    cs.LG

    EcoVal: An Efficient Data Valuation Framework for Machine Learning

    Authors: Ayush K Tarun, Vikram S Chundawat, Murari Mandal, Hong Ming Tan, Bowei Chen, Mohan Kankanhalli

    Abstract: Quantifying the value of data within a machine learning workflow can play a pivotal role in making more strategic decisions in machine learning initiatives. The existing Shapley value based frameworks for data valuation in machine learning are computationally expensive as they require considerable amount of repeated training of the model to obtain the Shapley value. In this paper, we introduce an…

    Submitted 7 April, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  36. arXiv:2402.06838  [pdf]

    cs.RO

    NavFormer: A Transformer Architecture for Robot Target-Driven Navigation in Unknown and Dynamic Environments

    Authors: Haitong Wang, Aaron Hao Tan, Goldie Nejat

    Abstract: In unknown cluttered and dynamic environments such as disaster scenes, mobile robots need to perform target-driven navigation in order to find people or objects of interest, while being solely guided by images of the targets. In this paper, we introduce NavFormer, a novel end-to-end transformer architecture developed for robot target-driven navigation in unknown and dynamic environments. NavFormer…

    Submitted 9 February, 2024; originally announced February 2024.

  37. arXiv:2402.03752  [pdf, other]

    cs.CV cs.LG

    Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images

    Authors: Jen Hong Tan

    Abstract: Can a lightweight Vision Transformer (ViT) match or exceed the performance of Convolutional Neural Networks (CNNs) like ResNet on small datasets with small image resolutions? This report demonstrates that a pure ViT can indeed achieve superior performance through pre-training, using a masked auto-encoder technique with minimal image scaling. Our experiments on the CIFAR-10 and CIFAR-100 datasets i…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 7 pages, 6 figures

  38. arXiv:2402.02096  [pdf, other]

    cs.CV

    Decomposition-based and Interference Perception for Infrared and Visible Image Fusion in Complex Scenes

    Authors: Xilai Li, Xiaosong Li, Haishu Tan

    Abstract: Infrared and visible image fusion has emerged as a prominent research in computer vision. However, little attention has been paid on complex scenes fusion, causing existing techniques to produce sub-optimal results when suffers from real interferences. To fill this gap, we propose a decomposition-based and interference perception image fusion method. Specifically, we classify the pixels of visible…

    Submitted 3 February, 2024; originally announced February 2024.

  39. arXiv:2402.02090  [pdf, other]

    cs.CV

    Physical Perception Network and an All-weather Multi-modality Benchmark for Adverse Weather Image Fusion

    Authors: Xilai Li, Wuyang Liu, Xiaosong Li, Haishu Tan

    Abstract: Multi-modality image fusion (MMIF) integrates the complementary information from different modal images to provide comprehensive and objective interpretation of a scenes. However, existing MMIF methods lack the ability to resist different weather interferences in real-life scenarios, preventing them from being useful in practical applications such as autonomous driving. To bridge this research gap…

    Submitted 3 February, 2024; originally announced February 2024.

  40. arXiv:2402.01269  [pdf, other]

    cs.CV

    Spectrum-guided Feature Enhancement Network for Event Person Re-Identification

    Authors: Hongchen Tan, Yi Zhang, Xiuping Liu, Baocai Yin, Nan Ma, Xin Li, Huchuan Lu

    Abstract: As a cutting-edge biosensor, the event camera holds significant potential in the field of computer vision, particularly regarding privacy preservation. However, compared to traditional cameras, event streams often contain noise and possess extremely sparse semantics, posing a formidable challenge for event-based person re-identification (event Re-ID). To address this, we introduce a novel event pe…

    Submitted 2 February, 2024; originally announced February 2024.

  41. arXiv:2402.01212  [pdf, other]

    cs.CV

    TSJNet: A Multi-modality Target and Semantic Awareness Joint-driven Image Fusion Network

    Authors: Yuchan Jie, Yushen Xu, Xiaosong Li, Haishu Tan

    Abstract: Multi-modality image fusion involves integrating complementary information from different modalities into a single image. Current methods primarily focus on enhancing image fusion with a single advanced task such as incorporating semantic or object-related information into the fusion process. This method creates challenges in achieving multiple objectives simultaneously. We introduce a target and…

    Submitted 2 February, 2024; originally announced February 2024.

  42. arXiv:2401.17881  [pdf, other]

    cs.CV

    PVLR: Prompt-driven Visual-Linguistic Representation Learning for Multi-Label Image Recognition

    Authors: Hao Tan, Zichang Tan, Jun Li, Jun Wan, Zhen Lei

    Abstract: Multi-label image recognition is a fundamental task in computer vision. Recently, vision-language models have made notable advancements in this area. However, previous methods often failed to effectively leverage the rich knowledge within language models and instead incorporated label semantics into visual features in a unidirectional manner. In this paper, we propose a Prompt-driven Visual-Lingui…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 15 pages, 8 figures

  43. arXiv:2401.15669  [pdf]

    cs.ET q-bio.BM

    Programmable biomolecule-mediated processors

    Authors: Jian-Jun Shu, Zi Hian Tan, Qi-Wen Wang, Kian-Yan Yong

    Abstract: Programmable biomolecule-mediated computing is a new computing paradigm as compared to contemporary electronic computing. It employs nucleic acids and analogous biomolecular structures as information-storing and -processing substrates to tackle computational problems. It is of great significance to investigate the various issues of programmable biomolecule-mediated processors that are capable of a…

    Submitted 28 January, 2024; originally announced January 2024.

    Journal ref: Journal of the American Chemical Society, Vol. 145, No. 46, pp. 25033-25042, 2023

  44. arXiv:2401.15234  [pdf, other]

    cs.SE

    Moving beyond Deletions: Program Simplification via Diverse Program Transformations

    Authors: Haibo Wang, Zezhong Xing, Zheng Wang, Chengnian Sun, Shin Hwei Tan

    Abstract: To reduce the complexity of software, developers manually simplify programs (known as developer-induced program simplification in this paper) to reduce code size while preserving functionality, but manual simplification is time-consuming and error-prone. To reduce manual effort, rule-based approaches (e.g., refactoring) and deletion-based approaches (e.g., delta debugging) can be potentially a…

    Submitted 26 January, 2024; originally announced January 2024.

  45. arXiv:2401.15042  [pdf, other]

    cs.CL cs.AI

    PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models

    Authors: Haochen Tan, Zhijiang Guo, Zhan Shi, Lu Xu, Zhili Liu, Yunlong Feng, Xiaoguang Li, Yasheng Wang, Lifeng Shang, Qun Liu, Linqi Song

    Abstract: Large Language Models (LLMs) have succeeded remarkably in understanding long-form contents. However, their capability for generating long-form contents, such as reports and articles, has been relatively unexplored and inadequately assessed by existing benchmarks. The prevalent evaluation methods, which predominantly rely on crowdsourcing, are recognized for their labor-intensive nature a…

    Submitted 4 June, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Accepted to ACL 2024 main conference

  46. arXiv:2401.14938  [pdf, other]

    cs.CV

    DAM: Diffusion Activation Maximization for 3D Global Explanations

    Authors: Hanxiao Tan

    Abstract: In recent years, the performance of point cloud models has been rapidly improved. However, due to the limited amount of relevant explainability studies, the unreliability and opacity of these black-box models may lead to potential risks in applications where human lives are at stake, e.g. autonomous driving or healthcare. This work proposes a DDPM-based point cloud global explainability method (DA…

    Submitted 26 January, 2024; originally announced January 2024.

  47. arXiv:2401.12435  [pdf, ps, other]

    cs.AI cs.LG math.AP

    Quantitative Analysis of Molecular Transport in the Extracellular Space Using Physics-Informed Neural Network

    Authors: Jiayi Xie, Hongfeng Li, ** Cheng, Qingrui Cai, Hanbo Tan, Lingyun Zu, Xiaobo Qu, Hongbin Han

    Abstract: The brain extracellular space (ECS), an irregular, extremely tortuous nanoscale space located between cells or between cells and blood vessels, is crucial for nerve cell survival. It plays a pivotal role in high-level brain functions such as memory, emotion, and sensation. However, the specific form of molecular transport within the ECS remains elusive. To address this challenge, this paper propose…

    Submitted 23 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  48. arXiv:2401.12175  [pdf, other]

    cs.CV

    Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM

    Authors: Zhenzhen Weng, **gyuan Liu, Hao Tan, Zhan Xu, Yang Zhou, Serena Yeung-Levy, Jimei Yang

    Abstract: Reconstructing 3D humans from a single image has been extensively investigated. However, existing approaches often fall short in capturing fine geometry and appearance details, hallucinating occluded parts with plausible details, and achieving generalization across unseen and in-the-wild datasets. We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a hum…

    Submitted 14 March, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: Project Page: https://zzweng.github.io/humanlrm

  49. arXiv:2401.11911  [pdf, other]

    cs.CL cs.AI

    Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts?

    Authors: Hexiang Tan, Fei Sun, Wanli Yang, Yuanzhuo Wang, Qi Cao, Xueqi Cheng

    Abstract: While auxiliary information has become a key to enhancing Large Language Models (LLMs), relatively little is known about how LLMs merge these contexts, specifically contexts generated by LLMs and those retrieved from external sources. To investigate this, we formulate a systematic framework to identify whether LLMs' responses are attributed to either generated or retrieved contexts. To easily trac…

    Submitted 10 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted at ACL 2024 Main, Homepage (https://tan-hexiang.github.io/Blinded_by_Generated_Contexts/)

  50. arXiv:2401.08357  [pdf, other]

    cs.CV

    SAMF: Small-Area-Aware Multi-focus Image Fusion for Object Detection

    Authors: Xilai Li, Xiaosong Li, Haishu Tan, **yang Li

    Abstract: Existing multi-focus image fusion (MFIF) methods often fail to preserve the uncertain transition region and detect small focus areas within large defocused regions accurately. To address this issue, this study proposes a new small-area-aware MFIF algorithm for enhancing object detection capability. First, we enhance the pixel attributes within the small focus and boundary regions, which are subseq…

    Submitted 31 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024