
Showing 1–50 of 224 results for author: Huo, Y

Searching in archive cs.
  1. arXiv:2407.02386 [pdf, other]

    cs.CV

    OpenSlot: Mixed Open-set Recognition with Object-centric Learning

    Authors: Xu Yin, Fei Pan, Guoyuan An, Yuchi Huo, Zixuan Xie, Sung-Eui Yoon

    Abstract: Existing open-set recognition (OSR) studies typically assume that each image contains only one class label, and the unknown test set (negative) has a disjoint label space from the known test set (positive), a scenario termed full-label shift. This paper introduces the mixed OSR problem, where test images contain multiple class semantics, with known and unknown classes co-occurring in negatives, le…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: This study is under IEEE TMM review

  2. arXiv:2407.00596 [pdf, other]

    eess.IV cs.CV

    HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

    Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

    Abstract: Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.19286

  3. arXiv:2406.19540 [pdf, other]

    cs.CV

    Weighted Circle Fusion: Ensembling Circle Representation from Different Object Detection Results

    Authors: Jialin Yue, Tianyuan Yao, Ruining Deng, Quan Liu, Juming Xiong, Haichun Yang, Yuankai Huo

    Abstract: Recently, the use of circle representation has emerged as a method to improve the identification of spherical objects (such as glomeruli, cells, and nuclei) in medical imaging studies. In traditional bounding box-based object detection, combining results from multiple models improves accuracy, especially when real-time processing isn't crucial. Unfortunately, this widely adopted strategy is not re…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.16386 [pdf, other]

    cs.SE cs.AI

    Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach

    Authors: Yuxuan Wan, Chaozheng Wang, Yi Dong, Wenxuan Wang, Shuqing Li, Yintong Huo, Michael R. Lyu

    Abstract: Websites are critical in today's digital world, with over 1.11 billion currently active and approximately 252,000 new sites launched daily. Converting website layout design into functional UI code is a time-consuming yet indispensable step of website development. Manual methods of converting visual designs into functional code present significant challenges, especially for non-experts. To explore…

    Submitted 24 June, 2024; originally announced June 2024.

  5. arXiv:2406.16360 [pdf, other]

    cs.CV cs.GR

    MIRReS: Multi-bounce Inverse Rendering using Reservoir Sampling

    Authors: Yuxin Dai, Qi Wang, Jingsen Zhu, Dianbing Xi, Yuchi Huo, Chen Qian, Ying He

    Abstract: We present MIRReS, a novel two-stage inverse rendering framework that jointly reconstructs and optimizes the explicit geometry, material, and lighting from multi-view images. Unlike previous methods that rely on implicit irradiance fields or simplified path tracing algorithms, our method extracts an explicit geometry (triangular mesh) in stage one, and introduces a more realistic physically-based…

    Submitted 24 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 16 pages, 14 figures

  6. arXiv:2406.15755 [pdf, other]

    cs.CV cs.AI

    Fine-grained Background Representation for Weakly Supervised Semantic Segmentation

    Authors: Xu Yin, Woobin Im, Dongbo Min, Yuchi Huo, Fei Pan, Sung-Eui Yoon

    Abstract: Generating reliable pseudo masks from image-level labels is challenging in the weakly supervised semantic segmentation (WSSS) task due to the lack of spatial information. Prevalent class activation map (CAM)-based solutions are challenged to discriminate the foreground (FG) objects from the suspicious background (BG) pixels (a.k.a. co-occurring) and learn the integral object regions. This paper pr…

    Submitted 22 June, 2024; originally announced June 2024.

  7. arXiv:2406.14129 [pdf, other]

    cs.CV cs.CL cs.MM

    Towards Event-oriented Long Video Understanding

    Authors: Yifan Du, Kun Zhou, Yuqi Huo, Yifan Li, Wayne Xin Zhao, Haoyu Lu, Zijia Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen

    Abstract: With the rapid development of video Multimodal Large Language Models (MLLMs), numerous benchmarks have been proposed to assess their video understanding capability. However, due to the lack of rich events in the videos, these datasets may suffer from the short-cut bias that the answers can be deduced from a few frames, without the need to watch the entire video. To address this issue, we introduce…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress

  8. arXiv:2406.11317 [pdf, other]

    cs.AI cs.CL cs.CV cs.HC

    GUICourse: From General Vision Language Models to Versatile GUI Agents

    Authors: Wentong Chen, Junbo Cui, Jinyi Hu, Yujia Qin, Junjie Fang, Yue Zhao, Chongyi Wang, Jun Liu, Guirong Chen, Yupeng Huo, Yuan Yao, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Utilizing Graphic User Interface (GUI) for human-computer interaction is essential for accessing a wide range of digital tools. Recent advancements in Vision Language Models (VLMs) highlight the compelling potential to develop versatile agents to help humans finish GUI navigation tasks. However, current VLMs are challenged in terms of fundamental abilities (OCR and grounding) and GUI knowledge (th…

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.11242 [pdf, other]

    cs.CV

    Accurate and Fast Pixel Retrieval with Spatial and Uncertainty Aware Hypergraph Diffusion

    Authors: Guoyuan An, Yuchi Huo, Sung-Eui Yoon

    Abstract: This paper presents a novel method designed to enhance the efficiency and accuracy of both image retrieval and pixel retrieval. Traditional diffusion methods struggle to propagate spatial information effectively in conventional graphs due to their reliance on scalar edge weights. To overcome this limitation, we introduce a hypergraph-based framework, uniquely capable of efficiently propagating spa…

    Submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2406.09367 [pdf, other]

    cs.CV

    Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

    Authors: Zijia Zhao, Haoyu Lu, Yuqi Huo, Yifan Du, Tongtian Yue, Longteng Guo, Bingning Wang, Weipeng Chen, Jing Liu

    Abstract: Video understanding is a crucial next step for multimodal large language models (MLLMs). To probe specific aspects of video understanding ability, existing video benchmarks typically require careful video selection based on the target capability, along with laborious annotation of query-response pairs to match the specific video content. This process is both challenging and resource-intensive. In…

    Submitted 13 June, 2024; originally announced June 2024.

  11. arXiv:2406.02430 [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  12. arXiv:2405.17824 [pdf, other]

    cs.CV

    mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis

    Authors: Quan Liu, Ruining Deng, Can Cui, Tianyuan Yao, Vishwesh Nath, Yucheng Tang, Yuankai Huo

    Abstract: Multi-modal learning adeptly integrates visual and textual data, but its application to histopathology image and text analysis remains challenging, particularly with large, high-resolution images like gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g.,…

    Submitted 28 May, 2024; originally announced May 2024.

  13. arXiv:2405.17568 [pdf, other]

    cs.CV

    ExtremeMETA: High-speed Lightweight Image Segmentation Model by Remodeling Multi-channel Metamaterial Imagers

    Authors: Quan Liu, Brandon T. Swartz, Ivan Kravchenko, Jason G. Valentine, Yuankai Huo

    Abstract: Deep neural networks (DNNs) have heavily relied on traditional computational units like CPUs and GPUs. However, this conventional approach brings significant computational burdens, latency issues, and high power consumption, limiting their effectiveness. This has sparked the need for lightweight networks like ExtremeC3Net. On the other hand, there have been notable advancements in optical computat…

    Submitted 27 May, 2024; originally announced May 2024.

  14. arXiv:2405.16141 [pdf, other]

    cs.LG cs.AI cs.CE

    AIGB: Generative Auto-bidding via Diffusion Modeling

    Authors: Jiayan Guo, Yusen Huo, Zhilin Zhang, Tianyu Wang, Chuan Yu, Jian Xu, Yan Zhang, Bo Zheng

    Abstract: Auto-bidding plays a crucial role in facilitating online advertising by automatically providing bids for advertisers. Reinforcement learning (RL) has gained popularity for auto-bidding. However, most current RL auto-bidding methods are modeled through the Markovian Decision Process (MDP), which assumes the Markovian state transition. This assumption restricts the ability to perform in long horizon…

    Submitted 27 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  15. arXiv:2405.14580 [pdf, other]

    cs.GR

    LDM: Large Tensorial SDF Model for Textured Mesh Generation

    Authors: Rengan Xie, Wenting Zheng, Kai Huang, Yizheng Chen, Qi Wang, Qi Ye, Wei Chen, Yuchi Huo

    Abstract: Previous efforts have managed to generate production-ready 3D assets from text or images. However, these methods primarily employ NeRF or 3D Gaussian representations, which are not adept at producing smooth, high-quality geometries required by modern rendering pipelines. In this paper, we propose LDM, a novel feed-forward framework capable of generating high-fidelity, illumination-decoupled textur…

    Submitted 20 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  16. arXiv:2405.11270 [pdf, other]

    cs.CV

    HR Human: Modeling Human Avatars with Triangular Mesh and High-Resolution Textures from Videos

    Authors: Qifeng Chen, Rengan Xie, Kai Huang, Qi Wang, Wenting Zheng, Rong Li, Yuchi Huo

    Abstract: Recently, implicit neural representation has been widely used to generate animatable human avatars. However, the materials and geometry of those representations are coupled in the neural network and hard to edit, which hinders their application in traditional graphics engines. We present a framework for acquiring human avatars that are attached with high-resolution physically-based material textur…

    Submitted 18 May, 2024; originally announced May 2024.

  17. arXiv:2405.09045 [pdf, other]

    cs.CV

    AMSNet: Netlist Dataset for AMS Circuits

    Authors: Zhuofu Tao, Yichen Shi, Yiru Huo, Rui Ye, Zonghang Li, Li Huang, Chen Wu, Na Bai, Zhiping Yu, Ting-Jung Lin, Lei He

    Abstract: Today's analog/mixed-signal (AMS) integrated circuit (IC) designs demand substantial manual intervention. The advent of multimodal large language models (MLLMs) has unveiled significant potential across various fields, suggesting their applicability in streamlining large-scale AMS IC design as well. A bottleneck in employing MLLMs for automatic AMS circuit generation is the absence of a comprehens…

    Submitted 14 May, 2024; originally announced May 2024.

  18. arXiv:2405.03652 [pdf]

    cs.CV

    Field-of-View Extension for Diffusion MRI via Deep Generative Models

    Authors: Chenyu Gao, Shunxing Bao, Michael Kim, Nancy Newlin, Praitayini Kanakaraj, Tianyuan Yao, Gaurav Rudravaram, Yuankai Huo, Daniel Moyer, Kurt Schilling, Walter Kukull, Arthur Toga, Derek Archer, Timothy Hohman, Bennett Landman, Zhiyuan Li

    Abstract: Purpose: In diffusion MRI (dMRI), the volumetric and bundle analyses of whole-brain tissue microstructure and connectivity can be severely impeded by an incomplete field-of-view (FOV). This work aims to develop a method for imputing the missing slices directly from existing dMRI scans with an incomplete FOV. We hypothesize that the imputed image with complete FOV can improve the whole-brain tracto…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 20 pages, 11 figures

  19. arXiv:2404.13896 [pdf, other]

    cs.CV

    CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory

    Authors: Yunlong Ran, Yanxu Li, Qi Ye, Yuchi Huo, Zechun Bai, Jiahao Sun, Jiming Chen

    Abstract: Neural radiance field (NeRF) has achieved impressive results in high-quality 3D scene reconstruction. However, NeRF heavily relies on precise camera poses. While recent works like BARF have introduced camera pose optimization within NeRF, their applicability is limited to simple trajectory scenes. Existing methods struggle while tackling complex trajectories involving large rotations. To address t…

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  20. arXiv:2404.09707 [pdf, other]

    cs.CV cs.AI cs.LG

    Adaptive Patching for High-resolution Image Segmentation with Transformers

    Authors: Enzhi Zhang, Isaac Lyngaas, Peng Chen, Xiao Wang, Jun Igarashi, Yuankai Huo, Mohamed Wahib, Masaharu Munetomo

    Abstract: Attention-based models are proliferating in the space of image analytics, including segmentation. The standard method of feeding images to transformer encoders is to divide the images into patches and then feed the patches to the model as a linear sequence of tokens. For high-resolution images, e.g. microscopic pathology images, the quadratic compute and memory cost prohibits the use of an attenti…

    Submitted 15 April, 2024; originally announced April 2024.
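The fixed-size patch-to-token pipeline and its quadratic attention cost mentioned in the abstract above can be made concrete with a little arithmetic. This is a hypothetical sketch, not the paper's adaptive patching scheme: it assumes fixed, non-overlapping square patches and full self-attention over every token, and the function names and image sizes are illustrative only.

```python
def num_tokens(height, width, patch):
    """Number of non-overlapping patch tokens for a height x width image."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def attention_matrix_entries(n_tokens):
    """Full self-attention scores every token against every token: n^2 entries."""
    return n_tokens * n_tokens


def patchify(image, patch):
    """Split a 2-D image (list of rows) into a row-major list of flattened patches.

    Each returned patch is one token of length patch * patch.
    """
    h, w = len(image), len(image[0])
    num_tokens(h, w, patch)  # raises if the image does not tile evenly
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([image[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


if __name__ == "__main__":
    for side in (224, 4096):
        n = num_tokens(side, side, 16)
        print(side, n, attention_matrix_entries(n))
```

With 16×16 patches, a 224×224 image yields 196 tokens and 38,416 attention scores, while a 4096×4096 image yields 65,536 tokens and about 4.3 billion scores, roughly 16 GiB for one fp32 attention map in a single head of a single layer. Because the token count grows with image area and attention grows with the square of the token count, doubling the image side multiplies the attention cost by 16.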

  21. arXiv:2404.00714 [pdf, other]

    cs.CV

    Neural Radiance Field-based Visual Rendering: A Comprehensive Review

    Authors: Mingyuan Yao, Yukang Huo, Yang Ran, Qingbin Tian, Ruifeng Wang, Haihua Wang

    Abstract: In recent years, Neural Radiance Fields (NeRF) has made remarkable progress in the field of computer vision and graphics, providing strong technical support for solving key tasks including 3D scene understanding, new perspective synthesis, human body reconstruction, robotics, and so on; the attention of academics to this research result is growing. As a revolutionary neural implicit field represen…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 35 pages, 22 figures, 14 tables, 18 formulas

  22. Face It Yourselves: An LLM-Based Two-Stage Strategy to Localize Configuration Errors via Logs

    Authors: Shiwen Shan, Yintong Huo, Yuxin Su, Yichen Li, Dan Li, Zibin Zheng

    Abstract: Configurable software systems are prone to configuration errors, resulting in significant losses to companies. However, diagnosing these errors is challenging due to the vast and complex configuration space. These errors pose significant challenges for both experienced maintainers and new end-users, particularly those without access to the source code of the software systems. Given that logs are e…

    Submitted 2 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: 13 pages, accepted by ISSTA 2024 (The 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis)

  23. arXiv:2403.17574 [pdf, other]

    cs.SE cs.DC

    SPES: Towards Optimizing Performance-Resource Trade-Off for Serverless Functions

    Authors: Cheryl Lee, Zhouruixin Zhu, Tianyi Yang, Yintong Huo, Yuxin Su, Pinjia He, Michael R. Lyu

    Abstract: As an emerging cloud computing deployment paradigm, serverless computing is gaining traction due to its efficiency and ability to harness on-demand cloud resources. However, a significant hurdle remains in the form of the cold start problem, causing latency when launching new function instances from scratch. Existing solutions tend to use over-simplistic strategies for function pre-loading/unloadi…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 12 pages, accepted by ICDE 2024 (40th IEEE International Conference on Data Engineering)

  24. arXiv:2403.11626 [pdf, other]

    cs.GR cs.AI cs.CV cs.MM cs.SD eess.AS

    QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation

    Authors: Zhizhen Zhou, Yejing Huo, Guoheng Huang, An Zeng, Xuhang Chen, Lian Huang, Zinuo Li

    Abstract: The study of music-generated dance is a novel and challenging image generation task. It aims to input a piece of music and seed motions, then generate natural dance movements for the subsequent music. Transformer-based methods face challenges in time series prediction tasks related to human movements and music due to their struggle in capturing the nonlinear relationship and temporal aspects. This…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by The Visual Computer Journal

  25. arXiv:2403.11507 [pdf, other]

    cs.CV

    Circle Representation for Medical Instance Object Segmentation

    Authors: Juming Xiong, Ethan H. Nguyen, Yilin Liu, Ruining Deng, Regina N Tyree, Hernan Correa, Girish Hiremath, Yaohong Wang, Haichun Yang, Agnes B. Fogo, Yuankai Huo

    Abstract: Recently, circle representation has been introduced for medical imaging, designed specifically to enhance the detection of instance objects that are spherically shaped (e.g., cells, glomeruli, and nuclei). Given its outstanding effectiveness in instance detection, it is compelling to consider the application of circle representation for segmenting instance medical objects. In this study, we introd…

    Submitted 18 March, 2024; originally announced March 2024.

  26. arXiv:2403.07728 [pdf, other]

    stat.ML cs.LG stat.ME

    CAP: A General Algorithm for Online Selective Conformal Prediction with FCR Control

    Authors: Yajie Bao, Yuyang Huo, Haojie Ren, Changliang Zou

    Abstract: We study the problem of post-selection predictive inference in an online fashion. To avoid devoting resources to unimportant units, a preliminary selection of the current individual before reporting its prediction interval is common and meaningful in online predictive tasks. Since the online selection causes a temporal multiplicity in the selected prediction intervals, it is important to control t…

    Submitted 28 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  27. arXiv:2403.06640 [pdf, other]

    eess.SY cs.RO math.OC

    Passive iFIR Filters for Data-Driven Control

    Authors: Zixing Wang, Yongkang Huo, Fulvio Forni

    Abstract: We consider the design of a new class of passive iFIR controllers given by the parallel action of an integrator and a finite impulse response filter. iFIRs are more expressive than PID controllers but retain their features and simplicity. The paper provides a model-free data-driven design for passive iFIR controllers based on virtual reference feedback tuning. Passivity is enforced through constra…

    Submitted 29 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 6 pages, 7 figures, Accepted by IEEE Control Systems Letters (L-CSS) with the option to present it to 2024 Conference on Decision and Control (CDC 2024)

    Journal ref: IEEE Control Systems Letters, vol. 8, pp. 1289-1294, 2024
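The iFIR structure described in the abstract above (an integrator acting in parallel with a finite impulse response filter on the tracking error) can be sketched as a discrete-time control law. This is a minimal illustration under stated assumptions, not the authors' method: the class and function names, the gains, and the rectangular (running-sum) integrator are hypothetical choices, and the sketch performs neither the virtual reference feedback tuning nor the passivity constraints described in the paper. It also shows one concrete sense in which iFIRs generalize PID control: a discrete PID with a backward-difference derivative is the special case of a two-tap FIR.

```python
from collections import deque


class IFIRController:
    """Hypothetical discrete-time iFIR law: integrator in parallel with an FIR filter.

    u[k] = ki * sum_{j<=k} e[j] + sum_{i=0}^{N-1} h[i] * e[k-i]

    Gains are illustrative; no data-driven tuning or passivity check is done here.
    """

    def __init__(self, ki, taps):
        self.ki = ki
        self.taps = list(taps)      # FIR coefficients h[0..N-1]
        self.integral = 0.0         # running sum of past and current errors
        # Most recent error first; zero-filled, i.e. zero initial conditions.
        self.history = deque([0.0] * len(self.taps), maxlen=len(self.taps))

    def step(self, error):
        """Consume the current error e[k] and return the control input u[k]."""
        self.integral += error
        self.history.appendleft(error)  # oldest sample drops off the right end
        fir = sum(h * e for h, e in zip(self.taps, self.history))
        return self.ki * self.integral + fir


def pid_as_ifir(kp, ki, kd):
    """Discrete PID (running-sum integral, backward-difference derivative)
    written as an iFIR with the 2-tap FIR h = [kp + kd, -kd]."""
    return IFIRController(ki, [kp + kd, -kd])


if __name__ == "__main__":
    ctrl = pid_as_ifir(kp=2.0, ki=0.1, kd=0.5)
    print([round(ctrl.step(e), 3) for e in [1.0, 0.5, -0.2]])
```

Longer FIR tap vectors give the extra expressiveness the abstract refers to, while the parallel integrator keeps the zero steady-state error behaviour familiar from PID control.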

  28. arXiv:2402.19286 [pdf, other]

    eess.IV cs.CV

    PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

    Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Jialin Yue, Juming Xiong, Lining Yu, Yifei Wu, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

    Abstract: Understanding the anatomy of renal pathology is crucial for advancing disease diagnostics, treatment evaluation, and clinical research. The complex kidney system comprises various components across multiple levels, including regions (cortex, medulla), functional units (glomeruli, tubules), and cells (podocytes, mesangial cells in glomerulus). Prior studies have predominantly overlooked the intrica…

    Submitted 20 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference 2024

  29. arXiv:2402.15102 [pdf, other]

    cs.LG cs.AI cs.GT cs.IR

    Trajectory-wise Iterative Reinforcement Learning Framework for Auto-bidding

    Authors: Haoming Li, Yusen Huo, Shuai Dou, Zhenzhe Zheng, Zhilin Zhang, Chuan Yu, Jian Xu, Fan Wu

    Abstract: In online advertising, advertisers participate in ad auctions to acquire ad opportunities, often by utilizing auto-bidding tools provided by demand-side platforms (DSPs). The current auto-bidding algorithms typically employ reinforcement learning (RL). However, due to safety concerns, most RL-based auto-bidding policies are trained in simulation, leading to a performance degradation when deployed…

    Submitted 8 April, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Accepted by The Web Conference 2024 (WWW'24) as an oral paper

  30. arXiv:2402.12958 [pdf, other]

    cs.SE

    Go Static: Contextualized Logging Statement Generation

    Authors: Yichen Li, Yintong Huo, Renyi Zhong, Zhihan Jiang, Jinyang Liu, Junjie Huang, Jiazhen Gu, Pinjia He, Michael R. Lyu

    Abstract: Logging practices have been extensively investigated to assist developers in writing appropriate logging statements for documenting software behaviors. Although numerous automatic logging approaches have been proposed, their performance remains unsatisfactory due to the constraint of the single-method input, without informative programming context outside the method. Specifically, we identify thre…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: This paper was accepted by The ACM International Conference on the Foundations of Software Engineering (FSE 2024)

  31. arXiv:2402.10937 [pdf]

    cs.AR cs.AI cs.CE cs.GT cs.LG

    A Lightweight Inception Boosted U-Net Neural Network for Routability Prediction

    Authors: Hailiang Li, Yan Huo, Yan Wang, Xu Yang, Miaohui Hao, Xiao Wang

    Abstract: As the modern CPU, GPU, and NPU chip design complexity and transistor counts keep increasing, and with the relentless shrinking of semiconductor technology nodes to nearly 1 nanometer, the placement and routing have gradually become the two most pivotal processes in modern very-large-scale-integrated (VLSI) circuit back-end design. How to evaluate routability efficiently and accurately in advance…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: The paper is submitted to the International Symposium of EDA (2024, Xi'an, China)

  32. arXiv:2402.03630 [pdf, other]

    cs.SE cs.AI

    Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context

    Authors: Yichen Li, Yun Peng, Yintong Huo, Michael R. Lyu

    Abstract: Large Language Models (LLMs) have achieved remarkable success in code completion, as evidenced by their essential roles in developing code assistant services such as Copilot. Being trained on in-file contexts, current LLMs are quite effective in completing code for single source files. However, it is challenging for them to conduct repository-level code completion for large software projects that…

    Submitted 19 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  33. arXiv:2402.00028 [pdf, other]

    cs.GR cs.CV eess.IV

    Neural Rendering and Its Hardware Acceleration: A Review

    Authors: Xinkai Yan, Jieting Xu, Yuchi Huo, Hujun Bao

    Abstract: Neural rendering is a new image and video generation method based on deep learning. It combines the deep learning model with the physical knowledge of computer graphics, to obtain a controllable and realistic scene model, and realize the control of scene attributes such as lighting, camera parameters, posture and so on. On the one hand, neural rendering can not only make full use of the advantages…

    Submitted 6 January, 2024; originally announced February 2024.

  34. arXiv:2401.15841 [pdf, other]

    cs.CV

    2L3: Lifting Imperfect Generated 2D Images into Accurate 3D

    Authors: Yizheng Chen, Rengan Xie, Qi Ye, Sen Yang, Zixuan Xie, Tianxiao Chen, Rong Li, Yuchi Huo

    Abstract: Reconstructing 3D objects from a single image is an intriguing but challenging problem. One promising solution is to utilize multi-view (MV) 3D reconstruction to fuse generated MV images into consistent 3D objects. However, the generated images usually suffer from inconsistent lighting, misaligned geometry, and sparse views, leading to poor reconstruction quality. To cope with these problems, we p…

    Submitted 28 January, 2024; originally announced January 2024.

  35. arXiv:2401.07854 [pdf, other]

    cs.CV

    $M^{2}$Fusion: Bayesian-based Multimodal Multi-level Fusion on Colorectal Cancer Microsatellite Instability Prediction

    Authors: Quan Liu, Jiawen Yao, Lisha Yao, Xin Chen, Jingren Zhou, Le Lu, Ling Zhang, Zaiyi Liu, Yuankai Huo

    Abstract: Colorectal cancer (CRC) micro-satellite instability (MSI) prediction on histopathology images is a challenging weakly supervised learning task that involves multi-instance learning on gigapixel images. To date, radiology images have proven to have CRC MSI information and efficient patient imaging techniques. Different data modalities integration offers the opportunity to increase the accuracy and…

    Submitted 15 January, 2024; originally announced January 2024.

  36. arXiv:2401.07654 [pdf, other]

    cs.CV

    Foundation Models for Biomedical Image Segmentation: A Survey

    Authors: Ho Hin Lee, Yu Gu, Theodore Zhao, Yanbo Xu, Jianwei Yang, Naoto Usuyama, Cliff Wong, Mu Wei, Bennett A. Landman, Yuankai Huo, Alberto Santamaria-Pang, Hoifung Poon

    Abstract: Recent advancements in biomedical image analysis have been significantly driven by the Segment Anything Model (SAM). This transformative technology, originally developed for general-purpose computer vision, has found rapid application in medical image processing. Within the last year, marked by over 100 publications, SAM has demonstrated its prowess in zero-shot learning adaptations for medical im…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 22 pages, 4 figures, 7 tables

  37. arXiv:2401.05602 [pdf]

    cs.CV

    Nucleus subtype classification using inter-modality learning

    Authors: Lucas W. Remedios, Shunxing Bao, Samuel W. Remedios, Ho Hin Lee, Leon Y. Cai, Thomas Li, Ruining Deng, Can Cui, Jia Li, Qi Liu, Ken S. Lau, Joseph T. Roland, Mary K. Washington, Lori A. Coburn, Keith T. Wilson, Yuankai Huo, Bennett A. Landman

    Abstract: Understanding the way cells communicate, co-locate, and interrelate is essential to understanding human physiology. Hematoxylin and eosin (H&E) staining is ubiquitously available both for clinical studies and research. The Colon Nucleus Identification and Classification (CoNIC) Challenge has recently innovated on robust artificial intelligence labeling of six cell types on H&E stains of the colon.…

    Submitted 28 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

  38. arXiv:2401.03060 [pdf]

    eess.IV cs.CV

    Super-resolution multi-contrast unbiased eye atlases with deep probabilistic refinement

    Authors: Ho Hin Lee, Adam M. Saunders, Michael E. Kim, Samuel W. Remedios, Lucas W. Remedios, Yucheng Tang, Qi Yang, Xin Yu, Shunxing Bao, Chloe Cho, Louise A. Mawn, Tonia S. Rex, Kevin L. Schey, Blake E. Dewey, Jeffrey M. Spraggins, Jerry L. Prince, Yuankai Huo, Bennett A. Landman

    Abstract: Purpose: Eye morphology varies significantly across the population, especially for the orbit and optic nerve. These variations limit the feasibility and robustness of generalizing population-wise features of eye organs to an unbiased spatial reference. Approach: To tackle these limitations, we propose a process for creating high-resolution unbiased eye atlases. First, to restore spatial details…

    Submitted 14 June, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: Revised for submission to SPIE Journal of Medical Imaging. 26 pages, 6 figures

  39. arXiv:2312.16425 [pdf, other]

    cs.CV

    In-Hand 3D Object Reconstruction from a Monocular RGB Video

    Authors: Shijian Jiang, Qi Ye, Rengan Xie, Yuchi Huo, Xiang Li, Yang Zhou, Jiming Chen

    Abstract: Our work aims to reconstruct a 3D object that is held and rotated by a hand in front of a static RGB camera. Previous methods that use implicit neural representations to recover the geometry of a generic hand-held object from multi-view images achieved compelling results in the visible part of the object. However, these methods falter in accurately capturing the shape within the hand-object contac…

    Submitted 27 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  40. arXiv:2311.11825 [pdf, other]

    cs.CV cs.GR

    Holistic Inverse Rendering of Complex Facade via Aerial 3D Scanning

    Authors: Zixuan Xie, Rengan Xie, Rong Li, Kai Huang, Pengju Qiao, Jingsen Zhu, Xu Yin, Qi Ye, Wei Hua, Yuchi Huo, Hujun Bao

    Abstract: In this work, we use multi-view aerial images to reconstruct the geometry, lighting, and material of facades using neural signed distance fields (SDFs). Without the requirement of complex equipment, our method only takes simple RGB images captured by a drone as inputs to enable physically based and photorealistic novel-view rendering, relighting, and editing. However, a real-world facade usually h…

    Submitted 8 April, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

  41. arXiv:2311.03500 [pdf]

    eess.IV cs.CV q-bio.NC

    Predicting Age from White Matter Diffusivity with Residual Learning

    Authors: Chenyu Gao, Michael E. Kim, Ho Hin Lee, Qi Yang, Nazirah Mohd Khairi, Praitayini Kanakaraj, Nancy R. Newlin, Derek B. Archer, Angela L. Jefferson, Warren D. Taylor, Brian D. Boyd, Lori L. Beason-Held, Susan M. Resnick, The BIOCARD Study Team, Yuankai Huo, Katherine D. Van Schaik, Kurt G. Schilling, Daniel Moyer, Ivana Išgum, Bennett A. Landman

    Abstract: Imaging findings inconsistent with those expected at specific chronological age ranges may serve as early indicators of neurological disorders and increased mortality risk. Estimation of chronological age, and deviations from expected results, from structural MRI data has become an important task for developing biomarkers that are sensitive to such deviations. Complementary to structural analysis,…

    Submitted 21 January, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: SPIE Medical Imaging: Image Processing. San Diego, CA. February 2024 (accepted as poster presentation)

  42. arXiv:2310.09726  [pdf, other]

    cs.GR cs.CV

    FuseSR: Super Resolution for Real-time Rendering through Efficient Multi-resolution Fusion

    Authors: Zhihua Zhong, Jingsen Zhu, Yuxin Dai, Chuankun Zheng, Yuchi Huo, Guanlin Chen, Hujun Bao, Rui Wang

    Abstract: The workload of real-time rendering is steeply increasing as the demand for high resolution, high refresh rates, and high realism rises, overwhelming most graphics cards. To mitigate this problem, one of the most popular solutions is to render images at a low resolution to reduce rendering overhead, and then manage to accurately upsample the low-resolution rendered image to the target resolution,…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: Accepted by SIGGRAPH Asia 2023. Project page: https://isaac-paradox.github.io/FuseSR/

  43. arXiv:2310.08067  [pdf, other]

    cs.AI

    GameGPT: Multi-agent Collaborative Framework for Game Development

    Authors: Dake Chen, Hanbin Wang, Yunhao Huo, Yuzhao Li, Haoyang Zhang

    Abstract: The large language model (LLM) based agents have demonstrated their capacity to automate and expedite software development processes. In this paper, we focus on game development and propose a multi-agent collaborative framework, dubbed GameGPT, to automate game development. While many studies have pinpointed hallucination as a primary roadblock for deploying LLMs in production, we identify another…

    Submitted 12 October, 2023; originally announced October 2023.

  44. arXiv:2310.06486  [pdf, other]

    cs.AI cs.CV cs.IR

    Topological RANSAC for instance verification and retrieval without fine-tuning

    Authors: Guoyuan An, Juhyung Seon, Inkyu An, Yuchi Huo, Sung-Eui Yoon

    Abstract: This paper presents an innovative approach to enhancing explainable image retrieval, particularly in situations where a fine-tuning set is unavailable. The widely-used SPatial verification (SP) method, despite its efficacy, relies on a spatial model and the hypothesis-testing strategy for instance recognition, leading to inherent limitations, including the assumption of planar structures and negle…

    Submitted 10 October, 2023; originally announced October 2023.

  45. arXiv:2310.01796  [pdf, other]

    cs.SE

    LILAC: Log Parsing using LLMs with Adaptive Parsing Cache

    Authors: Zhihan Jiang, Jinyang Liu, Zhuangbin Chen, Yichen Li, Junjie Huang, Yintong Huo, Pinjia He, Jiazhen Gu, Michael R. Lyu

    Abstract: Log parsing transforms log messages into structured formats, serving as the prerequisite step for various log analysis tasks. Although a variety of log parsing approaches have been proposed, their performance on complicated log data remains compromised due to the use of human-crafted rules or learning-based models with limited training data. The recent emergence of powerful large language models (…

    Submitted 22 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: This paper was accepted by The ACM International Conference on the Foundations of Software Engineering (FSE 2024)

  46. arXiv:2310.00677  [pdf, other]

    cs.SE

    A Roadmap towards Intelligent Operations for Reliable Cloud Computing Systems

    Authors: Yintong Huo, Cheryl Lee, Jinyang Liu, Tianyi Yang, Michael R. Lyu

    Abstract: The increasing complexity and usage of cloud systems have made it challenging for service providers to ensure reliability. This paper highlights two main challenges, namely internal and external factors, that affect the reliability of cloud microservices. Afterward, we discuss the data-driven approach that can resolve these challenges from four key aspects: ticket management, log management, multi…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by ICDM AIOPS workshop

  47. arXiv:2310.00199  [pdf, other]

    cs.CV

    DeformUX-Net: Exploring a 3D Foundation Backbone for Medical Image Segmentation with Depthwise Deformable Convolution

    Authors: Ho Hin Lee, Quan Liu, Qi Yang, Xin Yu, Shunxing Bao, Yuankai Huo, Bennett A. Landman

    Abstract: The application of 3D ViTs to medical image segmentation has seen remarkable strides, somewhat overshadowing the budding advancements in Convolutional Neural Network (CNN)-based models. Large kernel depthwise convolution has emerged as a promising technique, showcasing capabilities akin to hierarchical transformers and facilitating an expansive effective receptive field (ERF) vital for dense predi…

    Submitted 3 October, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: 14 pages; the source code with our pre-trained model is available at https://github.com/MASILab/deform-uxnet

  48. arXiv:2309.09392  [pdf, other]

    eess.IV cs.CV

    Deep conditional generative models for longitudinal single-slice abdominal computed tomography harmonization

    Authors: Xin Yu, Qi Yang, Yucheng Tang, Riqiang Gao, Shunxing Bao, Leon Y. Cai, Ho Hin Lee, Yuankai Huo, Ann Zenobia Moore, Luigi Ferrucci, Bennett A. Landman

    Abstract: Two-dimensional single-slice abdominal computed tomography (CT) provides a detailed tissue map with high resolution allowing quantitative characterization of relationships between health conditions and aging. However, longitudinal analysis of body composition changes using these scans is difficult due to positional variation between slices acquired in different years, which leads to different or…

    Submitted 17 September, 2023; originally announced September 2023.

  49. arXiv:2309.07136  [pdf, other]

    eess.SP cs.AI cs.LG stat.AP

    Masked Transformer for Electrocardiogram Classification

    Authors: Ya Zhou, Xiaolin Diao, Yanni Huo, Yang Liu, Xiaohan Fan, Wei Zhao

    Abstract: Electrocardiogram (ECG) is one of the most important diagnostic tools in clinical applications. With the advent of advanced algorithms, various deep learning models have been adopted for ECG tasks. However, the potential of Transformer for ECG data has not been fully realized, despite their widespread success in computer vision and natural language processing. In this work, we present Masked Trans…

    Submitted 22 April, 2024; v1 submitted 31 August, 2023; originally announced September 2023.

    Comments: more experimental results; more implementation details; different abstracts

  50. arXiv:2309.05438  [pdf, other]

    cs.CV cs.IR

    Towards Content-based Pixel Retrieval in Revisited Oxford and Paris

    Authors: Guoyuan An, Woo Jae Kim, Saelyne Yang, Rong Li, Yuchi Huo, Sung-Eui Yoon

    Abstract: This paper introduces the first two pixel retrieval benchmarks. Pixel retrieval is segmented instance retrieval. Like semantic segmentation extends classification to the pixel level, pixel retrieval is an extension of image retrieval and offers information about which pixels are related to the query object. In addition to retrieving images for the given query, it helps users quickly identify the q…

    Submitted 11 September, 2023; originally announced September 2023.