Showing 1–50 of 77 results for author: Yi, H

Searching in archive cs.
  1. arXiv:2406.18159  [pdf, other]

    cs.CV cs.GR

    Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models

    Authors: Xiaolin Hong, Hongwei Yi, Fazhi He, Qiong Cao

    Abstract: Generating 3D scenes from human motion sequences supports numerous applications, including virtual reality and architectural design. However, previous auto-regression-based human-aware 3D scene generation methods have struggled to accurately capture the joint distribution of multiple objects and input humans, often resulting in overlapping object generation in the same space. To address this limit…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.08638  [pdf, other]

    cs.LG

    Conditional Similarity Triplets Enable Covariate-Informed Representations of Single-Cell Data

    Authors: Chi-Jane Chen, Haidong Yi, Natalie Stanley

    Abstract: Single-cell technologies enable comprehensive profiling of diverse immune cell-types through the measurement of multiple genes or proteins per cell. In order to translate data from immune profiling assays into powerful diagnostics, machine learning approaches are used to compute per-sample immunological summaries, or featurizations that can be used as inputs to models for outcomes of interest. Cur…

    Submitted 12 June, 2024; originally announced June 2024.

  3. arXiv:2406.07078  [pdf, other]

    cs.CV cs.AI

    Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology

    Authors: Huahui Yi, Xiaofei Wang, Kang Li, Chao Li

    Abstract: Multimodal learning, integrating histology images and genomics, promises to enhance precision oncology with comprehensive views at microscopic and molecular levels. However, existing methods may not sufficiently model the shared or complementary information for more effective integration. In this study, we introduce a Unified Modeling Enhanced Multimodal Learning (UMEML) framework that employs a h…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2406.05431  [pdf]

    cs.CL

    MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature

    Authors: Gyeong Hoon Yi, Jiwoo Choi, Hyeongyun Song, Olivia Miano, Jaewoong Choi, Kihoon Bang, Byungju Lee, Seok Su Sohn, David Buttler, Anna Hiszpanski, Sang Soo Han, Donghun Kim

    Abstract: Efficiently extracting data from tables in the scientific literature is pivotal for building large-scale databases. However, the tables reported in materials science papers exist in highly diverse forms; thus, rule-based extractions are an ineffective approach. To overcome this challenge, we present MaTableGPT, which is a GPT-based table data extractor from the materials science literature. MaTabl…

    Submitted 8 June, 2024; originally announced June 2024.

  5. arXiv:2405.18897  [pdf, other]

    cs.CV

    MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning

    Authors: Junjie Wang, Guangjing Yang, Wentao Chen, Huahui Yi, Xiaohu Wu, Qicheng Lao

    Abstract: In response to the challenges posed by the extensive parameter updates required for full fine-tuning of large-scale pre-trained models, parameter-efficient fine-tuning (PEFT) methods, exemplified by Low-Rank Adaptation (LoRA), have emerged. LoRA simplifies the fine-tuning process but may still struggle with a certain level of redundancy in low-rank matrices and limited effectiveness from merely in…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Tech report

  6. arXiv:2405.17141  [pdf, other]

    eess.IV cs.CV

    MVMS-RCN: A Dual-Domain Unfolding CT Reconstruction with Multi-sparse-view and Multi-scale Refinement-correction

    Authors: Xiaohong Fan, Ke Chen, Huaming Yi, Yin Yang, Jianping Zhang

    Abstract: X-ray Computed Tomography (CT) is one of the most important diagnostic imaging techniques in clinical applications. Sparse-view CT imaging reduces the number of projection views to a lower radiation dose and alleviates the potential risk of radiation exposure. Most existing deep learning (DL) and deep unfolding sparse-view CT reconstruction methods: 1) do not fully use the projection data; 2) do n…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 12 pages, submitted

  7. arXiv:2405.00156  [pdf, other]

    cs.CV cs.AI cs.LG quant-ph

    Expanding the Horizon: Enabling Hybrid Quantum Transfer Learning for Long-Tailed Chest X-Ray Classification

    Authors: Skylar Chan, Pranav Kulkarni, Paul H. Yi, Vishwa S. Parekh

    Abstract: Quantum machine learning (QML) has the potential for improving the multi-label classification of rare, albeit critical, diseases in large-scale chest x-ray (CXR) datasets due to theoretical quantum advantages over classical machine learning (CML) in sample efficiency and generalizability. While prior literature has explored QML with CXRs, it has focused on binary classification tasks with small da…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 11 pages, 13 figures, 3 tables

  8. arXiv:2404.10685  [pdf, other]

    cs.CV cs.GR

    Generating Human Interaction Motions in Scenes with Text Control

    Authors: Hongwei Yi, Justus Thies, Michael J. Black, Xue Bin Peng, Davis Rempe

    Abstract: We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models. Previous text-to-motion methods focus on characters in isolation without considering scenes due to the limited availability of datasets that include motion, text descriptions, and interactive scenes. Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model,…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Project Page: https://research.nvidia.com/labs/toronto-ai/tesmo/

  9. arXiv:2404.10209  [pdf, other]

    cs.AI cs.LG

    Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models

    Authors: Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen

    Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. The technologies of interacting with data particularly have an important entanglement with LLMs as efficient and intuitive data interactions are paramount. In this paper, we present DB-GPT, a revolutionary and product-ready Python library that integrates LLMs into traditional data interact…

    Submitted 24 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  10. arXiv:2404.07374  [pdf, other]

    eess.IV cs.CV cs.LG

    Improving Multi-Center Generalizability of GAN-Based Fat Suppression using Federated Learning

    Authors: Pranav Kulkarni, Adway Kanhere, Harshita Kukreja, Vivian Zhang, Paul H. Yi, Vishwa S. Parekh

    Abstract: Generative Adversarial Network (GAN)-based synthesis of fat suppressed (FS) MRIs from non-FS proton density sequences has the potential to accelerate acquisition of knee MRIs. However, GANs trained on single-site data have poor generalizability to external data. We show that federated learning can improve multi-center generalizability of GANs for synthesizing FS MRIs, while facilitating privacy-pr…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 5 pages, 2 figures

  11. arXiv:2403.16554  [pdf, other]

    cs.CL cs.AI

    PE: A Poincare Explanation Method for Fast Text Hierarchy Generation

    Authors: Qian Chen, Dongyang Li, Xiaofeng He, Hongzhao Li, Hongyu Yi

    Abstract: The black-box nature of deep learning models in NLP hinders their widespread application. The research focus has shifted to Hierarchical Attribution (HA) for its ability to model feature interactions. Recent works model non-contiguous combinations with a time-costly greedy search in Euclidean spaces, neglecting underlying linguistic information in feature representations. In this work, we introduc…

    Submitted 12 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  12. arXiv:2403.15218  [pdf, other]

    cs.CV cs.AI cs.LG

    Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations

    Authors: Pranav Kulkarni, Adway Kanhere, Dharmam Savani, Andrew Chan, Devina Chatterjee, Paul H. Yi, Vishwa S. Parekh

    Abstract: Curating annotations for medical image segmentation is a labor-intensive and time-consuming task that requires domain expertise, resulting in "narrowly" focused deep learning (DL) models with limited translational utility. Recently, foundation models like the Segment Anything Model (SAM) have revolutionized semantic segmentation with exceptional zero-shot generalizability across various domains, i…

    Submitted 22 March, 2024; originally announced March 2024.

  13. arXiv:2403.10021  [pdf, other]

    cs.CR

    Time-Frequency Jointed Imperceptible Adversarial Attack to Brainprint Recognition with Deep Learning Models

    Authors: Hangjie Yi, Yuhang Ming, Dongjun Liu, Wanzeng Kong

    Abstract: EEG-based brainprint recognition with deep learning models has garnered much attention in biometric identification. Yet, studies have indicated vulnerability to adversarial attacks in deep learning models with EEG inputs. In this paper, we introduce a novel adversarial attack method that jointly attacks time-domain and frequency-domain EEG signals by employing wavelet transform. Different from mos…

    Submitted 30 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: This work is accepted by ICME 2024

  14. arXiv:2402.11809  [pdf, other]

    cs.CL cs.AI cs.LG

    Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding

    Authors: Hanling Yi, Feng Lin, Hongbin Li, Peiyang Ning, Xiaotian Yu, Rong Xiao

    Abstract: This research aims to accelerate the inference speed of large language models (LLMs) with billions of parameters. We propose Smart Parallel Auto-Correct dEcoding (SPACE), an innovative approach designed for achieving lossless acceleration of LLMs. By integrating semi-autoregressive inference and speculative decoding capabilities, SPACE uniquely enables…

    Submitted 19 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 Findings

  15. arXiv:2402.08088  [pdf, other]

    cs.AI cs.LG eess.IV

    Out-of-Distribution Detection and Data Drift Monitoring using Statistical Process Control

    Authors: Ghada Zamzmi, Kesavan Venkatesh, Brandon Nelson, Smriti Prathapan, Paul H. Yi, Berkman Sahiner, Jana G. Delfino

    Abstract: Background: Machine learning (ML) methods often fail with data that deviates from their training distribution. This is a significant concern for ML-enabled devices in clinical settings, where data drift may cause unexpected performance that jeopardizes patient safety. Method: We propose a ML-enabled Statistical Process Control (SPC) framework for out-of-distribution (OOD) detection and drift mon…

    Submitted 12 February, 2024; originally announced February 2024.

  16. arXiv:2402.05713  [pdf, other]

    cs.LG cs.AI cs.CV

    Hidden in Plain Sight: Undetectable Adversarial Bias Attacks on Vulnerable Patient Populations

    Authors: Pranav Kulkarni, Andrew Chan, Nithya Navarathna, Skylar Chan, Paul H. Yi, Vishwa S. Parekh

    Abstract: The proliferation of artificial intelligence (AI) in radiology has shed light on the risk of deep learning (DL) models exacerbating clinical biases towards vulnerable patient populations. While prior literature has focused on quantifying biases exhibited by trained DL models, demographically targeted adversarial bias attacks on DL models and its implication in the clinical environment remains an u…

    Submitted 7 April, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 29 pages, 4 figures

  17. arXiv:2401.12522  [pdf, other]

    cs.CL cs.AI cs.LG

    BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models

    Authors: Feng Lin, Hanling Yi, Hongbin Li, Yifan Yang, Xiaotian Yu, Guangming Lu, Rong Xiao

    Abstract: Large language models (LLMs) commonly employ autoregressive generation during inference, leading to high memory bandwidth demand and consequently extended latency. To mitigate this inefficiency, we present Bi-directional Tuning for lossless Acceleration (BiTA), an innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification. Inspired by the concept of pro…

    Submitted 25 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: An appendix has been included. Source code at https://github.com/linfeng93/BiTA

  18. arXiv:2312.17449  [pdf, other]

    cs.DB

    DB-GPT: Empowering Database Interactions with Private Large Language Models

    Authors: Siqiao Xue, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Danrui Qi, Hong Yi, Shaodong Liu, Faqiang Chen

    Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. Database technologies particularly have an important entanglement with LLMs as efficient and intuitive database interactions are paramount. In this paper, we present DB-GPT, a revolutionary and production-ready project that integrates LLMs with traditional database systems to enhance user…

    Submitted 3 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  19. arXiv:2312.07981  [pdf]

    cs.LG cs.SD eess.SP

    Time Series Diffusion Method: A Denoising Diffusion Probabilistic Model for Vibration Signal Generation

    Authors: Haiming Yi, Lei Hou, Yuhong Jin, Nasser A. Saeed, Ali Kandil, Hao Duan

    Abstract: Diffusion models have demonstrated powerful data generation capabilities in various research fields such as image generation. However, in the field of vibration signal generation, the criteria for evaluating the quality of the generated signal are different from that of image generation and there is a fundamental difference between them. At present, there is no research on the ability of diffusion…

    Submitted 30 June, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Journal ref: Mechanical Systems and Signal Processing, 2024, 216: 111481

  20. arXiv:2311.17074  [pdf, other]

    cs.CV

    Self-Supervised Learning of Whole and Component-Based Semantic Representations for Person Re-Identification

    Authors: Siyuan Huang, Yifan Zhou, Ram Prabhakar, Xijun Liu, Yuxiang Guo, Hongrui Yi, Cheng Peng, Rama Chellappa, Chun Pong Lau

    Abstract: Person Re-Identification (ReID) is a challenging problem, focusing on identifying individuals across diverse settings. However, previous ReID methods primarily concentrated on a single domain or modality, such as Clothes-Changing ReID (CC-ReID) and video ReID. Real-world ReID is not constrained by factors like clothes or input types. Recent approaches emphasize on learning semantics through pre-tr…

    Submitted 14 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  21. arXiv:2309.15273  [pdf, other]

    cs.CV

    DECO: Dense Estimation of 3D Human-Scene Contact In The Wild

    Authors: Shashank Tripathi, Agniv Chatterjee, Jean-Claude Passy, Hongwei Yi, Dimitrios Tzionas, Michael J. Black

    Abstract: Understanding how humans use physical contact to interact with the world is key to enabling human-centric artificial intelligence. While inferring 3D contact is crucial for modeling realistic and physically-plausible human-object interactions, existing methods either focus on 2D, consider body joints rather than the surface, use coarse 3D body regions, or do not generalize to in-the-wild images. I…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted as Oral in ICCV'23. Project page: https://deco.is.tue.mpg.de

  22. arXiv:2309.14600  [pdf, other]

    cs.CV

    Progressive Text-to-3D Generation for Automatic 3D Prototyping

    Authors: Han Yi, Zhedong Zheng, Xiangyu Xu, Tat-seng Chua

    Abstract: Text-to-3D generation is to craft a 3D object according to a natural language description. This can significantly reduce the workload for manually designing 3D models and provide a more natural way of interaction for users. However, this problem remains challenging in recovering the fine-grained details effectively and optimizing a large-size 3D output efficiently. Inspired by the success of progr…

    Submitted 25 September, 2023; originally announced September 2023.

  23. arXiv:2308.12965  [pdf, other]

    cs.CV

    POCO: 3D Pose and Shape Estimation with Confidence

    Authors: Sai Kumar Dwivedi, Cordelia Schmid, Hongwei Yi, Michael J. Black, Dimitrios Tzionas

    Abstract: The regression of 3D Human Pose and Shape (HPS) from an image is becoming increasingly accurate. This makes the results useful for downstream tasks like human action recognition or 3D graphics. Yet, no regressor is perfect, and accuracy can be affected by ambiguous image evidence or by poses and appearance that are unseen during training. Most current HPS regressors, however, do not report the con…

    Submitted 24 August, 2023; originally announced August 2023.

  24. arXiv:2308.10899  [pdf, other]

    cs.AI

    TADA! Text to Animatable Digital Avatars

    Authors: Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, Michael J. Black

    Abstract: We introduce TADA, a simple-yet-effective approach that takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures, that can be animated and rendered with traditional graphics pipelines. Existing text-based character generation methods are limited in terms of geometry and texture quality, and cannot be realistically animated due to inconsistent a…

    Submitted 21 August, 2023; originally announced August 2023.

  25. arXiv:2308.08545  [pdf, other]

    cs.CV cs.AI cs.GR

    TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

    Authors: Yangyi Huang, Hongwei Yi, Yuliang Xiu, Tingting Liao, Jiaxiang Tang, Deng Cai, Justus Thies

    Abstract: Despite recent research advancements in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level details remains an unsolved challenge that lacks attention. Existing methods often generate overly smooth back-side surfaces with a blurry texture. But how to effectively capture all visual attributes of an individual from a single image, which are su…

    Submitted 19 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Project: https://huangyangyi.github.io/TeCH, Code: https://github.com/huangyangyi/TeCH

  26. arXiv:2307.01200  [pdf, other]

    cs.CV

    ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning

    Authors: Yuxiang Zhang, Hongwen Zhang, Liangxiao Hu, Jiajun Zhang, Hongwei Yi, Shengping Zhang, Yebin Liu

    Abstract: Learning-based approaches to monocular motion capture have recently shown promising results by learning to regress in a data-driven manner. However, due to the challenges in data collection and network designs, it remains challenging for existing solutions to achieve real-time full-body capture while being accurate in world space. In this work, we introduce ProxyCap, a human-centric proxy-to-motio…

    Submitted 25 December, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: Our project page is https://zhangyux15.github.io/ProxyCapV2

  27. arXiv:2307.00438  [pdf, other]

    cs.CV cs.IR cs.LG

    One Copy Is All You Need: Resource-Efficient Streaming of Medical Imaging Data at Scale

    Authors: Pranav Kulkarni, Adway Kanhere, Eliot Siegel, Paul H. Yi, Vishwa S. Parekh

    Abstract: Large-scale medical imaging datasets have accelerated development of artificial intelligence tools for clinical decision support. However, the large size of these datasets is a bottleneck for users with limited storage and bandwidth. Many users may not even require such large datasets as AI models are often trained on lower resolution images. If users could directly download at their desired resol…

    Submitted 1 July, 2023; originally announced July 2023.

    Comments: 13 pages, 4 figures, 2 tables

  28. arXiv:2306.16736  [pdf, other]

    cs.CV cs.AI

    GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction

    Authors: Sihan Ma, Qiong Cao, Hongwei Yi, Jing Zhang, Dacheng Tao

    Abstract: Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane. Prior methods have modeled human-ground interactions either implicitly or in a sparse manner, often resulting in unrealistic and incorrect motions when faced with noise and uncertainty. In contrast,…

    Submitted 16 August, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: Accepted to ACM Multimedia 2023. The code will be available at https://github.com/xymsh/GraMMaR

  29. arXiv:2305.18993  [pdf, other]

    cs.CV

    ConES: Concept Embedding Search for Parameter Efficient Tuning Large Vision Language Models

    Authors: Huahui Yi, Ziyuan Qin, Wei Xu, Miaotian Guo, Kun Wang, Shaoting Zhang, Kang Li, Qicheng Lao

    Abstract: Large pre-trained vision-language models have shown great prominence in transferring pre-acquired knowledge to various domains and downstream tasks with appropriate prompting or tuning. Existing prevalent tuning methods can be generally categorized into three genres: 1) prompt engineering by creating suitable prompt texts, which is time-consuming and requires domain expertise; 2) or simply fine-tu…

    Submitted 30 May, 2023; originally announced May 2023.

  30. arXiv:2305.15617  [pdf, other]

    eess.IV cs.CV cs.LG

    ISLE: An Intelligent Streaming Framework for High-Throughput AI Inference in Medical Imaging

    Authors: Pranav Kulkarni, Sean Garin, Adway Kanhere, Eliot Siegel, Paul H. Yi, Vishwa S. Parekh

    Abstract: As the adoption of Artificial Intelligence (AI) systems within the clinical environment grows, limitations in bandwidth and compute can create communication bottlenecks when streaming imaging data, leading to delays in patient care and increased cost. As such, healthcare providers and AI vendors will require greater computational infrastructure, therefore dramatically increasing costs. To that end…

    Submitted 25 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 5 pages, 3 figures, 3 tables

  31. arXiv:2305.07637  [pdf, other]

    cs.LG cs.CL cs.HC cs.IR

    Text2Cohort: Facilitating Intuitive Access to Biomedical Data with Natural Language Cohort Discovery

    Authors: Pranav Kulkarni, Adway Kanhere, Paul H. Yi, Vishwa S. Parekh

    Abstract: The Imaging Data Commons (IDC) is a cloud-based database that provides researchers with open access to cancer imaging data, with the goal of facilitating collaboration. However, cohort discovery within the IDC database has a significant technical learning curve. Recently, large language models (LLM) have demonstrated exceptional utility for natural language processing tasks. We developed Text2Coho…

    Submitted 25 November, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: 5 pages, 3 figures, 2 tables

  32. arXiv:2304.03903  [pdf, other]

    cs.CV cs.AI

    High-Fidelity Clothed Avatar Reconstruction from a Single Image

    Authors: Tingting Liao, Xiaomei Zhang, Yuliang Xiu, Hongwei Yi, Xudong Liu, Guo-Jun Qi, Yong Zhang, Xuan Wang, Xiangyu Zhu, Zhen Lei

    Abstract: This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the advantages of the high accuracy of optimization-based methods and the efficiency of learning-based methods, we propose a coarse-to-fine way to realize a high-fidelity clothed avatar reconstruction (CAR) from a single image. At the first stage, we use an implicit model to learn the general shape in the…

    Submitted 8 April, 2023; originally announced April 2023.

  33. arXiv:2303.09095  [pdf, other]

    cs.CV

    SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

    Authors: Yudi Dai, Yitai Lin, Xiping Lin, Chenglu Wen, Lan Xu, Hongwei Yi, Siqi Shen, Yuexin Ma, Cheng Wang

    Abstract: We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation (GHPE) with human-scene interaction in the wild. Employing a head-mounted device integrated with a LiDAR and camera, we record 12 human subjects' activities over 10 diverse urban scenes from an egocentric view. Frame-wise annotations for 2D key points, 3…

    Submitted 18 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: 11 pages, 7 figures, CVPR2023

  34. arXiv:2303.06580  [pdf, other]

    cs.CV cs.CL cs.LG

    Towards General Purpose Medical AI: Continual Learning Medical Foundation Model

    Authors: Huahui Yi, Ziyuan Qin, Qicheng Lao, Wei Xu, Zekun Jiang, Dequan Wang, Shaoting Zhang, Kang Li

    Abstract: Inevitable domain and task discrepancies in real-world scenarios can impair the generalization performance of the pre-trained deep models for medical data. Therefore, we audaciously propose that we should build a general-purpose medical AI system that can be seamlessly adapted to downstream domains/tasks. Since the domain/task adaption procedures usually involve additional labeling work for the ta…

    Submitted 12 March, 2023; originally announced March 2023.

  35. arXiv:2303.06180  [pdf, other]

    cs.LG cs.AI cs.CV

    Optimizing Federated Learning for Medical Image Classification on Distributed Non-iid Datasets with Partial Labels

    Authors: Pranav Kulkarni, Adway Kanhere, Paul H. Yi, Vishwa S. Parekh

    Abstract: Numerous large-scale chest x-ray datasets have spearheaded expert-level detection of abnormalities using deep learning. However, these datasets focus on detecting a subset of disease labels that could be present, thus making them distributed and non-iid with partial labels. Recent literature has indicated the impact of batch normalization layers on the convergence of federated learning due to doma…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: 10 pages, 1 algorithm, 4 tables

  36. arXiv:2302.14817  [pdf, ps, other]

    cs.NI

    A Cooperative Content Dissemination Framework for Fog-Based Internet of Vehicles

    Authors: Weihua Wu, Peng Wang, Yuan Zhang, Weijia Han, He Yi, Tony Q. S. Quek

    Abstract: As the fog-based internet of vehicles (IoV) is equipped with rich perception, computing, communication and storage resources, it provides a new solution for the bulk data processing. However, the impact caused by the mobility of vehicles brings a challenge to the content scheduling and resource allocation of content dissemination service. In this paper, we propose a time-varying resource relations…

    Submitted 20 February, 2023; originally announced February 2023.

  37. arXiv:2301.07361  [pdf, other]

    math.NA cs.LG

    Dirichlet-Neumann learning algorithm for solving elliptic interface problems

    Authors: Qi Sun, Xuejun Xu, Haotian Yi

    Abstract: Non-overlapping domain decomposition methods are natural for solving interface problems arising from various disciplines, however, the numerical simulation requires technical analysis and is often available only with the use of high-quality grids, thereby impeding their use in more complicated situations. To remove the burden of mesh generation and to effectively tackle with the interface jump con…

    Submitted 17 May, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    MSC Class: 65M55; 65Nxx; 92B20; 49S05

  38. arXiv:2301.07074  [pdf, other]

    cs.CV cs.AI cs.LG

    SegViz: A federated-learning based framework for multi-organ segmentation on heterogeneous data sets with partial annotations

    Authors: Adway U. Kanhere, Pranav Kulkarni, Paul H. Yi, Vishwa S. Parekh

    Abstract: Segmentation is one of the most primary tasks in deep learning for medical imaging, owing to its multiple downstream clinical applications. However, generating manual annotations for medical images is time-consuming, requires high skill, and is an expensive effort, especially for 3D images. One potential solution is to aggregate knowledge from partially annotated datasets from multiple groups to c…

    Submitted 13 March, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

  39. arXiv:2301.06683  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    Surgical Aggregation: Federated Class-Heterogeneous Learning

    Authors: Pranav Kulkarni, Adway Kanhere, Paul H. Yi, Vishwa S. Parekh

    Abstract: The release of numerous chest x-ray datasets has spearheaded the development of deep learning models with expert-level performance. However, they have limited interoperability due to class-heterogeneity -- a result of inconsistent labeling schemes and partial annotations. Therefore, it is challenging to leverage these datasets in aggregate to train models with a complete representation of abnormal…

    Submitted 5 January, 2024; v1 submitted 16 January, 2023; originally announced January 2023.

    Comments: 9 pages, 7 figures, 4 tables

  40. arXiv:2212.04420  [pdf, other]

    cs.CV cs.GR

    Generating Holistic 3D Human Motion from Speech

    Authors: Hongwei Yi, Hualin Liang, Yifei Liu, Qiong Cao, Yandong Wen, Timo Bolkart, Dacheng Tao, Michael J. Black

    Abstract: This work addresses the problem of generating 3D holistic body motions from human speech. Given a speech recording, we synthesize sequences of 3D body poses, hand gestures, and facial expressions that are realistic and diverse. To achieve this, we first build a high-quality dataset of 3D holistic body meshes with synchronous speech. We then define a novel speech-to-motion generation framework in w…

    Submitted 17 June, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: Project Webpage: https://talkshow.is.tue.mpg.de; CVPR2023

  41. arXiv:2212.04360  [pdf, other]

    cs.CV cs.GR

    MIME: Human-Aware 3D Scene Generation

    Authors: Hongwei Yi, Chun-Hao P. Huang, Shashank Tripathi, Lea Hering, Justus Thies, Michael J. Black

    Abstract: Generating realistic 3D worlds occupied by moving humans has many applications in games, architecture, and synthetic data creation. But generating such scenes is expensive and labor intensive. Recent work generates human poses and motions given a 3D scene. Here, we take the opposite approach and generate 3D indoor scenes given 3D human motion. Such motions can come from archival motion capture or…

    Submitted 8 December, 2022; originally announced December 2022.

    Comments: Project Page: https://mime.is.tue.mpg.de

  42. arXiv:2212.02469  [pdf, other]

    cs.CV cs.AI cs.GR

    One-shot Implicit Animatable Avatars with Model-based Priors

    Authors: Yangyi Huang, Hongwei Yi, Weiyang Liu, Haofan Wang, Boxi Wu, Wenxiao Wang, Binbin Lin, Debing Zhang, Deng Cai

    Abstract: Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a learned prior from large-scale specific 3D human datasets such that reconstruction can be performed with sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable the dat…

    Submitted 27 September, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

    Comments: To appear at ICCV 2023. Project website: https://huangyangyi.github.io/ELICIT/

  43. arXiv:2211.15924  [pdf, other]

    cs.CV

    Weakly Supervised Learning Significantly Reduces the Number of Labels Required for Intracranial Hemorrhage Detection on Head CT

    Authors: Jacopo Teneggi, Paul H. Yi, Jeremias Sulam

    Abstract: Modern machine learning pipelines, in particular those based on deep learning (DL) models, require large amounts of labeled data. For classification problems, the most common learning paradigm consists of presenting labeled examples during training, thus providing strong supervision on what constitutes positive and negative samples. This constitutes a major obstacle for the development of DL model…

    Submitted 28 November, 2022; originally announced November 2022.

  44. arXiv:2211.12942  [pdf]

    cs.IT eess.SP

    Double criterion-based estimator for signal number estimation for the colored noise with unknown covariance matrix

    Authors: Huiyue Yi, Wuxiong Zhang, Hui Xu

    Abstract: The subspace-based techniques are widely utilized to estimate the parameters of sums of complex sinusoids corrupted by noise, and the zoom ESPRIT algorithm utilizes the zoom technique to apply the ESPRIT to a narrow frequency band to improve the accuracy of frequency estimation. However, the Gaussian noise becomes non-Gaussian in the zoomed baseband after being filtered by a low-pass filter, and t…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: 23 pages, 5 figures

  45. arXiv:2211.06212  [pdf, other]

    eess.IV cs.CV cs.LG

    From Competition to Collaboration: Making Toy Datasets on Kaggle Clinically Useful for Chest X-Ray Diagnosis Using Federated Learning

    Authors: Pranav Kulkarni, Adway Kanhere, Paul H. Yi, Vishwa S. Parekh

    Abstract: Chest X-ray (CXR) datasets hosted on Kaggle, though useful from a data science competition standpoint, have limited utility in clinical use because of their narrow focus on diagnosing one specific disease. In real-world clinical use, multiple diseases need to be considered since they can co-exist in the same patient. In this work, we demonstrate how federated learning (FL) can be used to make thes…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: Accepted paper for Medical Imaging meet NeurIPS (MedNeurIPS) Workshop 2022

  46. arXiv:2211.04734  [pdf, other]

    cs.LG cs.CR q-bio.QM

    Framework Construction of an Adversarial Federated Transfer Learning Classifier

    Authors: Hang Yi, Tongxuan Bie, Tongjiang Yan

    Abstract: As the Internet grows in popularity, more and more classification jobs, such as IoT, finance industry and healthcare field, rely on mobile edge computing to advance machine learning. In the medical industry, however, good diagnostic accuracy necessitates the combination of large amounts of labeled data to train the model, which is difficult and expensive to collect and risks jeopardizing patients'…

    Submitted 9 November, 2022; originally announced November 2022.

  47. arXiv:2209.15517  [pdf, other]

    cs.CV

    Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study

    Authors: Ziyuan Qin, Huahui Yi, Qicheng Lao, Kang Li

    Abstract: The large-scale pre-trained vision language models (VLM) have shown remarkable domain transfer capability on natural images. However, it remains unknown whether this capability can also apply to the medical image domain. This paper thoroughly studies the knowledge transferability of pre-trained VLMs to the medical domain, where we show that well-designed medical prompts are the key to elicit knowl…

    Submitted 7 February, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Accepted to ICLR2023

  48. NeRF-Loc: Transformer-Based Object Localization Within Neural Radiance Fields

    Authors: Jiankai Sun, Yan Xu, Mingyu Ding, Hongwei Yi, Chen Wang, Jingdong Wang, Liangjun Zhang, Mac Schwager

    Abstract: Neural Radiance Fields (NeRFs) have become a widely-applied scene representation technique in recent years, showing advantages for robot navigation and manipulation tasks. To further advance the utility of NeRFs for robotics, we propose a transformer-based framework, NeRF-Loc, to extract 3D bounding boxes of objects in NeRF scenes. NeRF-Loc takes a pre-trained NeRF model and camera view as input a…

    Submitted 15 July, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

    Journal ref: IEEE Robotics and Automation Letters (Volume: 8, Issue: 8, August 2023)

  49. arXiv:2208.11544  [pdf]

    cs.IT eess.SP

    Iterative Adaptively Regularized LASSO-ADMM Algorithm for CFAR Estimation of Sparse Signals: IAR-LASSO-ADMM-CFAR Algorithm

    Authors: Huiyue Yi, Yan Xu, Wuxiong Zhang, Hui Xu

    Abstract: The least-absolute shrinkage and selection operator (LASSO) is a regularization technique for estimating sparse signals of interest emerging in various applications and can be efficiently solved via the alternating direction method of multipliers (ADMM), which will be termed as LASSO-ADMM algorithm. The choice of the regularization parameter has significant impact on the performance of LASSO-ADMM…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: 8 pages, 2 figures

  50. arXiv:2206.09553  [pdf, other]

    cs.CV

    Capturing and Inferring Dense Full-Body Human-Scene Contact

    Authors: Chun-Hao P. Huang, Hongwei Yi, Markus Höschle, Matvey Safroshkin, Tsvetelina Alexiadis, Senya Polikovsky, Daniel Scharstein, Michael J. Black

    Abstract: Inferring human-scene contact (HSC) is the first step toward understanding how humans interact with their surroundings. While detecting 2D human-object interaction (HOI) and reconstructing 3D human pose and shape (HPS) have enjoyed significant progress, reasoning about 3D human-scene contact from a single image is still challenging. Existing HSC detection methods consider only a few types of prede…

    Submitted 19 June, 2022; originally announced June 2022.

    Comments: CVPR 2022