Search | arXiv e-print repository

Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models

Authors: Xiaolin Hong, Hongwei Yi, Fazhi He, Qiong Cao

Abstract: Generating 3D scenes from human motion sequences supports numerous applications, including virtual reality and architectural design. However, previous auto-regression-based human-aware 3D scene generation methods have struggled to accurately capture the joint distribution of multiple objects and input humans, often resulting in overlapping object generation in the same space. To address this limitation, we explore the potential of diffusion models that simultaneously consider all input humans and the floor plan to generate plausible 3D scenes. Our approach not only satisfies all input human interactions but also adheres to spatial constraints with the floor plan. Furthermore, we introduce two spatial collision guidance mechanisms: human-object collision avoidance and object-room boundary constraints. These mechanisms help avoid generating scenes that conflict with human motions while respecting layout constraints. To enhance the diversity and accuracy of human-guided scene generation, we have developed an automated pipeline that improves the variety and plausibility of human-object interactions in the existing 3D FRONT HUMAN dataset. Extensive experiments on both synthetic and real-world datasets demonstrate that our framework can generate more natural and plausible 3D scenes with precise human-scene interactions, while significantly reducing human-object collisions compared to previous state-of-the-art methods. Our code and data will be made publicly available upon publication of this work.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.08638 [pdf, other]

Conditional Similarity Triplets Enable Covariate-Informed Representations of Single-Cell Data

Authors: Chi-Jane Chen, Haidong Yi, Natalie Stanley

Abstract: Single-cell technologies enable comprehensive profiling of diverse immune cell-types through the measurement of multiple genes or proteins per cell. In order to translate data from immune profiling assays into powerful diagnostics, machine learning approaches are used to compute per-sample immunological summaries, or featurizations that can be used as inputs to models for outcomes of interest. Current supervised learning approaches for computing per-sample representations are optimized based only on the outcome variable to be predicted and do not take into account clinically-relevant covariates that are likely to also be measured. Here we expand the optimization problem to also take into account such additional patient covariates to directly inform the learned per-sample representations. To do this, we introduce CytoCoSet, a set-based encoding method, which formulates a loss function with an additional triplet term penalizing samples with similar covariates from having disparate embedding results in per-sample representations. Overall, incorporating clinical covariates leads to improved prediction of clinical phenotypes.

Submitted 12 June, 2024; originally announced June 2024.
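
The triplet term described in the CytoCoSet abstract can be illustrated with a minimal sketch. It assumes a standard hinge-style triplet penalty on Euclidean distances, where the anchor and positive are samples with similar covariates and the negative has dissimilar covariates. The function names, `margin`, and `weight` are hypothetical and are not taken from the paper.

```python
import numpy as np

def triplet_term(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet penalty on per-sample embeddings.

    The penalty is zero once the negative is at least `margin`
    farther from the anchor than the positive is.
    """
    d_pos = np.linalg.norm(np.asarray(anchor) - np.asarray(positive))
    d_neg = np.linalg.norm(np.asarray(anchor) - np.asarray(negative))
    return float(max(0.0, d_pos - d_neg + margin))

def combined_loss(prediction_loss, triplets, weight=0.5, margin=1.0):
    """Outcome-prediction loss plus a weighted mean triplet penalty.

    `weight` and `margin` are assumed hyperparameters for illustration.
    """
    if not triplets:
        return float(prediction_loss)
    penalty = np.mean([triplet_term(a, p, n, margin) for a, p, n in triplets])
    return float(prediction_loss + weight * penalty)

# Anchor and positive share similar covariates; negative does not.
well_separated = ([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
badly_separated = ([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In this sketch, a triplet whose covariate-similar pair is already close in embedding space contributes nothing, while a triplet that places the similar pair far apart adds to the loss.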

arXiv:2406.07078 [pdf, other]

Unified Modeling Enhanced Multimodal Learning for Precision Neuro-Oncology

Authors: Huahui Yi, Xiaofei Wang, Kang Li, Chao Li

Abstract: Multimodal learning, integrating histology images and genomics, promises to enhance precision oncology with comprehensive views at microscopic and molecular levels. However, existing methods may not sufficiently model the shared or complementary information for more effective integration. In this study, we introduce a Unified Modeling Enhanced Multimodal Learning (UMEML) framework that employs a hierarchical attention structure to effectively leverage shared and complementary features of both modalities of histology and genomics. Specifically, to mitigate unimodal bias from modality imbalance, we utilize a query-based cross-attention mechanism for prototype clustering in the pathology encoder. Our prototype assignment and modularity strategy are designed to align shared features and minimize modality gaps. An additional registration mechanism with learnable tokens is introduced to enhance cross-modal feature integration and robustness in multimodal unified modeling. Our experiments demonstrate that our method surpasses previous state-of-the-art approaches in glioma diagnosis and prognosis tasks, underscoring its superiority in precision neuro-oncology.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06112 [pdf]

Resilient Growth of Highly Crystalline Topological Insulator-Superconductor Heterostructure Enabled by Ex-situ Nitride Film

Authors: Renjie Xie, Min Ge, Shaozhu Xiao, Jiahui Zhang, Jiachang Bi, Xiaoyu Yuan, Hee Taek Yi, Baomin Wang, Seongshik Oh, Yanwei Cao, Xiong Yao

Abstract: Highly crystalline and easily feasible topological insulator-superconductor (TI-SC) heterostructures are crucial for the development of practical topological qubit devices. The optimal superconducting layer for TI-SC heterostructures should be highly resilient against external contaminations and structurally compatible with TIs. In this study, we provide a solution to this challenge by showcasing the growth of a highly crystalline TI-SC heterostructure using refractory TiN (111) as the superconducting layer. This approach can eliminate the need for in-situ cleaving or growth. More importantly, the TiN surface shows high resilience against contaminations during air exposure, as demonstrated by the successful recyclable growth of Bi2Se3. Our findings indicate that TI-SC heterostructures based on nitride films are compatible with device fabrication techniques, paving a path to the realization of practical topological qubit devices in the future.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 22 pages, 4 figures, accepted by ACS Applied Materials & Interfaces

arXiv:2406.05431 [pdf]

MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature

Authors: Gyeong Hoon Yi, Jiwoo Choi, Hyeongyun Song, Olivia Miano, Jaewoong Choi, Kihoon Bang, Byungju Lee, Seok Su Sohn, David Buttler, Anna Hiszpanski, Sang Soo Han, Donghun Kim

Abstract: Efficiently extracting data from tables in the scientific literature is pivotal for building large-scale databases. However, the tables reported in materials science papers exist in highly diverse forms; thus, rule-based extractions are an ineffective approach. To overcome this challenge, we present MaTableGPT, which is a GPT-based table data extractor from the materials science literature. MaTableGPT features key strategies of table data representation and table splitting for better GPT comprehension and filtering hallucinated information through follow-up questions. When applied to a vast volume of water splitting catalysis literature, MaTableGPT achieved an extraction accuracy (total F1 score) of up to 96.8%. Through comprehensive evaluations of the GPT usage cost, labeling cost, and extraction accuracy for the learning methods of zero-shot, few-shot and fine-tuning, we present a Pareto-front mapping where the few-shot learning method was found to be the most balanced solution owing to both its high extraction accuracy (total F1 score > 95%) and low cost (GPT usage cost of 5.97 US dollars and labeling cost of 10 I/O paired examples). The statistical analyses conducted on the database generated by MaTableGPT revealed valuable insights into the distribution of the overpotential and elemental utilization across the reported catalysts in the water splitting literature.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2405.18897 [pdf, other]

MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning

Authors: Junjie Wang, Guangjing Yang, Wentao Chen, Huahui Yi, Xiaohu Wu, Qicheng Lao

Abstract: In response to the challenges posed by the extensive parameter updates required for full fine-tuning of large-scale pre-trained models, parameter-efficient fine-tuning (PEFT) methods, exemplified by Low-Rank Adaptation (LoRA), have emerged. LoRA simplifies the fine-tuning process but may still struggle with a certain level of redundancy in low-rank matrices and limited effectiveness from merely increasing their rank. To address these issues, a natural idea is to enhance the independence and diversity of the learning process for the low-rank matrices. Therefore, we propose Masked LoRA Experts (MLAE), an innovative approach that applies the concept of masking to PEFT. Our method incorporates a cellular decomposition strategy that transforms a low-rank matrix into independent rank-1 submatrices, or "experts", thus enhancing independence. Additionally, we introduce a binary mask matrix that selectively activates these experts during training to promote more diverse and anisotropic learning, based on expert-level dropout strategies. Our investigations reveal that this selective activation not only enhances performance but also fosters a more diverse acquisition of knowledge with a marked decrease in parameter similarity among MLAE, significantly boosting the quality of the model while barely increasing the parameter count. Remarkably, MLAE achieves new SOTA performance with an average accuracy score of 78.8% on the VTAB-1k benchmark and 90.9% on the FGVC benchmark, demonstrating superior performance. Our code is available at https://github.com/jie040109/MLAE.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Tech report
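
The rank-1 decomposition and masking described in the MLAE abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it writes the low-rank update as a product `B @ A`, treats each outer product of a column of `B` with the matching row of `A` as one expert, and uses one mask entry per expert drawn with a hypothetical dropout probability `p`. All function names are invented for the example.

```python
import numpy as np

def rank1_experts(B, A):
    """Split the low-rank product B @ A into r rank-1 matrices."""
    r = B.shape[1]
    return [np.outer(B[:, i], A[i, :]) for i in range(r)]

def masked_update(B, A, mask):
    """Sum only the experts whose mask entry is nonzero."""
    update = np.zeros((B.shape[0], A.shape[1]))
    for keep, expert in zip(mask, rank1_experts(B, A)):
        if keep:
            update += expert
    return update

def sample_mask(r, p, rng):
    """Assumed expert-level dropout: drop each expert with probability p."""
    return (rng.random(r) >= p).astype(int)

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 4))   # d x r factor
A = rng.normal(size=(4, 5))   # r x k factor
mask = sample_mask(4, p=0.5, rng=rng)
delta_w = masked_update(B, A, mask)  # masked weight update, shape (6, 5)
```

With every mask entry set to 1, the sum of experts reproduces the full product `B @ A`, so in this sketch masking only changes which rank-1 components contribute during a given training step.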

arXiv:2405.17141 [pdf, other]

MVMS-RCN: A Dual-Domain Unfolding CT Reconstruction with Multi-sparse-view and Multi-scale Refinement-correction

Authors: Xiaohong Fan, Ke Chen, Huaming Yi, Yin Yang, Jianping Zhang

Abstract: X-ray Computed Tomography (CT) is one of the most important diagnostic imaging techniques in clinical applications. Sparse-view CT imaging reduces the number of projection views to lower the radiation dose and alleviate the potential risk of radiation exposure. Most existing deep learning (DL) and deep unfolding sparse-view CT reconstruction methods: 1) do not fully use the projection data; 2) do not always link their architecture designs to a mathematical theory; 3) do not flexibly deal with multi-sparse-view reconstruction assignments. This paper aims to use mathematical ideas and design optimal DL imaging algorithms for sparse-view tomography reconstructions. We propose a novel dual-domain deep unfolding unified framework that offers a great deal of flexibility for multi-sparse-view CT reconstruction with different sampling views through a single model. This framework combines the theoretical advantages of model-based methods with the superior reconstruction performance of DL-based methods, resulting in the expected generalizability of DL. We propose a refinement module that utilizes unfolding projection domain to refine full-sparse-view projection errors, as well as an image domain correction module that distills multi-scale geometric error corrections to reconstruct sparse-view CT. This provides us with a new way to explore the potential of projection information and a new perspective on designing network architectures. All parameters of our proposed framework are learnable end to end, and our method possesses the potential to be applied to plug-and-play reconstruction. Extensive experiments demonstrate that our framework is superior to other existing state-of-the-art methods. Our source codes are available at https://github.com/fanxiaohong/MVMS-RCN.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 12 pages, submitted

arXiv:2405.16109 [pdf, other]

Gain-loss-engineering: a new platform for extreme anisotropic thermal photon tunneling

Authors: Cheng-Long Zhou, Yu-Chen Peng, Yong Zhang, Hong-Liang Yi, Mauro Antezza, Vincenzo Galdi

Abstract: We explore a novel approach to achieving anisotropic thermal photon tunneling, inspired by the concept of parity-time symmetry in quantum physics. Our method leverages the modulation of constitutive optical parameters, oscillating between loss and gain regimes. This modulation reveals a variety of distinct effects in thermal photon behavior and dispersion. Specifically, we identify complex tunneling modes through gain-loss engineering, which include thermal photonic defect states and Fermi-arc-like phenomena, which surpass those achievable through traditional polariton engineering. Our research also elucidates the laws governing the evolution of radiative energy in the presence of gain and loss interactions, and highlights the unexpected inefficacy of gain in enhancing thermal photon energy transport compared to systems characterized solely by loss. This study not only broadens our understanding of thermal photon tunneling but also establishes a versatile platform for manipulating photon energy transport, with potential applications in thermal management, heat science, and the development of advanced energy devices.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 8 pages, 5 figures

arXiv:2405.00156 [pdf, other]

Expanding the Horizon: Enabling Hybrid Quantum Transfer Learning for Long-Tailed Chest X-Ray Classification

Authors: Skylar Chan, Pranav Kulkarni, Paul H. Yi, Vishwa S. Parekh

Abstract: Quantum machine learning (QML) has the potential for improving the multi-label classification of rare, albeit critical, diseases in large-scale chest x-ray (CXR) datasets due to theoretical quantum advantages over classical machine learning (CML) in sample efficiency and generalizability. While prior literature has explored QML with CXRs, it has focused on binary classification tasks with small datasets due to limited access to quantum hardware and computationally expensive simulations. To that end, we implemented a Jax-based framework that enables the simulation of medium-sized qubit architectures with significant improvements in wall-clock time over current software offerings. We evaluated the performance of our Jax-based framework in terms of efficiency and performance for hybrid quantum transfer learning for long-tailed classification across 8, 14, and 19 disease labels using large-scale CXR datasets. The Jax-based framework resulted in up to a 58% and 95% speed-up compared to PyTorch and TensorFlow implementations, respectively. However, compared to CML, QML demonstrated slower convergence and an average AUROC of 0.70, 0.73, and 0.74 for the classification of 8, 14, and 19 CXR disease labels. In comparison, the CML models had an average AUROC of 0.77, 0.78, and 0.80 respectively. In conclusion, our work presents an accessible implementation of hybrid quantum transfer learning for long-tailed CXR classification with a computationally efficient Jax-based framework.

Submitted 30 April, 2024; originally announced May 2024.

Comments: 11 pages, 13 figures, 3 tables

arXiv:2404.10685 [pdf, other]

Generating Human Interaction Motions in Scenes with Text Control

Authors: Hongwei Yi, Justus Thies, Michael J. Black, Xue Bin Peng, Davis Rempe

Abstract: We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models. Previous text-to-motion methods focus on characters in isolation without considering scenes due to the limited availability of datasets that include motion, text descriptions, and interactive scenes. Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model, emphasizing goal-reaching constraints on large-scale motion-capture datasets. We then enhance this model with a scene-aware component, fine-tuned using data augmented with detailed scene information, including ground plane and object shapes. To facilitate training, we embed annotated navigation and interaction motions within scenes. The proposed method produces realistic and diverse human-object interactions, such as navigation and sitting, in different scenes with various object shapes, orientations, initial body positions, and poses. Extensive experiments demonstrate that our approach surpasses prior techniques in terms of the plausibility of human-scene interactions, as well as the realism and variety of the generated motions. Code will be released upon publication of this work at https://research.nvidia.com/labs/toronto-ai/tesmo.

Submitted 16 April, 2024; originally announced April 2024.

Comments: Project Page: https://research.nvidia.com/labs/toronto-ai/tesmo/

arXiv:2404.10209 [pdf, other]

Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models

Authors: Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen

Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. The technologies of interacting with data particularly have an important entanglement with LLMs as efficient and intuitive data interactions are paramount. In this paper, we present DB-GPT, a revolutionary and product-ready Python library that integrates LLMs into traditional data interaction tasks to enhance user experience and accessibility. DB-GPT is designed to understand data interaction tasks described by natural language and provide context-aware responses powered by LLMs, making it an indispensable tool for users ranging from novice to expert. Its system design supports deployment across local, distributed, and cloud environments. Beyond handling basic data interaction tasks like Text-to-SQL with LLMs, it can handle complex tasks like generative data analysis through a Multi-Agents framework and the Agentic Workflow Expression Language (AWEL). The Service-oriented Multi-model Management Framework (SMMF) ensures data privacy and security, enabling users to employ DB-GPT with private LLMs. Additionally, DB-GPT offers a series of product-ready features designed to enable users to integrate DB-GPT within their product environments easily. The code of DB-GPT is available on GitHub (https://github.com/eosphoros-ai/DB-GPT), which already has over 10.7k stars. Please install DB-GPT for your own usage with the instructions (https://github.com/eosphoros-ai/DB-GPT#install) and watch a 5-minute introduction video on YouTube (https://youtu.be/n_8RI1ENyl4) to further investigate DB-GPT.

Submitted 24 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.07374 [pdf, other]

Improving Multi-Center Generalizability of GAN-Based Fat Suppression using Federated Learning

Authors: Pranav Kulkarni, Adway Kanhere, Harshita Kukreja, Vivian Zhang, Paul H. Yi, Vishwa S. Parekh

Abstract: Generative Adversarial Network (GAN)-based synthesis of fat suppressed (FS) MRIs from non-FS proton density sequences has the potential to accelerate acquisition of knee MRIs. However, GANs trained on single-site data have poor generalizability to external data. We show that federated learning can improve multi-center generalizability of GANs for synthesizing FS MRIs, while facilitating privacy-preserving multi-institutional collaborations.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 5 pages, 2 figures

arXiv:2403.16554 [pdf, other]

PE: A Poincare Explanation Method for Fast Text Hierarchy Generation

Authors: Qian Chen, Dongyang Li, Xiaofeng He, Hongzhao Li, Hongyu Yi

Abstract: The black-box nature of deep learning models in NLP hinders their widespread application. The research focus has shifted to Hierarchical Attribution (HA) for its ability to model feature interactions. Recent works model non-contiguous combinations with a time-costly greedy search in Euclidean spaces, neglecting underlying linguistic information in feature representations. In this work, we introduce a novel method, namely Poincare Explanation (PE), for modeling feature interactions with hyperbolic spaces in a time-efficient manner. Specifically, we treat building text hierarchies as finding spanning trees in hyperbolic spaces. First, we project the embeddings into hyperbolic spaces to elicit inherent semantic and syntactic hierarchical structures. Then we propose a simple yet effective strategy to calculate the Shapley score. Finally, we build the hierarchy, proving that the construction process in the projected space can be viewed as building a minimum spanning tree, and introduce a time-efficient building algorithm. Experimental results demonstrate the effectiveness of our approach.

Submitted 12 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.15218 [pdf, other]

Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations

Authors: Pranav Kulkarni, Adway Kanhere, Dharmam Savani, Andrew Chan, Devina Chatterjee, Paul H. Yi, Vishwa S. Parekh

Abstract: Curating annotations for medical image segmentation is a labor-intensive and time-consuming task that requires domain expertise, resulting in "narrowly" focused deep learning (DL) models with limited translational utility. Recently, foundation models like the Segment Anything Model (SAM) have revolutionized semantic segmentation with exceptional zero-shot generalizability across various domains, including medical imaging, and hold a lot of promise for streamlining the annotation process. However, SAM has yet to be evaluated in a crowd-sourced setting to curate annotations for training 3D DL segmentation models. In this work, we explore the potential of SAM for crowd-sourcing "sparse" annotations from non-experts to generate "dense" segmentation masks for training 3D nnU-Net models, a state-of-the-art DL segmentation model. Our results indicate that while SAM-generated annotations exhibit high mean Dice scores compared to ground-truth annotations, nnU-Net models trained on SAM-generated annotations perform significantly worse than nnU-Net models trained on ground-truth annotations ($p<0.001$, all).

Submitted 22 March, 2024; originally announced March 2024.
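
The Dice score mentioned in the abstract above compares a generated mask with a ground-truth mask. A minimal sketch of the standard definition for binary masks, 2|A ∩ B| / (|A| + |B|), is given below. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, and nothing here reflects the paper's evaluation code.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    overlap = np.logical_and(pred, truth).sum()
    return float(2.0 * overlap / total)

# One overlapping voxel, two predicted voxels, one true voxel: 2*1 / (2+1).
example = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

A high Dice score only measures overlap between masks, which is consistent with the abstract's observation that high agreement with ground truth did not guarantee equally good downstream models.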

arXiv:2403.10021 [pdf, other]

Time-Frequency Jointed Imperceptible Adversarial Attack to Brainprint Recognition with Deep Learning Models

Authors: Hangjie Yi, Yuhang Ming, Dongjun Liu, Wanzeng Kong

Abstract: EEG-based brainprint recognition with deep learning models has garnered much attention in biometric identification. Yet, studies have indicated vulnerability to adversarial attacks in deep learning models with EEG inputs. In this paper, we introduce a novel adversarial attack method that jointly attacks time-domain and frequency-domain EEG signals by employing wavelet transform. Different from most existing methods which only target time-domain EEG signals, our method not only takes advantage of the time-domain attack's potent adversarial strength but also benefits from the imperceptibility inherent in frequency-domain attack, achieving a better balance between attack performance and imperceptibility. Extensive experiments are conducted in both white- and grey-box scenarios and the results demonstrate that our attack method achieves state-of-the-art attack performance on three datasets and three deep-learning models. Meanwhile, the perturbations in the signals attacked by our method are barely perceptible to the human visual system.

Submitted 30 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: This work is accepted by ICME 2024

arXiv:2402.11809 [pdf, other]

Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding

Authors: Hanling Yi, Feng Lin, Hongbin Li, Peiyang Ning, Xiaotian Yu, Rong Xiao

Abstract: This research aims to accelerate the inference speed of large language models (LLMs) with billions of parameters. We propose Smart Parallel Auto-Correct dEcoding (SPACE), an innovative approach designed for achieving lossless acceleration of LLMs. By integrating semi-autoregressive inference and speculative decoding capabilities, SPACE uniquely enables autoregressive LLMs to parallelize token generation and verification. This is realized through a specialized semi-autoregressive supervised fine-tuning process that equips existing LLMs with the ability to simultaneously predict multiple tokens. Additionally, an auto-correct decoding algorithm facilitates the simultaneous generation and verification of token sequences within a single model invocation. Through extensive experiments on a range of LLMs, SPACE has demonstrated inference speedup ranging from 2.7x-4.0x on HumanEval-X while maintaining output quality.

Submitted 19 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: Accepted by ACL 2024 Findings
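
The idea of verifying drafted tokens, which the SPACE abstract builds on, can be illustrated with a toy example. The sketch below shows the generic greedy verification rule from speculative decoding, not SPACE's auto-correct decoding algorithm: drafted tokens are kept while they match what the model itself would produce, and the first mismatch is replaced by the model's token. `verify_draft`, `next_token`, and `toy_model` are hypothetical names, and the loop stands in for what a real system would check in a single forward pass.

```python
def verify_draft(next_token, prefix, draft):
    """Keep drafted tokens while they match the model's greedy choice.

    next_token: callable mapping a token list to the model's next token.
    Returns the accepted tokens, ending with the model's own token at
    the first mismatch (if any).
    """
    context = list(prefix)
    accepted = []
    for token in draft:
        expected = next_token(context)
        if token != expected:
            accepted.append(expected)  # correct the first wrong token
            break
        accepted.append(token)
        context.append(token)
    return accepted

def toy_model(context):
    """Toy deterministic 'model': the next token is len(context) mod 5."""
    return len(context) % 5

partly_right = verify_draft(toy_model, [0, 1], [2, 3, 9, 0])
all_right = verify_draft(toy_model, [0, 1], [2, 3, 4])
```

Because every accepted token equals the model's own greedy choice, the output in this sketch matches plain autoregressive greedy decoding, which is the sense in which such schemes are described as lossless.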

arXiv:2402.09208 [pdf]

Coexistence of Superconductivity and Antiferromagnetism in Topological Magnet MnBi2Te4 Films

Authors: Wei Yuan, Zi-Jie Yan, Hemian Yi, Zihao Wang, Stephen Paolini, Yi-Fan Zhao, Ling-Jie Zhou, Annie G. Wang, Ke Wang, Thomas Prokscha, Zaher Salman, Andreas Suter, Purnima P. Balakrishnan, Alexander J. Grutter, Laurel E. Winter, John Singleton, Moses H. W. Chan, Cui-Zu Chang

Abstract: The interface of two materials can harbor unexpected emergent phenomena. One example is interface-induced superconductivity. In this work, we employ molecular beam epitaxy to grow a series of heterostructures formed by stacking together two non-superconducting antiferromagnetic materials, an intrinsic antiferromagnetic topological insulator MnBi2Te4 and an antiferromagnetic iron chalcogenide FeTe. Our electrical transport measurements reveal interface-induced superconductivity in these heterostructures. By performing scanning tunneling microscopy and spectroscopy measurements, we observe a proximity-induced superconducting gap on the top surface of the MnBi2Te4 layer, confirming the interaction between superconductivity and antiferromagnetism in the MnBi2Te4 layer. Our findings will advance the fundamental inquiries into the topological superconducting phase in hybrid devices and provide a promising platform for the exploration of chiral Majorana physics in MnBi2Te4-based heterostructures.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 24 pages, 4 figures, comments are welcome

arXiv:2402.08088 [pdf, other]

Out-of-Distribution Detection and Data Drift Monitoring using Statistical Process Control

Authors: Ghada Zamzmi, Kesavan Venkatesh, Brandon Nelson, Smriti Prathapan, Paul H. Yi, Berkman Sahiner, Jana G. Delfino

Abstract: Background: Machine learning (ML) methods often fail with data that deviates from their training distribution. This is a significant concern for ML-enabled devices in clinical settings, where data drift may cause unexpected performance that jeopardizes patient safety. Method: We propose a ML-enabled Statistical Process Control (SPC) framework for out-of-distribution (OOD) detection and drift monitoring. SPC is advantageous as it visually and statistically highlights deviations from the expected distribution. To demonstrate the utility of the proposed framework for monitoring data drift in radiological images, we investigated different design choices, including methods for extracting feature representations, drift quantification, and SPC parameter selection. Results: We demonstrate the effectiveness of our framework for two tasks: 1) differentiating axial vs. non-axial computed tomography (CT) images and 2) separating chest x-ray (CXR) from other modalities. For both tasks, we achieved high accuracy in detecting OOD inputs, with 0.913 in CT and 0.995 in CXR, and sensitivity of 0.980 in CT and 0.984 in CXR. Our framework was also adept at monitoring data streams and identifying the time a drift occurred. In a simulation with 100 daily CXR cases, we detected a drift in OOD input percentage from 0-1% to 3-5% within two days, maintaining a low false-positive rate. Through additional experimental results, we demonstrate the framework's data-agnostic nature and independence from the underlying model's structure.
Conclusion: We propose a framework for OOD detection and drift monitoring that is agnostic to data, modality, and model. The framework is customizable and can be adapted for specific applications.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.05713 [pdf, other]

Hidden in Plain Sight: Undetectable Adversarial Bias Attacks on Vulnerable Patient Populations

Authors: Pranav Kulkarni, Andrew Chan, Nithya Navarathna, Skylar Chan, Paul H. Yi, Vishwa S. Parekh

Abstract: The proliferation of artificial intelligence (AI) in radiology has shed light on the risk of deep learning (DL) models exacerbating clinical biases towards vulnerable patient populations. While prior literature has focused on quantifying biases exhibited by trained DL models, demographically targeted adversarial bias attacks on DL models and its implication in the clinical environment remains an underexplored field of research in medical imaging. In this work, we demonstrate that demographically targeted label poisoning attacks can introduce undetectable underdiagnosis bias in DL models. Our results across multiple performance metrics and demographic groups like sex, age, and their intersectional subgroups show that adversarial bias attacks demonstrate high-selectivity for bias in the targeted group by degrading group model performance without impacting overall model performance. Furthermore, our results indicate that adversarial bias attacks result in biased DL models that propagate prediction bias even when evaluated with external datasets.

Submitted 7 April, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 29 pages, 4 figures

arXiv:2402.03248 [pdf, ps, other]

On a question of Gary G. Gundersen concerning meromorphic functions sharing three distinct values IM and a fourth value CM

Authors: Xiao-Min Li, Qing-Fei Zhai, Hong-Xun Yi

Abstract: In 1992, Gundersen (Complex Var. Elliptic Equ. 20 (1992), no. 1-4, 99-106.) proposed the following famous open question: if two non-constant meromorphic functions share three values IM and share a fourth value CM, then do the functions necessarily share all four values CM? The open question is a long-standing question in the studies of the Nevanlinna's value distribution theory of meromorphic functions, and has not been completely resolved by now. In this paper, we prove that if two distinct non-constant meromorphic functions $f$ and $g$ of finite order share $0,$ $1,$ $c$ IM and $\infty$ CM, where $c$ is a finite complex value such that $c\not\in\{0,1\},$ then $f$ and $g$ share $0,$ $1,$ $c,$ $\infty$ CM. Applying the main result obtained in this paper, we completely resolve a question proposed by Gary G. Gundersen on Page 458 of his paper (J. London Math. Soc. 20 (1979), no. 2, 457-466.) concerning the nonexistence of two distinct non-constant meromorphic functions sharing three distinct values DM and a fourth value CM. The obtained result also improves the corresponding result on Pages 109-117 in (E. Mues, Bemerkungen zum vier-punkte-satz, Complex Methods on Partial Differential Equations, 109-117, Math. Res. 53, Akademie-Verlag, Berlin, 1989.) concerning the nonexistence of two distinct non-constant entire functions that share three distinct finite values DM. Examples are provided to show that the main results obtained in this paper, in a sense, are best possible.

Submitted 10 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: The total pages of the present paper is 39. Neither figures nor conferences are related to this present paper

MSC Class: 30D35; 30D30

arXiv:2401.13982 [pdf]

doi 10.1103/PhysRevMaterials.8.014203

Buffer-layer-controlled Nickeline vs Zinc-Blende/Wurtzite-type MnTe growths on c-plane Al2O3 substrates

Authors: Deepti Jain, Hee Taek Yi, Alessandro R. Mazza, Kim Kisslinger, Myung-Geun Han, Matthew Brahlek, Seongshik Oh

Abstract: In the recent past, MnTe has proven to be a crucial component of the intrinsic magnetic topological insulator (IMTI) family [MnTe]m[Bi2Te3]n, which hosts a wide range of magneto-topological properties depending on the choice of m and n. However, bulk crystal growth allows only a few combinations of m and n for these IMTIs due to the strict limitations of the thermodynamic growth conditions. One way to overcome this challenge is to utilize atomic layer-by-layer molecular beam epitaxy (MBE) technique, which allows arbitrary sequences of [MnTe]m and [Bi2Te3]n to be formed beyond the thermodynamic limit. For such MBE growth, finding optimal growth templates and conditions for the parent building block, MnTe, is a key requirement. Here, we report that two different hexagonal phases of MnTe-nickeline (NC) and zinc-blende/wurtzite (ZB-WZ) structures, with distinct in-plane lattice constants of 4.20 +/- 0.04 A and 4.39 +/- 0.04 A, respectively-can be selectively grown on c-plane Al2O3 substrates using different buffer layers and growth temperatures. Moreover, we provide the first comparative studies of different MnTe phases using atomic-resolution scanning transmission electron microscopy and show that ZB and WZ-like stacking sequences can easily alternate between the two. Surprisingly, In2Se3 buffer layer, despite its lattice constant (4.02 A) being closer to that of the NC phase, fosters the ZB-WZ instead, whereas Bi2Te3, sharing the same lattice constant (4.39 A) with the ZB-WZ phase, fosters the NC phase.
These discoveries suggest that lattice matching is not always the most critical factor determining the preferred phase during epitaxial growth. Overall, this will deepen our understanding of epitaxial growth modes for chalcogenide materials and accelerate progress toward new IMTI phases as well as other magneto-topological applications.

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.12522 [pdf, other]

BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models

Authors: Feng Lin, Hanling Yi, Hongbin Li, Yifan Yang, Xiaotian Yu, Guangming Lu, Rong Xiao

Abstract: Large language models (LLMs) commonly employ autoregressive generation during inference, leading to high memory bandwidth demand and consequently extended latency. To mitigate this inefficiency, we present Bi-directional Tuning for lossless Acceleration (BiTA), an innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification. Inspired by the concept of prompt tuning, we enhance LLMs with a parameter-efficient design called bi-directional tuning for the capability in semi-autoregressive generation. Employing efficient tree-based decoding, the models perform draft candidate generation and verification in parallel, ensuring outputs identical to their autoregressive counterparts under greedy sampling. BiTA serves as a lightweight plug-in module, seamlessly boosting the inference efficiency of existing LLMs without requiring additional assistance models or incurring significant extra memory costs. Applying the proposed BiTA, LLaMA-2-70B-Chat achieves a 2.7$\times$ speedup on the MT-Bench benchmark. Extensive experiments confirm our method surpasses state-of-the-art acceleration techniques.

Submitted 25 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

Comments: An appendix has been included. Source code at https://github.com/linfeng93/BiTA

arXiv:2401.06985 [pdf, other]

Electrodynamics of the quantum anomalous Hall state in a magnetically doped topological insulator

Authors: Zhenisbek Tagay, Hee Taek Yi, Deepti Jain, Seongshik Oh, N. P. Armitage

Abstract: Magnetically doped topological insulators have been extensively studied over the past decade as a material platform to exhibit quantum anomalous Hall effect. Most material realizations are magnetically doped and despite material advances suffer from large disorder effects. In such systems, it is believed that magnetic disorder leads to a spatially varying Dirac mass gap and chemical potential fluctuations, and hence quantized conductance is only observed at very low temperatures. Here, we use a recently developed high-precision time-domain terahertz (THz) polarimeter to study the low-energy electrodynamic response of Cr-doped (Bi,Sb)$_2$Te$_3$ thin films. These films have been recently shown to exhibit a dc quantized anomalous Hall response up to T = 2 K at zero gate voltage. We show that the real part of the THz range Hall conductance $σ_{xy}(ω)$ is slightly smaller than $e^2/h$ down to T = 2 K with an unconventional decreasing dependence on frequency. The imaginary (dissipative) part of $σ_{xy}(ω)$ is small, but increasing as a function of omega. We connect both aspects of our data to a simple model for effective magnetic gap disorder. Our work highlights the different effect that disorder can have on the dc vs. ac quantum anomalous Hall effect.

Submitted 13 January, 2024; originally announced January 2024.

Comments: 6 pages, 4 figures

arXiv:2312.17449 [pdf, other]

DB-GPT: Empowering Database Interactions with Private Large Language Models

Authors: Siqiao Xue, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Danrui Qi, Hong Yi, Shaodong Liu, Faqiang Chen

Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. Database technologies particularly have an important entanglement with LLMs as efficient and intuitive database interactions are paramount. In this paper, we present DB-GPT, a revolutionary and production-ready project that integrates LLMs with traditional database systems to enhance user experience and accessibility. DB-GPT is designed to understand natural language queries, provide context-aware responses, and generate complex SQL queries with high accuracy, making it an indispensable tool for users ranging from novice to expert. The core innovation in DB-GPT lies in its private LLM technology, which is fine-tuned on domain-specific corpora to maintain user privacy and ensure data security while offering the benefits of state-of-the-art LLMs. We detail the architecture of DB-GPT, which includes a novel retrieval augmented generation (RAG) knowledge system, an adaptive learning mechanism to continuously improve performance based on user feedback and a service-oriented multi-model framework (SMMF) with powerful data-driven agents. Our extensive experiments and user studies confirm that DB-GPT represents a paradigm shift in database interactions, offering a more natural, efficient, and secure way to engage with data repositories. The paper concludes with a discussion of the implications of DB-GPT framework on the future of human-database interaction and outlines potential avenues for further enhancements and applications in the field.
The project code is available at https://github.com/eosphoros-ai/DB-GPT. Experience DB-GPT for yourself by installing it with the instructions https://github.com/eosphoros-ai/DB-GPT#install and view a concise 10-minute video at https://www.youtube.com/watch?v=KYs4nTDzEhk.

Submitted 3 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.08687 [pdf, other]

Magneto-optical effects of an artificially-layered ferromagnetic topological insulator

Authors: Xingyue Han, Hee Taek Yi, Seongshik Oh, Liang Wu

Abstract: Magnetic topological insulator is a fertile platform to study the interplay between magnetism and topology. The unique electronic band structure can induce exotic transport and optical properties. However, a comprehensive optical study in both near-infrared frequency and terahertz frequency has been lacking. Here, we report magneto-optical effects from a heterostructure of Cr-incorporated topological insulator, CBST. We use 800 nm magneto-optical Kerr effect to reveal a ferromagnetic order in the CBST film with a high transition temperature at 160 K. We also use time-domain terahertz polarimetry to reveal a terahertz Faraday rotation of 1.5 mrad and Kerr rotation of 5.1 mrad at 2 K. The calculated terahertz Hall conductance is 0.42 $e^2/h$. Our work shows the optical responses of an artificially layered magnetic topological insulator, paving the way towards high-temperature quantum anomalous Hall effect via heterostructure engineering.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 5 pages, 4 figures

arXiv:2312.07981 [pdf]

doi 10.1016/j.ymssp.2024.111481

Time Series Diffusion Method: A Denoising Diffusion Probabilistic Model for Vibration Signal Generation

Authors: Haiming Yi, Lei Hou, Yuhong Jin, Nasser A. Saeed, Ali Kandil, Hao Duan

Abstract: Diffusion models have demonstrated powerful data generation capabilities in various research fields such as image generation. However, in the field of vibration signal generation, the criteria for evaluating the quality of the generated signal are different from that of image generation and there is a fundamental difference between them. At present, there is no research on the ability of diffusion model to generate vibration signal. In this paper, a Time Series Diffusion Method (TSDM) is proposed for vibration signal generation, leveraging the foundational principles of diffusion models. The TSDM uses an improved U-net architecture with attention block, ResBlock and TimeEmbedding to effectively segment and extract features from one-dimensional time series data. It operates based on forward diffusion and reverse denoising processes for time-series generation. Experimental validation is conducted using single-frequency, multi-frequency datasets, and bearing fault datasets. The results show that TSDM can accurately generate the single-frequency and multi-frequency features in the time series and retain the basic frequency features for the diffusion generation results of the bearing fault series. It is also found that the original DDPM could not generate high quality vibration signals, but the improved U-net in TSDM, which applied the combination of attention block and ResBlock, could effectively improve the quality of vibration signal generation.
Finally, TSDM is applied to the small sample fault diagnosis of three public bearing fault datasets, and the results show that the accuracy of small sample fault diagnosis of the three datasets is improved by 32.380%, 18.355% and 9.298% at most, respectively.

Submitted 30 June, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Journal ref: Mechanical Systems and Signal Processing, 2024, 216: 111481

arXiv:2312.04353 [pdf]

Interface-Induced Superconductivity in Magnetic Topological Insulator-Iron Chalcogenide Heterostructures

Authors: Hemian Yi, Yi-Fan Zhao, Ying-Ting Chan, Jiaqi Cai, Ruobing Mei, Xianxin Wu, Zi-Jie Yan, Ling-Jie Zhou, Ruoxi Zhang, Zihao Wang, Stephen Paolini, Run Xiao, Ke Wang, Anthony R. Richardella, John Singleton, Laurel E. Winter, Thomas Prokscha, Zaher Salman, Andreas Suter, Purnima P. Balakrishnan, Alexander J. Grutter, Moses H. W. Chan, Nitin Samarth, Xiaodong Xu, Weida Wu, et al. (2 additional authors not shown)

Abstract: When two different electronic materials are brought together, the resultant interface often shows unexpected quantum phenomena, including interfacial superconductivity and Fu-Kane topological superconductivity (TSC). Here, we use molecular beam epitaxy (MBE) to synthesize heterostructures formed by stacking together two magnetic materials, a ferromagnetic topological insulator (TI) and an antiferromagnetic iron chalcogenide (FeTe). We discover emergent interface-induced superconductivity in these heterostructures and demonstrate the trifecta occurrence of superconductivity, ferromagnetism, and topological band structure in the magnetic TI layer, the three essential ingredients of chiral TSC. The unusual coexistence of ferromagnetism and superconductivity can be attributed to the high upper critical magnetic field that exceeds the Pauli paramagnetic limit for conventional superconductors at low temperatures. The magnetic TI/FeTe heterostructures with robust superconductivity and atomically sharp interfaces provide an ideal wafer-scale platform for the exploration of chiral TSC and Majorana physics, constituting an important step toward scalable topological quantum computation.

Submitted 7 December, 2023; originally announced December 2023.

Comments: 14 pages, 4 figures. Accepted by Science. Comments are welcome

arXiv:2312.01614 [pdf]

Three-Dimensional Quantum Anomalous Hall Effect in Magnetic Topological Insulator Trilayers of Hundred-Nanometer Thickness

Authors: Yi-Fan Zhao, Ruoxi Zhang, Zi-Ting Sun, Ling-Jie Zhou, Deyi Zhuo, Zi-Jie Yan, Hemian Yi, Ke Wang, Moses H. W. Chan, Chao-Xing Liu, K. T. Law, Cui-Zu Chang

Abstract: Magnetic topological states refer to a class of exotic phases in magnetic materials with their non-trivial topological property determined by magnetic spin configurations. An example of such states is the quantum anomalous Hall (QAH) state, which is a zero magnetic field manifestation of the quantum Hall effect. Current research in this direction focuses on QAH insulators with a thickness of less than 10 nm. The thick QAH insulators in the three-dimensional (3D) regime are limited, largely due to inevitable bulk carriers being introduced in thick magnetic TI samples. Here, we employ molecular beam epitaxy (MBE) to synthesize magnetic TI trilayers with a thickness of up to ~106 nm. We find these samples exhibit well-quantized Hall resistance and vanishing longitudinal resistance at zero magnetic field. By varying magnetic dopants, gate voltages, temperature, and external magnetic fields, we examine the properties of these thick QAH insulators and demonstrate the robustness of the 3D QAH effect. The realization of the well-quantized 3D QAH effect indicates that the nonchiral side surface states of our thick magnetic TI trilayers are gapped and thus do not affect the QAH quantization. The 3D QAH insulators of hundred-nanometer thickness provide a promising platform for the exploration of fundamental physics, including axion physics and image magnetic monopole, and the advancement of electronic and spintronic devices to circumvent Moore's law.

Submitted 7 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

Comments: 24 pages, 5 figures. Comments are welcome

arXiv:2311.17074 [pdf, other]

Self-Supervised Learning of Whole and Component-Based Semantic Representations for Person Re-Identification

Authors: Siyuan Huang, Yifan Zhou, Ram Prabhakar, Xijun Liu, Yuxiang Guo, Hongrui Yi, Cheng Peng, Rama Chellappa, Chun Pong Lau

Abstract: Person Re-Identification (ReID) is a challenging problem, focusing on identifying individuals across diverse settings. However, previous ReID methods primarily concentrated on a single domain or modality, such as Clothes-Changing ReID (CC-ReID) and video ReID. Real-world ReID is not constrained by factors like clothes or input types. Recent approaches emphasize on learning semantics through pre-training to enhance ReID performance but are hindered by coarse granularity, on-clothes focus and pre-defined areas. To address these limitations, we propose a Local Semantic Extraction (LSE) module inspired by Interactive Segmentation Models. The LSE module captures fine-grained, biometric, and flexible local semantics, enhancing ReID accuracy. Additionally, we introduce Semantic ReID (SemReID), a pre-training method that leverages LSE to learn effective semantics for seamless transfer across various ReID domains and modalities. Extensive evaluations across nine ReID datasets demonstrates SemReID's robust performance across multiple domains, including clothes-changing ReID, video ReID, unconstrained ReID, and short-term ReID. Our findings highlight the importance of effective semantics in ReID, as SemReID can achieve great performances without domain-specific designs.

Submitted 14 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.09190 [pdf]

doi 10.1038/s41467-023-42902-2

Dirac-Fermion-Assisted Interfacial Superconductivity in Epitaxial Topological Insulator/Iron Chalcogenide Heterostructures

Authors: Hemian Yi, Lun-Hui Hu, Yi-Fan Zhao, Ling-Jie Zhou, Zi-Jie Yan, Ruoxi Zhang, Wei Yuan, Zihao Wang, Ke Wang, Danielle Reifsnyder Hickey, Anthony R. Richardella, John Singleton, Laurel E. Winter, Xianxin Wu, Moses H. W. Chan, Nitin Samarth, Chao-Xing Liu, Cui-Zu Chang

Abstract: Over the last decade, the possibility of realizing topological superconductivity (TSC) has generated much excitement, mainly due to the potential use of its excitations (Majorana zero modes) in a fault-tolerant topological quantum computer [1,2]. TSC can be created in electronic systems where the topological and superconducting orders coexist [3], motivating the continued exploration of candidate material platforms to this end. Here, we use molecular beam epitaxy (MBE) to synthesize heterostructures that host emergent interfacial superconductivity when a non-superconducting antiferromagnet (FeTe) is interfaced with a topological insulator (TI) (Bi, Sb)2Te3 wherein the chemical potential can be tuned through varying the Bi/Sb ratio. By performing in-vacuo angle-resolved photoemission spectroscopy (ARPES) and ex-situ electrical transport measurements, we find that the superconducting transition temperature and the upper critical magnetic field are suppressed when the chemical potential approaches the Dirac point. This observation implies a direct correlation between the interfacial superconductivity and Dirac electrons of the TI layer. We provide evidence to show that the observed interfacial superconductivity and its chemical potential dependence is the result of the competition between the Ruderman-Kittel-Kasuya-Yosida-type ferromagnetic coupling mediated by Dirac surface states and antiferromagnetic exchange couplings that generate the bicollinear antiferromagnetic order in the FeTe layer.
The Dirac-fermion-assisted interfacial superconductivity in (Bi,Sb)2Te3/FeTe heterostructures provides a new approach to probe TSC and Majorana physics in hybrid devices and potentially constitutes an alternative platform for topological quantum computation.

Submitted 13 October, 2023; originally announced October 2023.

Comments: 32 pages and 4 figures. Accepted by Nature Communications

Journal ref: Nature Commun. 14,7119 (2023)

arXiv:2309.15273 [pdf, other]

DECO: Dense Estimation of 3D Human-Scene Contact In The Wild

Authors: Shashank Tripathi, Agniv Chatterjee, Jean-Claude Passy, Hongwei Yi, Dimitrios Tzionas, Michael J. Black

Abstract: Understanding how humans use physical contact to interact with the world is key to enabling human-centric artificial intelligence. While inferring 3D contact is crucial for modeling realistic and physically-plausible human-object interactions, existing methods either focus on 2D, consider body joints rather than the surface, use coarse 3D body regions, or do not generalize to in-the-wild images. In contrast, we focus on inferring dense, 3D contact between the full body surface and objects in arbitrary images. To achieve this, we first collect DAMON, a new dataset containing dense vertex-level contact annotations paired with RGB images containing complex human-object and human-scene contact. Second, we train DECO, a novel 3D contact detector that uses both body-part-driven and scene-context-driven attention to estimate vertex-level contact on the SMPL body. DECO builds on the insight that human observers recognize contact by reasoning about the contacting body parts, their proximity to scene objects, and the surrounding scene context. We perform extensive evaluations of our detector on DAMON as well as on the RICH and BEHAVE datasets. We significantly outperform existing SOTA methods across all benchmarks. We also show qualitatively that DECO generalizes well to diverse and challenging real-world human interactions in natural images. The code, data, and models are available at https://deco.is.tue.mpg.de.

Submitted 26 September, 2023; originally announced September 2023.

Comments: Accepted as Oral in ICCV'23. Project page: https://deco.is.tue.mpg.de

arXiv:2309.14600 [pdf, other]

Progressive Text-to-3D Generation for Automatic 3D Prototyping

Authors: Han Yi, Zhedong Zheng, Xiangyu Xu, Tat-seng Chua

Abstract: Text-to-3D generation is to craft a 3D object according to a natural language description. This can significantly reduce the workload for manually designing 3D models and provide a more natural way of interaction for users. However, this problem remains challenging in recovering the fine-grained details effectively and optimizing a large-size 3D output efficiently. Inspired by the success of progressive learning, we propose a Multi-Scale Triplane Network (MTN) and a new progressive learning strategy. As the name implies, the Multi-Scale Triplane Network consists of four triplanes transitioning from low to high resolution. The low-resolution triplane could serve as an initial shape for the high-resolution ones, easing the optimization difficulty. To further enable the fine-grained details, we also introduce the progressive learning strategy, which explicitly demands the network to shift its focus of attention from simple coarse-grained patterns to difficult fine-grained patterns. Our experiment verifies that the proposed method performs favorably against existing methods. For even the most challenging descriptions, where most existing methods struggle to produce a viable shape, our proposed method consistently delivers. We aspire for our work to pave the way for automatic 3D prototyping via natural language descriptions.

Submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.06632 [pdf, other]

Hidden non-collinear spin-order induced topological surface states

Authors: Zengle Huang, Hemian Yi, Daniel Kaplan, Lu** Min, Hengxin Tan, Ying-ting Chan, Zhiqiang Mao, Binghai Yan, Cui-Zu Chang, Weida Wu

Abstract: Rare-earth monopnictides are a family of materials simultaneously displaying complex magnetism, strong electronic correlation, and topological band structure. The recently discovered emergent arc-like surface states in these materials have been attributed to the multi-wave-vector antiferromagnetic order, yet the direct experimental evidence has been elusive. Here we report the observation of non-collinear antiferromagnetic order with multiple modulations using spin-polarized scanning tunneling microscopy. Moreover, we discover a hidden spin-rotation transition of single-to-multiple modulations 2 K below the Neel temperature. The hidden transition coincides with the onset of the surface states splitting observed by our angle-resolved photoemission spectroscopy measurements. Single modulation gives rise to a band inversion with induced topological surface states in a local momentum region while the full Brillouin zone carries trivial topological indices, and multiple modulation further splits the surface bands via non-collinear spin tilting, as revealed by our calculations. The direct evidence of the non-collinear spin order in NdSb not only clarifies the mechanism of the emergent topological surface states, but also opens up a new paradigm of control and manipulation of band topology with magnetism.

Submitted 12 September, 2023; originally announced September 2023.

Comments: 32 pages, 4 figures, 10 extended figures

arXiv:2308.12965 [pdf, other]

POCO: 3D Pose and Shape Estimation with Confidence

Authors: Sai Kumar Dwivedi, Cordelia Schmid, Hongwei Yi, Michael J. Black, Dimitrios Tzionas

Abstract: The regression of 3D Human Pose and Shape (HPS) from an image is becoming increasingly accurate. This makes the results useful for downstream tasks like human action recognition or 3D graphics. Yet, no regressor is perfect, and accuracy can be affected by ambiguous image evidence or by poses and appearance that are unseen during training. Most current HPS regressors, however, do not report the confidence of their outputs, meaning that downstream tasks cannot differentiate accurate estimates from inaccurate ones. To address this, we develop POCO, a novel framework for training HPS regressors to estimate not only a 3D human body, but also their confidence, in a single feed-forward pass. Specifically, POCO estimates both the 3D body pose and a per-sample variance. The key idea is to introduce a Dual Conditioning Strategy (DCS) for regressing uncertainty that is highly correlated to pose reconstruction quality. The POCO framework can be applied to any HPS regressor and here we evaluate it by modifying HMR, PARE, and CLIFF. In all cases, training the network to reason about uncertainty helps it learn to more accurately estimate 3D pose. While this was not our goal, the improvement is modest but consistent. Our main motivation is to provide uncertainty estimates for downstream tasks; we demonstrate this in two ways: (1) We use the confidence estimates to bootstrap HPS training. Given unlabelled image data, we take the confident estimates of a POCO-trained regressor as pseudo ground truth. Retraining with this automatically-curated data improves accuracy. (2) We exploit uncertainty in video pose estimation by automatically identifying uncertain frames (e.g. due to occlusion) and inpainting these from confident frames. Code and models will be available for research at https://poco.is.tue.mpg.de.

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.10899 [pdf, other]

TADA! Text to Animatable Digital Avatars

Authors: Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, Michael J. Black

Abstract: We introduce TADA, a simple-yet-effective approach that takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures, that can be animated and rendered with traditional graphics pipelines. Existing text-based character generation methods are limited in terms of geometry and texture quality, and cannot be realistically animated due to inconsistent alignment between the geometry and the texture, particularly in the face region. To overcome these limitations, TADA leverages the synergy of a 2D diffusion model and an animatable parametric body model. Specifically, we derive an optimizable high-resolution body model from SMPL-X with 3D displacements and a texture map, and use hierarchical rendering with score distillation sampling (SDS) to create high-quality, detailed, holistic 3D avatars from text. To ensure alignment between the geometry and texture, we render normals and RGB images of the generated character and exploit their latent embeddings in the SDS training process. We further introduce various expression parameters to deform the generated character during training, ensuring that the semantics of our generated character remain consistent with the original SMPL-X model, resulting in an animatable character. Comprehensive evaluations demonstrate that TADA significantly surpasses existing approaches on both qualitative and quantitative measures. TADA enables creation of large-scale digital character assets that are ready for animation and rendering, while also being easily editable through natural language. The code will be public for research purposes.

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.08545 [pdf, other]

TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Authors: Yangyi Huang, Hongwei Yi, Yuliang Xiu, Tingting Liao, Jiaxiang Tang, Deng Cai, Justus Thies

Abstract: Despite recent research advancements in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level details remains an unsolved challenge that lacks attention. Existing methods often generate overly smooth back-side surfaces with a blurry texture. But how to effectively capture all visual attributes of an individual from a single image, which are sufficient to reconstruct unseen areas (e.g., the back view)? Motivated by the power of foundation models, TeCH reconstructs the 3D human by leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles) which are automatically generated via a garment parsing model and Visual Question Answering (VQA), 2) a personalized fine-tuned Text-to-Image diffusion model (T2I) which learns the "indescribable" appearance. To represent high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D representation based on DMTet, which consists of an explicit body shape grid and an implicit distance field. Guided by the descriptive prompts + personalized T2I diffusion model, the geometry and texture of the 3D humans are optimized through multi-view Score Distillation Sampling (SDS) and reconstruction losses based on the original observation. TeCH produces high-fidelity 3D clothed humans with consistent & delicate texture, and detailed full-body geometry. Quantitative and qualitative experiments demonstrate that TeCH outperforms the state-of-the-art methods in terms of reconstruction accuracy and rendering quality. The code will be publicly available for research purposes at https://huangyangyi.github.io/TeCH

Submitted 19 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: Project: https://huangyangyi.github.io/TeCH, Code: https://github.com/huangyangyi/TeCH

arXiv:2307.01200 [pdf, other]

ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning

Authors: Yuxiang Zhang, Hongwen Zhang, Liangxiao Hu, Jiajun Zhang, Hongwei Yi, Sheng** Zhang, Yebin Liu

Abstract: Learning-based approaches to monocular motion capture have recently shown promising results by learning to regress in a data-driven manner. However, due to the challenges in data collection and network designs, it remains challenging for existing solutions to achieve real-time full-body capture while being accurate in world space. In this work, we introduce ProxyCap, a human-centric proxy-to-motion learning scheme to learn world-space motions from a proxy dataset of 2D skeleton sequences and 3D rotational motions. Such proxy data enables us to build a learning-based network with accurate world-space supervision while also mitigating the generalization issues. For more accurate and physically plausible predictions in world space, our network is designed to learn human motions from a human-centric perspective, which enables the understanding of the same motion captured with different camera trajectories. Moreover, a contact-aware neural motion descent module is proposed in our network so that it can be aware of foot-ground contact and motion misalignment with the proxy observations. With the proposed learning-based solution, we demonstrate the first real-time monocular full-body capture system with plausible foot-ground contact in world space even using hand-held moving cameras. Our project page is https://zhangyux15.github.io/ProxyCapV2.

Submitted 25 December, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: Our project page is https://zhangyux15.github.io/ProxyCapV2

arXiv:2307.00438 [pdf, other]

One Copy Is All You Need: Resource-Efficient Streaming of Medical Imaging Data at Scale

Authors: Pranav Kulkarni, Adway Kanhere, Eliot Siegel, Paul H. Yi, Vishwa S. Parekh

Abstract: Large-scale medical imaging datasets have accelerated development of artificial intelligence tools for clinical decision support. However, the large size of these datasets is a bottleneck for users with limited storage and bandwidth. Many users may not even require such large datasets as AI models are often trained on lower resolution images. If users could directly download at their desired resolution, storage and bandwidth requirements would significantly decrease. However, it is impossible to anticipate every user's requirements and impractical to store the data at multiple resolutions. What if we could store images at a single resolution but send them at different ones? We propose MIST, an open-source framework to operationalize progressive resolution for streaming medical images at multiple resolutions from a single high-resolution copy. We demonstrate that MIST can dramatically reduce imaging infrastructure inefficiencies for hosting and streaming medical images by >90%, while maintaining diagnostic quality for deep learning applications.

Submitted 1 July, 2023; originally announced July 2023.

Comments: 13 pages, 4 figures, 2 tables

arXiv:2306.16736 [pdf, other]

GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction

Authors: Sihan Ma, Qiong Cao, Hongwei Yi, **g Zhang, Dacheng Tao

Abstract: Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane. Prior methods have modeled human-ground interactions either implicitly or in a sparse manner, often resulting in unrealistic and incorrect motions when faced with noise and uncertainty. In contrast, our approach explicitly represents these interactions in a dense and continuous manner. To this end, we propose a novel Ground-aware Motion Model for 3D Human Motion Reconstruction, named GraMMaR, which jointly learns the distribution of transitions in both pose and interaction between every joint and ground plane at each time step of a motion sequence. It is trained to explicitly promote consistency between the motion and distance change towards the ground. After training, we establish a joint optimization strategy that utilizes GraMMaR as a dual-prior, regularizing the optimization towards the space of plausible ground-aware motions. This leads to realistic and coherent motion reconstruction, irrespective of the assumed or learned ground plane. Through extensive evaluation on the AMASS and AIST++ datasets, our model demonstrates good generalization and discriminating abilities in challenging cases including complex and ambiguous human-ground interactions. The code will be available at https://github.com/xymsh/GraMMaR.

Submitted 16 August, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: Accepted to ACM Multimedia 2023. The code will be available at https://github.com/xymsh/GraMMaR

arXiv:2306.15346 [pdf, other]

Episodic Accretion in Protostars -- An ALMA Survey of Molecular Jets in the Orion Molecular Cloud

Authors: Somnath Dutta, Chin-Fei Lee, Doug Johnstone, Jeong-Eun Lee, Naomi Hirano, James Di Francesco, Anthony Moraghan, Tie Liu, Dipen Sahu, Sheng-Yuan Liu, Kenichi Tatematsu, Chang Won Lee, Shanghuo Li, David Eden, Mika Juvela, Leonardo Bronfman, Shih-Ying Hsu, Kee-Tae Kim, Woo** Kwon, Patricio Sanhueza, Jesus Alejandro Lopez-Vazquez, Qiuyi Luo, Hee-Weon Yi

Abstract: Protostellar outflows and jets are almost ubiquitous characteristics during the mass accretion phase, and encode the history of stellar accretion, complex-organic molecule (COM) formation, and planet formation. Episodic jets are likely connected to episodic accretion through the disk. Despite the importance, there is a lack of studies of a statistically significant sample of protostars via high-sensitivity and high-resolution observations. To explore episodic accretion mechanisms and the chronologies of episodic events, we investigated 42 fields containing protostars with ALMA observations of CO, SiO, and 1.3\,mm continuum emission. We detected SiO emission in 21 fields, where 19 sources are driving confirmed molecular jets with high abundances of SiO. Jet velocities, mass-loss rates, mass-accretion rates, and periods of accretion events are found to be dependent on the driving forces of the jet (e.g., bolometric luminosity, envelope mass). Next, velocities and mass-loss rates are positively correlated with the surrounding envelope mass, suggesting that the presence of high mass around protostars increases the ejection-accretion activity. We determine mean periods of ejection events of 20$-$175 years for our sample, which could be associated with perturbation zones of $\sim$ 2$-$25\,au extent around the protostars. Also, mean ejection periods are anti-correlated with the envelope mass, where high-accretion rates may trigger more frequent ejection events. The observed periods of outburst/ejection are much shorter than the freeze-out time scale of the simplest COMs like CH$_3$OH, suggesting that episodic events largely maintain the ice-gas balance inside and around the snowline.

Submitted 27 June, 2023; originally announced June 2023.

Comments: Submitted to Journal; 27 pages, 15 Figures and additional Appendix material

arXiv:2306.13016 [pdf]

Axion Insulator State in Hundred-Nanometer-Thick Magnetic Topological Insulator Sandwich Heterostructures

Authors: Deyi Zhuo, Zi-Jie Yan, Zi-Ting Sun, Ling-Jie Zhou, Yi-Fan Zhao, Ruoxi Zhang, Ruobing Mei, Hemian Yi, Ke Wang, Moses H. W. Chan, Chao-Xing Liu, K. T. Law, Cui-Zu Chang

Abstract: An axion insulator is a three-dimensional (3D) topological insulator (TI), in which the bulk maintains the time-reversal symmetry or inversion symmetry but the surface states are gapped by surface magnetization. The axion insulator state has been observed in molecular beam epitaxy (MBE)-grown magnetically doped TI sandwiches and exfoliated intrinsic magnetic TI MnBi2Te4 flakes with an even number layer. All these samples have a thickness of ~10 nm, near the 2D-to-3D boundary. The coupling between the top and bottom surface states in thin samples may hinder the observation of quantized topological magnetoelectric response. Here, we employ MBE to synthesize magnetic TI sandwich heterostructures and find that the axion insulator state persists in a 3D sample with a thickness of ~106 nm. Our transport results show that the axion insulator state starts to emerge when the thickness of the middle undoped TI layer is greater than ~3 nm. The 3D hundred-nanometer-thick axion insulator provides a promising platform for the exploration of the topological magnetoelectric effect and other emergent magnetic topological states, such as the high-order TI phase.

Submitted 3 October, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: 25 pages, 5 figures. Comments are very much welcome

arXiv:2306.04552 [pdf]

doi 10.1021/acs.nanolett.3c01313

High temperature, gate-free quantum anomalous Hall effect with an active capping layer

Authors: Hee Taek Yi, Deepti Jain, Xiong Yao, Seongshik Oh

Abstract: Quantum anomalous Hall effect (QAHE) was discovered a decade ago, but is still not utilized beyond a handful of research groups, due to numerous limitations such as extremely low temperature, electric field-effect gating requirement, small sample sizes and environmental aging effect. Here, we present a robust platform that provides effective solutions to these problems. Specifically, on this platform, we observe QAH signatures at record high temperatures, with the Hall conductance of 1.00 e2/h at 2.0 K, 0.98 e2/h at 4.2 K, and 0.92 e2/h at 10 K, on centimeter-scale substrates, without electric-field-effect gating. The key ingredient is an active CrOx capping layer, which substantially boosts the ferromagnetism while suppressing environmental degradation. With this development, QAHE will now be accessible to much broader applications than before.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 20 pages, 8 figures, Accepted for publication in Nano Letters, https://pubs.acs.org/doi/full/10.1021/acs.nanolett.3c01313

arXiv:2305.18993 [pdf, other]

ConES: Concept Embedding Search for Parameter Efficient Tuning Large Vision Language Models

Authors: Huahui Yi, Ziyuan Qin, Wei Xu, Miaotian Guo, Kun Wang, Shaoting Zhang, Kang Li, Qicheng Lao

Abstract: Large pre-trained vision-language models have shown great prominence in transferring pre-acquired knowledge to various domains and downstream tasks with appropriate prompting or tuning. Existing prevalent tuning methods can be generally categorized into three genres: 1) prompt engineering by creating suitable prompt texts, which is time-consuming and requires domain expertise; 2) or simply fine-tuning the whole model, which is extremely inefficient; 3) prompt tuning through parameterized prompt embeddings with the text encoder. Nevertheless, all methods rely on the text encoder for bridging the modality gap between vision and language. In this work, we question the necessity of the cumbersome text encoder for a more lightweight and efficient tuning paradigm as well as more representative prompt embeddings closer to the image representations. To achieve this, we propose a Concept Embedding Search (ConES) approach by optimizing prompt embeddings -- without the need of the text encoder -- to capture the 'concept' of the image modality through a variety of task objectives. By dropping the text encoder, we are able to significantly speed up the learning process, e.g., from about an hour to just ten minutes in our experiments for personalized text-to-image generation without impairing the generation quality. Moreover, our proposed approach is orthogonal to current existing tuning methods since the searched concept embeddings can be further utilized in the next stage of fine-tuning the pre-trained large models for boosting performance. Extensive experiments show that our approach can beat the prompt tuning and textual inversion methods in a variety of downstream tasks including object detection, instance segmentation, and image generation. Our approach also shows better generalization capability for unseen concepts in specialized domains, such as the medical domain.

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.15617 [pdf, other]

ISLE: An Intelligent Streaming Framework for High-Throughput AI Inference in Medical Imaging

Authors: Pranav Kulkarni, Sean Garin, Adway Kanhere, Eliot Siegel, Paul H. Yi, Vishwa S. Parekh

Abstract: As the adoption of Artificial Intelligence (AI) systems within the clinical environment grows, limitations in bandwidth and compute can create communication bottlenecks when streaming imaging data, leading to delays in patient care and increased cost. As such, healthcare providers and AI vendors will require greater computational infrastructure, therefore dramatically increasing costs. To that end, we developed ISLE, an intelligent streaming framework for high-throughput, compute- and bandwidth- optimized, and cost effective AI inference for clinical decision making at scale. In our experiments, ISLE on average reduced data transmission by 98.02% and decoding time by 98.09%, while increasing throughput by 2,730%. We show that ISLE results in faster turnaround times, and reduced overall cost of data, transmission, and compute, without negatively impacting clinical decision making using AI systems.

Submitted 25 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: 5 pages, 3 figures, 3 tables

arXiv:2305.07637 [pdf, other]

Text2Cohort: Facilitating Intuitive Access to Biomedical Data with Natural Language Cohort Discovery

Authors: Pranav Kulkarni, Adway Kanhere, Paul H. Yi, Vishwa S. Parekh

Abstract: The Imaging Data Commons (IDC) is a cloud-based database that provides researchers with open access to cancer imaging data, with the goal of facilitating collaboration. However, cohort discovery within the IDC database has a significant technical learning curve. Recently, large language models (LLM) have demonstrated exceptional utility for natural language processing tasks. We developed Text2Cohort, an LLM-powered toolkit to facilitate user-friendly natural language cohort discovery in the IDC. Our method translates user input into IDC queries using grounding techniques and returns the query's response. We evaluate Text2Cohort on 50 natural language inputs, from information extraction to cohort discovery. Our toolkit successfully generated responses with an 88% accuracy and 0.94 F1 score. We demonstrate that Text2Cohort can enable researchers to discover and curate cohorts on IDC with high levels of accuracy using natural language in a more intuitive and user-friendly way.

Submitted 25 November, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: 5 pages, 3 figures, 2 tables

arXiv:2304.03903 [pdf, other]

High-Fidelity Clothed Avatar Reconstruction from a Single Image

Authors: Tingting Liao, Xiaomei Zhang, Yuliang Xiu, Hongwei Yi, Xudong Liu, Guo-Jun Qi, Yong Zhang, Xuan Wang, Xiangyu Zhu, Zhen Lei

Abstract: This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the advantages of the high accuracy of optimization-based methods and the efficiency of learning-based methods, we propose a coarse-to-fine way to realize a high-fidelity clothed avatar reconstruction (CAR) from a single image. At the first stage, we use an implicit model to learn the general shape in the canonical space of a person in a learning-based way, and at the second stage, we refine the surface detail by estimating the non-rigid deformation in the posed space in an optimization way. A hyper-network is utilized to generate a good initialization so that the convergence of the optimization process is greatly accelerated. Extensive experiments on various datasets show that the proposed CAR successfully produces high-fidelity avatars for arbitrarily clothed humans in real scenes.

Submitted 8 April, 2023; originally announced April 2023.

arXiv:2303.09095 [pdf, other]

SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

Authors: Yudi Dai, Yitai Lin, Xi** Lin, Chenglu Wen, Lan Xu, Hongwei Yi, Siqi Shen, Yuexin Ma, Cheng Wang

Abstract: We present SLOPER4D, a novel scene-aware dataset collected in large urban environments to facilitate the research of global human pose estimation (GHPE) with human-scene interaction in the wild. Employing a head-mounted device integrated with a LiDAR and camera, we record 12 human subjects' activities over 10 diverse urban scenes from an egocentric view. Frame-wise annotations for 2D key points, 3D pose parameters, and global translations are provided, together with reconstructed scene point clouds. To obtain accurate 3D ground truth in such large dynamic scenes, we propose a joint optimization method to fit local SMPL meshes to the scene and fine-tune the camera calibration during dynamic motions frame by frame, resulting in plausible and scene-natural 3D human poses. Eventually, SLOPER4D consists of 15 sequences of human motions, each of which has a trajectory length of more than 200 meters (up to 1,300 meters) and covers an area of more than 2,000 $m^2$ (up to 13,000 $m^2$), including more than 100K LiDAR frames, 300k video frames, and 500K IMU-based motion frames. With SLOPER4D, we provide a detailed and thorough analysis of two critical tasks, including camera-based 3D HPE and LiDAR-based 3D HPE in urban environments, and benchmark a new task, GHPE. The in-depth analysis demonstrates SLOPER4D poses significant challenges to existing methods and produces great research opportunities. The dataset and code are released at http://www.lidarhumanmotion.net/sloper4d/

Submitted 18 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

Comments: 11 pages, 7 figures, CVPR 2023

arXiv:2303.06580 [pdf, other]

Towards General Purpose Medical AI: Continual Learning Medical Foundation Model

Authors: Huahui Yi, Ziyuan Qin, Qicheng Lao, Wei Xu, Zekun Jiang, Dequan Wang, Shaoting Zhang, Kang Li

Abstract: Inevitable domain and task discrepancies in real-world scenarios can impair the generalization performance of pre-trained deep models for medical data. Therefore, we audaciously propose that we should build a general-purpose medical AI system that can be seamlessly adapted to downstream domains/tasks. Since domain/task adaptation procedures usually involve additional labeling work for the target data, a data-efficient adaptation algorithm is desirable to reduce the cost of transferring the learned knowledge. Our recent work found that vision-language models (VLMs) are efficient learners with extraordinary cross-domain ability. Therefore, in this work, we further explore the possibility of leveraging pre-trained VLMs as medical foundation models for building general-purpose medical AI, where we thoroughly investigate three machine-learning paradigms, i.e., domain/task-specialized learning, joint learning, and continual learning, for training the VLMs and evaluate their generalization performance on cross-domain and cross-task test sets. To alleviate catastrophic forgetting during sequential training, we employ rehearsal learning and obtain a sharp boost in generalization capability. In a nutshell, our empirical evidence suggests that continual learning may be a practical and efficient learning paradigm for the medical foundation model. We hope researchers can use our empirical evidence as a basis for further exploring the path toward medical foundation models.

Submitted 12 March, 2023; originally announced March 2023.

arXiv:2303.06180 [pdf, other]

Optimizing Federated Learning for Medical Image Classification on Distributed Non-iid Datasets with Partial Labels

Authors: Pranav Kulkarni, Adway Kanhere, Paul H. Yi, Vishwa S. Parekh

Abstract: Numerous large-scale chest x-ray datasets have spearheaded expert-level detection of abnormalities using deep learning. However, these datasets focus on detecting a subset of disease labels that could be present, thus making them distributed and non-iid with partial labels. Recent literature has indicated the impact of batch normalization layers on the convergence of federated learning due to domain shift associated with non-iid data with partial labels. To that end, we propose FedFBN, a federated learning framework that draws inspiration from transfer learning by using pretrained networks as the model backend and freezing the batch normalization layers throughout the training process. We evaluate FedFBN with current FL strategies using synthetic iid toy datasets and large-scale non-iid datasets across scenarios with partial and complete labels. Our results demonstrate that FedFBN outperforms current aggregation strategies for training global models using distributed and non-iid data with partial labels.

Submitted 10 March, 2023; originally announced March 2023.

Comments: 10 pages, 1 algorithm, 4 tables

arXiv:2302.14817 [pdf, ps, other]

A Cooperative Content Dissemination Framework for Fog-Based Internet of Vehicles

Authors: Weihua Wu, Peng Wang, Yuan Zhang, Weijia Han, He Yi, Tony Q. S. Quek

Abstract: As the fog-based internet of vehicles (IoV) is equipped with rich perception, computing, communication and storage resources, it provides a new solution for bulk data processing. However, the mobility of vehicles poses a challenge to the content scheduling and resource allocation of the content dissemination service. In this paper, we propose a time-varying resource relationship graph to model the intertwined impact of the perception, computation, communication and storage resources across multiple snapshots on the content dissemination process of IoV. Based on this graph model, the content dissemination process is modeled as a mathematical optimization problem, where the quality of service of both delay-tolerant and delay-sensitive services is considered. Owing to its NP-completeness, the optimization problem is decomposed into a joint link and subchannel scheduling subproblem and a joint power and flow control subproblem. Then, a cascaded low-complexity scheduling algorithm is proposed for the joint link and subchannel scheduling subproblem. Moreover, a robust resource management algorithm is developed for the power and flow control subproblem, where the channel uncertainties in future snapshots are fully considered in the algorithm. Finally, we conduct simulations to show that the proposed approaches outperform other state-of-the-art approaches.

Submitted 20 February, 2023; originally announced February 2023.

Showing 1–50 of 226 results for author: Yi, H