Search | arXiv e-print repository

Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI

Authors: Di Xu, Xin Miao, Hengjie Liu, Jessica E. Scholey, Wensha Yang, Mary Feng, Michael Ohliger, Hui Lin, Yi Lao, Yang Yang, Ke Sheng

Abstract: Purpose: 4D MRI with high spatiotemporal resolution is desired for image-guided liver radiotherapy. Acquiring densely sampled k-space data is time-consuming. Accelerated acquisition with sparse samples is desirable but often causes degraded image quality or long reconstruction time. We propose the Reconstruct Paired Conditional Generative Adversarial Network (Re-Con-GAN) to shorten the 4D MRI reconstruction time while maintaining the reconstruction quality. Methods: Patients who underwent free-breathing liver 4D MRI were included in the study. Fully sampled and retrospectively under-sampled data at 3, 6 and 10 times (3x, 6x and 10x) were first reconstructed using the nuFFT algorithm. Re-Con-GAN was then trained on paired inputs and outputs. Three types of networks, ResNet9, UNet and reconstruction swin transformer, were explored as generators. PatchGAN was selected as the discriminator. Re-Con-GAN processed the data (3D+t) as temporal slices (2D+t). A total of 48 patients with 12332 temporal slices were split into training (37 patients with 10721 slices) and test (11 patients with 1611 slices) sets. Results: Re-Con-GAN consistently achieved comparable/better PSNR, SSIM, and RMSE scores compared to CS/UNet models. The inference times of Re-Con-GAN, UNet and CS are 0.15s, 0.16s, and 120s, respectively. The GTV detection task showed that Re-Con-GAN and CS, compared to UNet, better improved the dice score (3x Re-Con-GAN 80.98%; 3x CS 80.74%; 3x UNet 79.88%) of unprocessed under-sampled images (3x 69.61%).
Conclusion: A generative network with adversarial training is proposed with promising and efficient reconstruction results demonstrated on an in-house dataset. The rapid and qualitative reconstruction of 4D liver MR has the potential to facilitate online adaptive MR-guided radiotherapy for liver cancer.
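
For readers unfamiliar with the dice score quoted in the results above, the following is a minimal sketch of the standard definition, 2|A and B| / (|A| + |B|), on binary masks. The function name, the flat 0/1 list representation, and the toy masks are illustrative assumptions and are not taken from the paper.

```python
def dice_score(pred, truth):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    # Count voxels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy masks (hypothetical): 3 foreground voxels each, 2 of them shared.
pred_mask = [0, 1, 1, 1, 0, 0]
true_mask = [0, 1, 1, 0, 1, 0]
print(dice_score(pred_mask, true_mask))
```

A score of 1.0 means the two masks coincide and 0.0 means they do not overlap at all.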

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2403.15530 [pdf, other]

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Authors: Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, Hengshuang Zhao

Abstract: 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance. However, it relies heavily on the quality of the initial point cloud, resulting in blurring and needle-like artifacts in areas with insufficient initializing points. This is mainly attributed to the point cloud growth condition in 3DGS that only considers the average gradient magnitude of points from observable views, thereby failing to grow for large Gaussians that are observable for many viewpoints while many of them are only covered in the boundaries. To this end, we propose a novel method, named Pixel-GS, to take into account the number of pixels covered by the Gaussian in each view during the computation of the growth condition. We regard the covered pixel numbers as the weights to dynamically average the gradients from different views, such that the growth of large Gaussians can be prompted. As a result, points within the areas with insufficient initializing points can be grown more effectively, leading to a more accurate and detailed reconstruction. In addition, we propose a simple yet effective strategy to scale the gradient field according to the distance to the camera, to suppress the growth of floaters near the camera. Extensive experiments both qualitatively and quantitatively demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time rendering speed, on the challenging Mip-NeRF 360 and Tanks & Temples datasets.
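
The pixel-weighted averaging described in this abstract can be illustrated with a small sketch. It contrasts a plain per-view mean of gradient magnitudes with a mean weighted by the number of pixels the Gaussian covers in each view. The function names, the sample numbers, and the threshold value are hypothetical, and the exact formulation used in the paper may differ.

```python
def mean_gradient(grad_norms):
    """Plain average of per-view gradient magnitudes."""
    return sum(grad_norms) / len(grad_norms)


def pixel_weighted_gradient(grad_norms, pixel_counts):
    """Average of per-view gradient magnitudes weighted by covered pixel counts."""
    total_pixels = sum(pixel_counts)
    if total_pixels == 0:
        return 0.0
    weighted_sum = sum(g * m for g, m in zip(grad_norms, pixel_counts))
    return weighted_sum / total_pixels


# Hypothetical large Gaussian seen in four views: three views only graze its
# boundary (few pixels, small gradient); one view covers it with many pixels.
grad_norms = [0.0001, 0.0001, 0.0001, 0.0010]
pixel_counts = [2, 2, 2, 94]
GROWTH_THRESHOLD = 0.0005  # illustrative value, not taken from the paper

plain = mean_gradient(grad_norms)
weighted = pixel_weighted_gradient(grad_norms, pixel_counts)
print("plain mean triggers growth:   ", plain > GROWTH_THRESHOLD)
print("weighted mean triggers growth:", weighted > GROWTH_THRESHOLD)
```

In this toy case the boundary-only views drag the plain mean below the threshold, while the weighted mean is dominated by the view that actually covers the Gaussian.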

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2402.17483 [pdf, other]

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

Authors: Tao Tang, Guangrun Wang, Yixing Lao, Peng Chen, Jie Liu, Liang Lin, Kaicheng Yu, Xiaodan Liang

Abstract: Neural implicit fields have been a de facto standard in novel view synthesis. Recently, some methods have explored fusing multiple modalities within a single field, aiming to share implicit features from different modalities to enhance reconstruction performance. However, these modalities often exhibit misaligned behaviors: optimizing for one modality, such as LiDAR, can adversely affect another, like camera performance, and vice versa. In this work, we conduct comprehensive analyses on the multimodal implicit field of LiDAR-camera joint synthesis, revealing that the underlying issue lies in the misalignment of different sensors. Furthermore, we introduce AlignMiF, a geometrically aligned multimodal implicit field with two proposed modules: Geometry-Aware Alignment (GAA) and Shared Geometry Initialization (SGI). These modules effectively align the coarse geometry across different modalities, significantly enhancing the fusion process between LiDAR and camera data. Through extensive experiments across various datasets and scenes, we demonstrate the effectiveness of our approach in facilitating better interaction between LiDAR and camera modalities within a unified neural field. Specifically, our proposed AlignMiF achieves remarkable improvement over recent implicit fusion methods (+2.01 and +3.11 image PSNR on the KITTI-360 and Waymo datasets) and consistently surpasses single modality performance (13.8% and 14.2% reduction in LiDAR Chamfer Distance on the respective datasets).

Submitted 27 February, 2024; originally announced February 2024.

Comments: CVPR2024

arXiv:2402.04554 [pdf, other]

BirdNeRF: Fast Neural Reconstruction of Large-Scale Scenes From Aerial Imagery

Authors: Huiqing Zhang, Yifei Xue, Ming Liao, Yizhen Lao

Abstract: In this study, we introduce BirdNeRF, an adaptation of Neural Radiance Fields (NeRF) designed specifically for reconstructing large-scale scenes using aerial imagery. Unlike previous research focused on small-scale and object-centric NeRF reconstruction, our approach addresses multiple challenges, including (1) Addressing the issue of slow training and rendering associated with large models. (2) Meeting the computational demands necessitated by modeling a substantial number of images, requiring extensive resources such as high-performance GPUs. (3) Overcoming significant artifacts and low visual fidelity commonly observed in large-scale reconstruction tasks due to limited model capacity. Specifically, we present a novel bird-view pose-based spatial decomposition algorithm that decomposes a large aerial image set into multiple small sets with appropriately sized overlaps, allowing us to train an individual NeRF for each sub-scene. This decomposition approach not only decouples rendering time from the scene size but also enables rendering to scale seamlessly to arbitrarily large environments. Moreover, it allows for per-block updates of the environment, enhancing the flexibility and adaptability of the reconstruction process. Additionally, we propose a projection-guided novel view re-rendering strategy, which aids in effectively utilizing the independently trained sub-scenes to generate superior rendering results.
We evaluate our approach on existing datasets as well as against our own drone footage, improving reconstruction speed by 10x over classical photogrammetry software and 50x over state-of-the-art large-scale NeRF solutions, on a single GPU with similar rendering quality.

Submitted 11 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2401.09126 [pdf, other]

Objects With Lighting: A Real-World Dataset for Evaluating Reconstruction and Rendering for Object Relighting

Authors: Benjamin Ummenhofer, Sanskar Agrawal, Rene Sepulveda, Yixing Lao, Kai Zhang, Tianhang Cheng, Stephan Richter, Shenlong Wang, German Ros

Abstract: Reconstructing an object from photos and placing it virtually in a new environment goes beyond the standard novel view synthesis task, as the appearance of the object has to adapt not only to the novel viewpoint but also to the new lighting conditions. Yet evaluations of inverse rendering methods rely on novel view synthesis data or simplistic synthetic datasets for quantitative analysis. This work presents a real-world dataset for measuring the reconstruction and rendering of objects for relighting. To this end, we capture the environment lighting and ground truth images of the same objects in multiple environments, allowing us to reconstruct the objects from images taken in one environment and quantify the quality of the rendered views for the unseen lighting environments. Further, we introduce a simple baseline composed of off-the-shelf methods, test several state-of-the-art methods on the relighting task, and show that novel view synthesis is not a reliable proxy to measure performance. Code and dataset are available at https://github.com/isl-org/objects-with-lighting .

Submitted 13 April, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted at 3DV 2024, Oral presentation. For the project page see https://github.com/isl-org/objects-with-lighting

arXiv:2401.03844 [pdf, other]

Fully Attentional Networks with Self-emerging Token Labeling

Authors: Bingyin Zhao, Zhiding Yu, Shiyi Lan, Yutao Cheng, Anima Anandkumar, Yingjie Lao, Jose M. Alvarez

Abstract: Recent studies indicate that Vision Transformers (ViTs) are robust against out-of-distribution scenarios. In particular, the Fully Attentional Network (FAN), a family of ViT backbones, has achieved state-of-the-art robustness. In this paper, we revisit the FAN models and improve their pre-training with a self-emerging token labeling (STL) framework. Our method contains a two-stage training framework. Specifically, we first train a FAN token labeler (FAN-TL) to generate semantically meaningful patch token labels, followed by a FAN student model training stage that uses both the token labels and the original class label. With the proposed STL framework, our best model based on FAN-L-Hybrid (77.3M parameters) achieves 84.8% Top-1 accuracy and 42.1% mCE on ImageNet-1K and ImageNet-C, and sets a new state-of-the-art for ImageNet-A (46.1%) and ImageNet-R (56.6%) without using extra data, outperforming the original FAN counterpart by significant margins. The proposed framework also demonstrates significantly enhanced performance on downstream tasks such as semantic segmentation, with up to 1.7% improvement in robustness over the counterpart model. Code is available at https://github.com/NVlabs/STL.

Submitted 8 January, 2024; originally announced January 2024.

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 5585-5595

arXiv:2312.10657 [pdf, other]

UltraClean: A Simple Framework to Train Robust Neural Networks against Backdoor Attacks

Authors: Bingyin Zhao, Yingjie Lao

Abstract: Backdoor attacks are emerging threats to deep neural networks, which typically embed malicious behaviors into a victim model by injecting poisoned samples. Adversaries can activate the injected backdoor during inference by presenting the trigger on input images. Prior defensive methods have achieved remarkable success in countering dirty-label backdoor attacks, where poisoned samples are often mislabeled. However, these approaches do not work for a recent type of backdoor, clean-label backdoor attacks, which imperceptibly modify poisoned data and keep the labels consistent. More complex and powerful algorithms are demanded to defend against such stealthy attacks. In this paper, we propose UltraClean, a general framework that simplifies the identification of poisoned samples and defends against both dirty-label and clean-label backdoor attacks. Given the fact that backdoor triggers introduce adversarial noise that intensifies in feed-forward propagation, UltraClean first generates two variants of training samples using off-the-shelf denoising functions. It then measures the susceptibility of training samples leveraging the error amplification effect in DNNs, which dilates the noise difference between the original image and denoised variants. Lastly, it filters out poisoned samples based on the susceptibility to thwart the backdoor implantation.
Despite its simplicity, UltraClean achieves a superior detection rate across various datasets and significantly reduces the backdoor attack success rate while maintaining a decent model accuracy on clean data, outperforming existing defensive methods by a large margin. Code is available at https://github.com/bxz9200/UltraClean.

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2312.06642 [pdf, other]

CorresNeRF: Image Correspondence Priors for Neural Radiance Fields

Authors: Yixing Lao, Xiaogang Xu, Zhipeng Cai, Xihui Liu, Hengshuang Zhao

Abstract: Neural Radiance Fields (NeRFs) have achieved impressive results in novel view synthesis and surface reconstruction tasks. However, their performance suffers under challenging scenarios with sparse input views. We present CorresNeRF, a novel method that leverages image correspondence priors computed by off-the-shelf methods to supervise NeRF training. We design adaptive processes for augmentation and filtering to generate dense and high-quality correspondences. The correspondences are then used to regularize NeRF training via the correspondence pixel reprojection and depth loss terms. We evaluate our methods on novel view synthesis and surface reconstruction tasks with density-based and SDF-based NeRF models on different datasets. Our method outperforms previous methods in both photometric and geometric metrics. We show that this simple yet effective technique of using correspondence priors can be applied as a plug-and-play module across different NeRF variants. The project page is at https://yxlao.github.io/corres-nerf.

Submitted 11 December, 2023; originally announced December 2023.

Journal ref: NeurIPS 2023

arXiv:2310.04618 [pdf, other]

doi 10.1109/ICCAD57390.2023.10323839

KyberMat: Efficient Accelerator for Matrix-Vector Polynomial Multiplication in CRYSTALS-Kyber Scheme via NTT and Polyphase Decomposition

Authors: Weihang Tan, Yingjie Lao, Keshab K. Parhi

Abstract: CRYSTALS-Kyber (Kyber) is one of the post-quantum cryptography (PQC) key-encapsulation mechanism (KEM) schemes selected during the standardization process. This paper addresses optimization for Kyber architecture with respect to latency and throughput constraints. Specifically, matrix-vector multiplication and number theoretic transform (NTT)-based polynomial multiplication are critical operations and bottlenecks that require optimization. To address this challenge, we propose an algorithm and hardware co-design approach to systematically optimize matrix-vector multiplication and NTT-based polynomial multiplication by employing a novel sub-structure sharing technique in order to reduce computational complexity, i.e., the number of modular multiplications and modular additions/subtractions consumed. The sub-structure sharing approach is inspired by prior fast parallel approaches based on polyphase decomposition. The proposed efficient feed-forward architecture achieves high speed, low latency, and full utilization of all hardware components, which can significantly enhance the overall efficiency of the Kyber scheme. The FPGA implementation results show that our proposed design, using the fast two-parallel structure, leads to an approximate reduction of 90% in execution time, along with a 66 times improvement in throughput performance.
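
As background for the polyphase decomposition mentioned in this abstract, the sketch below shows the generic two-parallel idea on plain polynomial multiplication: each operand is split into even- and odd-indexed coefficients, and the product is formed from three half-size products instead of four. This is not the paper's hardware architecture or its sub-structure sharing scheme; it omits the NTT and the reduction modulo x^n + 1 used in Kyber. The function names are hypothetical, and only the coefficient modulus q = 3329 is taken from the Kyber specification.

```python
Q = 3329  # Kyber coefficient modulus


def schoolbook_mul(a, b, q=Q):
    """Reference polynomial product with coefficients reduced mod q."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % q
    return out


def two_parallel_mul(a, b, q=Q):
    """Two-phase (even/odd) polynomial product using three half-size products."""
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("inputs must have the same even length")
    # Polyphase split: A(x) = A0(x^2) + x * A1(x^2), and likewise for B.
    a0, a1 = a[0::2], a[1::2]
    b0, b1 = b[0::2], b[1::2]
    p00 = schoolbook_mul(a0, b0, q)
    p11 = schoolbook_mul(a1, b1, q)
    sum_a = [(x + y) % q for x, y in zip(a0, a1)]
    sum_b = [(x + y) % q for x, y in zip(b0, b1)]
    p_sum = schoolbook_mul(sum_a, sum_b, q)
    # Odd phase: A0*B1 + A1*B0 = (A0 + A1)(B0 + B1) - A0*B0 - A1*B1.
    odd = [(s - u - v) % q for s, u, v in zip(p_sum, p00, p11)]
    # Even phase: A0*B0 + y * A1*B1, where y stands for x^2.
    even = p00 + [0]
    for k, v in enumerate(p11):
        even[k + 1] = (even[k + 1] + v) % q
    # Interleave the two phases back into a single coefficient list.
    out = [0] * (len(a) + len(b) - 1)
    for k, v in enumerate(even):
        out[2 * k] = v
    for k, v in enumerate(odd):
        out[2 * k + 1] = v
    return out


poly_a = [1, 2, 3, 4]
poly_b = [5, 6, 7, 8]
print(two_parallel_mul(poly_a, poly_b) == schoolbook_mul(poly_a, poly_b))
```

The saving comes from the shared term (A0 + A1)(B0 + B1), which replaces two cross products with one product plus a few additions and subtractions.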

Submitted 6 October, 2023; originally announced October 2023.

Comments: Proc. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, CA, Oct. 29 - Nov. 2, 2023

Journal ref: 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD)

arXiv:2310.00567 [pdf, other]

Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks

Authors: Quang H. Nguyen, Yingjie Lao, Tung Pham, Kok-Seng Wong, Khoa D. Doan

Abstract: Recent works have shown that deep neural networks are vulnerable to adversarial examples, i.e., samples that are close to the original image but cause the model to misclassify. Even with access only to the model's output, an attacker can employ black-box attacks to generate such adversarial examples. In this work, we propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time. Our theoretical analysis confirms that this method effectively enhances the model's resilience against both score-based and decision-based black-box attacks. Importantly, our defense does not necessitate adversarial training and has minimal impact on accuracy, rendering it applicable to any pre-trained model. Our analysis also reveals the significance of selectively adding noise to different parts of the model based on the gradient of the adversarial objective function, which can be varied during the attack. We demonstrate the robustness of our defense against multiple black-box attacks through extensive empirical experiments involving diverse models with various architectures.
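
The core idea in this abstract, adding random noise to hidden features at inference time, can be sketched on a toy two-layer network. Everything below is a hypothetical illustration: the pure-Python layers, the weights, and the noise scale are assumptions, and the paper's layer selection strategy based on the adversarial gradient is not modeled.

```python
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def relu(vector):
    return [max(0.0, x) for x in vector]


def forward(x, w1, w2, sigma=0.0, rng=None):
    """Toy two-layer network; Gaussian noise is added to the hidden features."""
    hidden = relu(matvec(w1, x))
    if sigma > 0.0:
        rng = rng or random
        # Randomize the intermediate features on every query.
        hidden = [h + rng.gauss(0.0, sigma) for h in hidden]
    return matvec(w2, hidden)


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


W1 = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights
W2 = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, 0.5]

rng = random.Random(0)
clean = forward(x, W1, W2)
query_1 = forward(x, W1, W2, sigma=0.05, rng=rng)
query_2 = forward(x, W1, W2, sigma=0.05, rng=rng)
print(argmax(clean), argmax(query_1), argmax(query_2))
```

Two queries with the same input return slightly different scores, which is what perturbs a query-based attacker's estimates, while the predicted class stays the same for this well-separated input.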

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2307.05129 [pdf, other]

DFR: Depth from Rotation by Uncalibrated Image Rectification with Latitudinal Motion Assumption

Authors: Yongcong Zhang, Yifei Xue, Ming Liao, Huiqing Zhang, Yizhen Lao

Abstract: Despite the increasing prevalence of rotating-style capture (e.g., surveillance cameras), conventional stereo rectification techniques frequently fail due to the rotation-dominant motion and small baseline between views. In this paper, we tackle the challenge of performing stereo rectification for uncalibrated rotating cameras. To that end, we propose Depth-from-Rotation (DfR), a novel image rectification solution that analytically rectifies two images with two-point correspondences and serves for further depth estimation. Specifically, we model the motion of a rotating camera as the camera rotating on a sphere with fixed latitude. The camera's optical axis lies perpendicular to the sphere's surface. We call this the latitudinal motion assumption. Then we derive a 2-point analytical solver by directly computing the rectified transformations on the two images. We also present a self-adaptive strategy to reduce the geometric distortion after rectification. Extensive synthetic and real data experiments demonstrate that the proposed method outperforms existing works in effectiveness and efficiency by a significant margin.

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2307.03811 [pdf]

doi 10.1021/acs.jcim.3c01030

Formulation Graphs for Mapping Structure-Composition of Battery Electrolytes to Device Performance

Authors: Vidushi Sharma, Maxwell Giammona, Dmitry Zubarev, Andy Tek, Khanh Nugyuen, Linda Sundberg, Daniele Congiu, Young-Hye La

Abstract: Advanced computational methods are being actively sought for addressing the challenges associated with discovery and development of new combinatorial materials such as formulations. A widely adopted approach involves domain-informed high-throughput screening of individual components that can be combined into a formulation. This manages to accelerate the discovery of new compounds for a target application but still leaves the process of identifying the right 'formulation' from the shortlisted chemical space largely a laboratory experiment-driven process. We report a deep learning model, Formulation Graph Convolution Network (F-GCN), that can map the structure-composition relationship of the individual components to the property of the liquid formulation as a whole. Multiple GCNs are assembled in parallel that featurize formulation constituents domain-intuitively on the fly. The resulting molecular descriptors are scaled based on the respective constituent's molar percentage in the formulation, followed by formalizing into a combined descriptor that represents a complete formulation to an external learning architecture. The use case of the proposed formulation learning model is demonstrated for battery electrolytes by training and testing it on two exemplary datasets representing electrolyte formulations vs battery performance: one dataset is sourced from literature about Li/Cu half-cells, while the other is obtained by lab experiments related to lithium-iodide full-cell chemistry.
The model is shown to predict performance metrics like Coulombic Efficiency (CE) and specific capacity of new electrolyte formulations with lowest reported errors. The best performing F-GCN model uses molecular descriptors derived from molecular graphs that are informed with HOMO-LUMO and electric moment properties of the molecules using a knowledge transfer technique.
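
The step in which per-constituent descriptors are scaled by molar percentage and merged into one formulation descriptor can be sketched as follows. The abstract does not say how the scaled descriptors are combined, so the element-wise sum used here is an assumption, as are the function name and the hard-coded descriptor values (in the paper the descriptors come from the parallel GCNs).

```python
def combine_descriptors(descriptors, molar_percent):
    """Scale each constituent descriptor by its molar fraction and sum them."""
    if len(descriptors) != len(molar_percent):
        raise ValueError("need one molar percentage per constituent")
    total = sum(molar_percent)
    fractions = [p / total for p in molar_percent]
    dim = len(descriptors[0])
    combined = [0.0] * dim
    for descriptor, fraction in zip(descriptors, fractions):
        for k in range(dim):
            combined[k] += fraction * descriptor[k]
    return combined


# Hypothetical two-component formulation: 75 mol% of A, 25 mol% of B.
descriptor_a = [1.0, 0.0]
descriptor_b = [0.0, 2.0]
formulation = combine_descriptors([descriptor_a, descriptor_b], [75, 25])
print(formulation)
```

The combined vector has the same length regardless of how many constituents the formulation contains, which is one way a fixed-size input could be passed to a downstream learning architecture.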

Submitted 28 September, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

Comments: 35 pages, 10 figures

arXiv:2304.10406 [pdf, other]

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

Authors: Tang Tao, Longfei Gao, Guangrun Wang, Yixing Lao, Peng Chen, Hengshuang Zhao, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu

Abstract: We introduce a new task, novel view synthesis for LiDAR sensors. While traditional model-based LiDAR simulators with style-transfer neural networks can be applied to render novel views, they fall short of producing accurate and realistic LiDAR patterns because the renderers rely on explicit 3D reconstruction and exploit game engines that ignore important attributes of LiDAR points. We address this challenge by formulating, to the best of our knowledge, the first differentiable end-to-end LiDAR rendering framework, LiDAR-NeRF, leveraging a neural radiance field (NeRF) to facilitate the joint learning of geometry and the attributes of 3D points. However, simply employing NeRF cannot achieve satisfactory results, as it only focuses on learning individual pixels while ignoring local information, especially in low-texture areas, resulting in poor geometry. To address this issue, we introduce a structural regularization method to preserve local structural details. To evaluate the effectiveness of our approach, we establish an object-centric multi-view LiDAR dataset, dubbed NeRF-MVL. It contains observations of objects from 9 categories seen from 360-degree viewpoints captured with multiple LiDAR sensors. Our extensive experiments on the scene-level KITTI-360 dataset and on our object-level NeRF-MVL show that our LiDAR-NeRF surpasses the model-based algorithms significantly.

Submitted 14 July, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: This paper introduces a new task of novel LiDAR view synthesis, and proposes a differentiable framework called LiDAR-NeRF with a structural regularization, as well as an object-centric multi-view LiDAR dataset called NeRF-MVL

arXiv:2304.09783 [pdf, other]

Application of attention-based Siamese composite neural network in medical image recognition

Authors: Zihao Huang, Yue Wang, Weixing Xin, Xingtong Lin, Huizhen Li, Haowen Chen, Yizhen Lao, Xia Chen

Abstract: Medical image recognition often faces the problem of insufficient data in practical applications. Image recognition and processing under few-shot conditions suffer from overfitting, low recognition accuracy, low reliability and insufficient robustness. The differences between characteristics are often subtle, and recognition is affected by perspective, background, occlusion and other factors, which increases the difficulty of recognition. Furthermore, in fine-grained images, the few-shot problem leads to insufficient useful feature information in the images. Considering the characteristics of few-shot and fine-grained image recognition, this study establishes a recognition model based on attention and a Siamese neural network. To address the problem of few-shot samples, a Siamese neural network suitable for classification is proposed. The attention-based neural network is used as the main network to improve the classification effect. COVID-19 lung samples were selected for testing the model. The results show that the fewer image samples are available, the more pronounced the model's advantage over an ordinary neural network becomes.

Submitted 15 March, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

arXiv:2303.18125 [pdf, other]

Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction

Authors: Delin Qu, Yizhen Lao, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li

Abstract: This paper addresses the problem of rolling shutter correction in complex nonlinear and dynamic scenes with extreme occlusion. Existing methods suffer from two main drawbacks. Firstly, they face challenges in estimating the accurate correction field due to the uniform velocity assumption, leading to significant image correction errors under complex motion. Secondly, the drastic occlusion in dynamic scenes prevents current solutions from achieving better image quality because of the inherent difficulties in aligning and aggregating multiple frames. To tackle these challenges, we model the curvilinear trajectory of pixels analytically and propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels. Besides, to reconstruct high-quality occlusion frames in dynamic scenes, we present a 3D video architecture that effectively Aligns and Aggregates multi-frame context, namely, RSA2-Net. We evaluate our method across a broad range of cameras and video sequences, demonstrating its significant superiority. Specifically, our method surpasses the state-of-the-art by +4.98, +0.77, and +4.33 of PSNR on Carla-RS, Fastec-RS, and BS-RSC datasets, respectively. Code is available at https://github.com/DelinQu/qrsc.

Submitted 15 August, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

Comments: accepted at ICCV 2023

arXiv:2303.02237 [pdf, other]

doi 10.1109/TIFS.2023.3338553

PaReNTT: Low-Latency Parallel Residue Number System and NTT-Based Long Polynomial Modular Multiplication for Homomorphic Encryption

Authors: Weihang Tan, Sin-Wei Chiu, Antian Wang, Yingjie Lao, Keshab K. Parhi

Abstract: High-speed long polynomial multiplication is important for applications in homomorphic encryption (HE) and lattice-based cryptosystems. This paper addresses low-latency hardware architectures for long polynomial modular multiplication using the number-theoretic transform (NTT) and inverse NTT (iNTT). Chinese remainder theorem (CRT) is used to decompose the modulus into multiple smaller moduli. Our proposed architecture, namely PaReNTT, makes four novel contributions. First, parallel NTT and iNTT architectures are proposed to reduce the number of clock cycles to process the polynomials. This can enable real-time processing for HE applications, as the number of clock cycles to process the polynomial is inversely proportional to the level of parallelism. Second, the proposed architecture eliminates the need for permuting the NTT outputs before their product is input to the iNTT. This reduces latency by n/4 clock cycles, where n is the length of the polynomial, and reduces buffer requirement by one delay-switch-delay circuit of size n. Third, an approach to select special moduli is presented where the moduli can be expressed in terms of a few signed power-of-two terms. Fourth, novel architectures for pre-processing for computing residual polynomials using the CRT and post-processing for combining the residual polynomials are proposed. These architectures significantly reduce the area consumption of the pre-processing and post-processing steps.
The proposed long modular polynomial multiplications are ideal for applications that require low latency and high sample rate as these feed-forward architectures can be pipelined at arbitrary levels.

Submitted 6 July, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Journal ref: IEEE Transactions on Information Forensics and Security, Vol. 19, pp. 1646-1659, 2024

arXiv:2302.08505 [pdf, other]

Rapid-Motion-Track: Markerless Tracking of Fast Human Motion with Deeper Learning

Authors: Renjie Li, Chun Yu Lao, Rebecca St. George, Katherine Lawler, Saurabh Garg, Son N. Tran, Quan Bai, Jane Alty

Abstract: Objective: The coordination of human movement directly reflects function of the central nervous system. Small deficits in movement are often the first sign of an underlying neurological problem. The objective of this research is to develop a new end-to-end, deep learning-based system, Rapid-Motion-Track (RMT) that can track the fastest human movement accurately when webcams or laptop cameras are used. Materials and Methods: We applied RMT to finger tapping, a well-validated test of motor control that is one of the most challenging human motions to track with computer vision due to the small keypoints of digits and the high velocities that are generated. We recorded 160 finger tapping assessments simultaneously with a standard 2D laptop camera (30 frames/sec) and a high-speed wearable sensor-based 3D motion tracking system (250 frames/sec). RMT and a range of DLC models were applied to the video data with tapping frequencies up to 8Hz to extract movement features. Results: The movement features (e.g. speed, rhythm, variance) identified with the new RMT system exhibited very high concurrent validity with the gold-standard measurements (97.3% of RMT measures were within +/-0.5Hz of the Optotrak measures), and outperformed DLC and other advanced computer vision tools (around 88.2% of DLC measures were within +/-0.5Hz of the Optotrak measures). RMT also accurately tracked a range of other rapid human movements such as foot tapping, head turning and sit-to-stand movements.
Conclusion: With the ubiquity of video technology in smart devices, the RMT method holds potential to transform access and accuracy of human movement assessment.

Submitted 18 January, 2023; originally announced February 2023.

arXiv:2210.09194 [pdf, other]

Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class

Authors: Khoa D. Doan, Yingjie Lao, Ping Li

Abstract: In recent years, machine learning models have been shown to be vulnerable to backdoor attacks. Under such attacks, an adversary embeds a stealthy backdoor into the trained model such that the compromised models will behave normally on clean inputs but will misclassify according to the adversary's control on maliciously constructed input with a trigger. While these existing attacks are very effective, the adversary's capability is limited: given an input, these attacks can only cause the model to misclassify toward a single pre-defined or target class. In contrast, this paper exploits a novel backdoor attack with a much more powerful payload, denoted as Marksman, where the adversary can arbitrarily choose which target class the model will misclassify given any input during inference. To achieve this goal, we propose to represent the trigger function as a class-conditional generative model and to inject the backdoor in a constrained optimization framework, where the trigger function learns to generate an optimal trigger pattern to attack any target class at will while simultaneously embedding this generative backdoor into the trained model. Given the learned trigger-generation function, during inference, the adversary can specify an arbitrary backdoor attack target class, and an appropriate trigger causing the model to classify toward this target class is created accordingly.
We show empirically that the proposed framework achieves high attack performance while preserving the clean-data performance in several benchmark datasets, including MNIST, CIFAR10, GTSRB, and TinyImageNet. The proposed Marksman backdoor attack can also easily bypass existing backdoor defenses that were originally designed against backdoor attacks with a single target class. Our work takes another significant step toward understanding the extensive risks of backdoor attacks in practice.

Submitted 17 October, 2022; originally announced October 2022.

Comments: Accepted to NeurIPS 2022

arXiv:2210.05666 [pdf, other]

Point Transformer V2: Grouped Vector Attention and Partition-based Pooling

Authors: Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao

Abstract: As a pioneering work exploring transformer architecture for 3D point cloud understanding, Point Transformer achieves impressive results on multiple highly competitive benchmarks. In this work, we analyze the limitations of the Point Transformer and propose our powerful and efficient Point Transformer V2 model with novel designs that overcome the limitations of previous work. In particular, we first propose group vector attention, which is more effective than the previous version of vector attention. Inheriting the advantages of both learnable weight encoding and multi-head attention, we present a highly effective implementation of grouped vector attention with a novel grouped weight encoding layer. We also strengthen the position information for attention by an additional position encoding multiplier. Furthermore, we design novel and lightweight partition-based pooling methods which enable better spatial alignment and more efficient sampling. Extensive experiments show that our model achieves better performance than its predecessor and achieves state-of-the-art on several challenging 3D point cloud understanding benchmarks, including 3D point cloud segmentation on ScanNet v2 and S3DIS and 3D point cloud classification on ModelNet40. Our code will be available at https://github.com/Gofinge/PointTransformerV2.

Submitted 12 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2209.08503 [pdf, other]

Revisiting Rolling Shutter Bundle Adjustment: Toward Accurate and Fast Solution

Authors: Bangyan Liao, Delin Qu, Yifei Xue, Huiqing Zhang, Yizhen Lao

Abstract: We propose a robust and fast bundle adjustment solution that estimates the 6-DoF pose of the camera and the geometry of the environment based on measurements from a rolling shutter (RS) camera. This tackles the challenges in the existing works, namely relying on additional sensors, high frame rate video as input, restrictive assumptions on camera motion, readout direction, and poor efficiency. To this end, we first investigate the influence of normalization to the image point on RSBA performance and show its better approximation in modelling the real 6-DoF camera motion. Then we present a novel analytical model for the visual residual covariance, which can be used to standardize the reprojection error during the optimization, consequently improving the overall accuracy. More importantly, the combination of normalization and covariance standardization weighting in RSBA (NW-RSBA) can avoid common planar degeneracy without needing to constrain the filming manner. Besides, we propose an acceleration strategy for NW-RSBA based on the sparsity of its Jacobian matrix and Schur complement. The extensive synthetic and real data experiments verify the effectiveness and efficiency of the proposed solution over the state-of-the-art works. We also demonstrate the proposed method can be easily implemented and plug-in famous GSSfM and GSSLAM systems as completed RSSfM and RSSLAM solutions.

Submitted 18 April, 2023; v1 submitted 18 September, 2022; originally announced September 2022.

Comments: Accepted to CVPR 2023

arXiv:2208.14270 [pdf, other]

Integral Sampler and Polynomial Multiplication Architecture for Lattice-based Cryptography

Authors: Antian Wang, Weihang Tan, Keshab K. Parhi, Yingjie Lao

Abstract: With the surge of the powerful quantum computer, lattice-based cryptography proliferated the latest cryptography hardware implementation due to its resistance against quantum computers. Among the computational blocks of lattice-based cryptography, the random errors produced by the sampler play a key role in ensuring the security of these schemes. This paper proposes an integral architecture for the sampler, which can reduce the overall resource consumption by reusing the multipliers and adders within the modular polynomial computation. For instance, our experimental results show that the proposed design can effectively reduce the discrete Ziggurat sampling method in DSP usage.

Submitted 30 August, 2022; originally announced August 2022.

Comments: 6 pages, accepted by 35th IEEE Int. Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems

arXiv:2208.13361 [pdf, other]

NL2GDPR: Automatically Develop GDPR Compliant Android Application Features from Natural Language

Authors: Faysal Hossain Shezan, Yingjie Lao, Minlong Peng, Xin Wang, Mingming Sun, Ping Li

Abstract: The recent privacy leakage incidences and the more strict policy regulations demand a much higher standard of compliance for companies and mobile apps. However, such obligations also impose significant challenges on app developers for complying with these regulations that contain various perspectives, activities, and roles, especially for small companies and developers who are less experienced in this matter or with limited resources. To address these hurdles, we develop an automatic tool, NL2GDPR, which can generate policies from natural language descriptions from the developer while also ensuring the app's functionalities are compliant with General Data Protection Regulation (GDPR). NL2GDPR is developed by leveraging an information extraction tool, OIA (Open Information Annotation), developed by Baidu Cognitive Computing Lab. At the core, NL2GDPR is a privacy-centric information extraction model, appended with a GDPR policy finder and a policy generator. We perform a comprehensive study to grasp the challenges in extracting privacy-centric information and generating privacy policies, while exploiting optimizations for this specific task. With NL2GDPR, we can achieve 92.9%, 95.2%, and 98.4% accuracy in correctly identifying GDPR policies related to personal data storage, process, and share types, respectively. To the best of our knowledge, NL2GDPR is the first tool that allows a developer to automatically generate GDPR compliant policies, with only the need of entering the natural language for describing the app features.
Note that other non-GDPR-related features might be integrated with the generated features to build a complex app.

Submitted 29 August, 2022; originally announced August 2022.

Comments: 37 pages

arXiv:2208.07678 [pdf, other]

FEC: Fast Euclidean Clustering for Point Cloud Segmentation

Authors: Yu Cao, Yancheng Wang, Yifei Xue, Huiqing Zhang, Yizhen Lao

Abstract: Segmentation from point cloud data is essential in many applications such as remote sensing, mobile robots, or autonomous cars. However, the point clouds captured by the 3D range sensor are commonly sparse and unstructured, challenging efficient segmentation. In this paper, we present a fast solution to point cloud instance segmentation with small computational demands. To this end, we propose a novel fast Euclidean clustering (FEC) algorithm which applies a pointwise scheme over the clusterwise scheme used in existing works. Our approach is conceptually simple, easy to implement (40 lines in C++), and runs two orders of magnitude faster than the classical segmentation methods while producing high-quality results.

Submitted 11 November, 2022; v1 submitted 16 August, 2022; originally announced August 2022.

arXiv:2206.12381 [pdf, other]

Defending Backdoor Attacks on Vision Transformer via Patch Processing

Authors: Khoa D. Doan, Yingjie Lao, Peng Yang, Ping Li

Abstract: Vision Transformers (ViTs) have a radically different architecture with significantly less inductive bias than Convolutional Neural Networks. Along with the improvement in performance, security and robustness of ViTs are also of great importance to study. In contrast to many recent works that exploit the robustness of ViTs against adversarial examples, this paper investigates a representative causative attack, i.e., backdoor. We first examine the vulnerability of ViTs against various backdoor attacks and find that ViTs are also quite vulnerable to existing attacks. However, we observe that the clean-data accuracy and backdoor attack success rate of ViTs respond distinctively to patch transformations before the positional encoding. Then, based on this finding, we propose an effective method for ViTs to defend both patch-based and blending-based trigger backdoor attacks via patch processing. The performances are evaluated on several benchmark datasets, including CIFAR10, GTSRB, and TinyImageNet, which show the proposed novel defense is very successful in mitigating backdoor attacks for ViTs. To the best of our knowledge, this paper presents the first defensive strategy that utilizes a unique characteristic of ViTs against backdoor attacks. The paper will appear in the Proceedings of the AAAI'23 Conference. This work was initially submitted in November 2021 to CVPR'22, then it was re-submitted to ECCV'22. The paper was made public in June 2022. The authors sincerely thank all the referees from the Program Committees of CVPR'22, ECCV'22, and AAAI'23.

Submitted 16 January, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

arXiv:2205.15444 [pdf, other]

Integrity Authentication in Tree Models

Authors: Weijie Zhao, Yingjie Lao, Ping Li

Abstract: Tree models are very widely used in practice of machine learning and data mining. In this paper, we study the problem of model integrity authentication in tree models. In general, the task of model integrity authentication is the design & implementation of mechanisms for checking/detecting whether the model deployed for the end-users has been tampered with or compromised, e.g., malicious modifications on the model. We propose an authentication framework that enables the model builders/distributors to embed a signature to the tree model and authenticate the existence of the signature by only making a small number of black-box queries to the model. To the best of our knowledge, this is the first study of signature embedding on tree models. Our proposed method simply locates a collection of leaves and modifies their prediction values, which does not require any training/testing data nor any re-training. The experiments on a large number of public classification datasets confirm that the proposed signature embedding process has a high success rate while only introducing a minimal prediction accuracy loss.

Submitted 23 June, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

arXiv:2202.11010 [pdf]

doi 10.1038/s41567-022-01555-6

Entropy-driven order in an array of nanomagnets

Authors: Hilal Saglam, Ayhan Duzgun, Aikaterini Kargioti, Nikhil Harle, Xiaoyu Zhang, Nicholas S. Bingham, Yuyang Lao, Ian Gilbert, Joseph Sklenar, Justin D. Watts, Justin Ramberger, Daniel Bromley, Rajesh V. Chopdekar, Liam O'Brien, Chris Leighton, Cristiano Nisoli, Peter Schiffer

Abstract: Long-range ordering is typically associated with a decrease in entropy. Yet, it can also be driven by increasing entropy in certain special cases. We demonstrate that artificial spin ice arrays of single-domain nanomagnets can be designed to produce entropy-driven order. We focus on the tetris artificial spin ice structure, a highly frustrated array geometry with a zero-point Pauling entropy, which is formed by selectively creating regular vacancies on the canonical square ice lattice. We probe thermally active tetris artificial spin ice both experimentally and through simulations, measuring the magnetic moments of the individual nanomagnets. We find two-dimensional magnetic ordering in one subset of these moments, which we demonstrate to be induced by disorder (i.e., increased entropy) in another subset of the moments. In contrast with other entropy-driven systems, the discrete degrees of freedom in tetris artificial spin ice are binary and are both designable and directly observable at the microscale, and the entropy of the system is precisely calculable in simulations. This example, in which the system's interactions and ground state entropy are well-defined, expands the experimental landscape for the study of entropy-driven ordering.

Submitted 22 February, 2022; originally announced February 2022.

Comments: 53 pages, 19 figures (including Supplementary Information)

arXiv:2110.12127 [pdf, other]

doi 10.1109/TC.2023.3251847

High-Speed VLSI Architectures for Modular Polynomial Multiplication via Fast Filtering and Applications to Lattice-Based Cryptography

Authors: Weihang Tan, Antian Wang, Yingjie Lao, Xinmiao Zhang, Keshab K. Parhi

Abstract: This paper presents a low-latency hardware accelerator for modular polynomial multiplication for lattice-based post-quantum cryptography and homomorphic encryption applications. The proposed novel modular polynomial multiplier exploits the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of the schoolbook modular polynomial multiplication. We also extend this structure to fast $M$-parallel architectures while achieving low-latency, high-speed, and full hardware utilization. We comprehensively evaluate the performance of the proposed architectures under various polynomial settings as well as in the Saber scheme for post-quantum cryptography as a case study. The experimental results show that our proposed modular polynomial multiplier reduces the computation time and area-time product, respectively, compared to the state-of-the-art designs.

Submitted 24 February, 2023; v1 submitted 22 October, 2021; originally announced October 2021.

Journal ref: IEEE Trans. on Computers, 72(9), pp. 2454-2466, Sept. 2023

arXiv:2110.00511 [pdf, other]

ASH: A Modern Framework for Parallel Spatial Hashing in 3D Perception

Authors: Wei Dong, Yixing Lao, Michael Kaess, Vladlen Koltun

Abstract: We present ASH, a modern and high-performance framework for parallel spatial hashing on GPU. Compared to existing GPU hash map implementations, ASH achieves higher performance, supports richer functionality, and requires fewer lines of code (LoC) when used for implementing spatially varying operations from volumetric geometry reconstruction to differentiable appearance reconstruction. Unlike existing GPU hash maps, the ASH framework provides a versatile tensor interface, hiding low-level details from the users. In addition, by decoupling the internal hashing data structures and key-value data in buffers, we offer direct access to spatially varying data via indices, enabling seamless integration to modern libraries such as PyTorch. To achieve this, we 1) detach stored key-value data from the low-level hash map implementation; 2) bridge the pointer-first low level data structures to index-first high-level tensor interfaces via an index heap; 3) adapt both generic and non-generic integer-only hash map implementations as backends to operate on multi-dimensional keys. We first profile our hash map against state-of-the-art hash maps on synthetic data to show the performance gain from this architecture.
We then show that ASH can consistently achieve higher performance on various large-scale 3D perception tasks with fewer LoC by showcasing several applications, including 1) point cloud voxelization, 2) retargetable volumetric scene reconstruction, 3) non-rigid point cloud registration and volumetric deformation, and 4) spatially varying geometry and appearance refinement. ASH and its example applications are open sourced in Open3D (http://www.open3d.org).

Submitted 29 January, 2023; v1 submitted 1 October, 2021; originally announced October 2021.

Comments: 18 pages, 19 figures

arXiv:2010.16325 [pdf]

Identifying the optimal parameters for sprayed and inhaled drug particulates for intranasal targeting of SARS-CoV-2 infection sites

Authors: Yueying Lao, Diane Joseph-McCarthy, Arijit Chakravarty, Pallavi A. Balivada, Phoebe Ato, Nogaye K. Ka, Saikat Basu

Abstract: Efficacy for COVID-19 treatments can be enhanced significantly through targeting the nasopharynx, which has been shown to be the dominant preliminary infection site for SARS-CoV-2. Although intranasal drugs can be administered easily through drops or sprays, it is difficult to test whether current protocols will deliver the right amount of the drug to this location consistently. We are interested in developing an in silico prototyping tool to rapidly identify optimal parameters for intranasal delivery. In this study, we have applied computational fluid dynamics to simulate fluid flow through the nasal cavity and examined particle deposition for a drug formulation, mimicking different delivery methods. The nasal geometry models were derived using digitized and meshed computed tomography (CT) scans of human patients. Using the nasal geometries, we simulated two different airflows: a laminar model at 15 LPM (Liters/min) that simulated resting breathing rate and a Large Eddy Simulation (LES) model used to achieve a higher flow rate of 30 LPM. We were able to run particle tracking simulations for these two airflow schemes to test different drug properties such as particle size. The different injection methods used include surface injection which best replicates an inhaler-based release of particle droplets into the nostril and the cone injection method which best replicates a spray into the nostril. The results of the study suggest that the most optimal drug particle size for targeting the intranasal infection sites is around 6-14 microns.

Submitted 30 October, 2020; originally announced October 2020.

Comments: 27 pages, 16 figures

arXiv:2008.07571 [pdf]

doi 10.1038/s41467-021-26734-6

String Phase in an Artificial Spin Ice

Authors: Xiaoyu Zhang, Ayhan Duzgun, Yuyang Lao, Shayaan Subzwari, Nicholas S. Bingham, Joseph Sklenar, Hilal Saglam, Justin Ramberger, Joseph T. Batley, Justin D. Watts, Daniel Bromley, Rajesh V. Chopdekar, Liam O'Brien, Chris Leighton, Cristiano Nisoli, Peter Schiffer

Abstract: One-dimensional strings of local excitations are a fascinating feature of the physical behavior of strongly correlated topological quantum matter. Here we study strings of local excitations in a classical system of interacting nanomagnets, the Santa Fe Ice geometry of artificial spin ice. We measured the moment configuration of the nanomagnets, both after annealing near the ferromagnetic Curie point and in a thermally dynamic state. While the Santa Fe Ice lattice structure is complex, we demonstrate that its disordered magnetic state is naturally described within a framework of emergent strings. We show experimentally that the string length follows a simple Boltzmann distribution with an energy scale that is associated with the system's magnetic interactions and is consistent with theoretical predictions. The results demonstrate that string description and associated topological characteristics are not unique to quantum models but can also provide a simplifying description of complex classical systems with non-trivial frustration.

Submitted 20 October, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

Comments: 46 pages including SI

arXiv:2008.00047 [pdf, other]

Towards Class-Oriented Poisoning Attacks Against Neural Networks

Authors: Bingyin Zhao, Yingjie Lao

Abstract: Poisoning attacks on machine learning systems compromise the model performance by deliberately injecting malicious samples in the training dataset to influence the training process. Prior works focus on either availability attacks (i.e., lowering the overall model accuracy) or integrity attacks (i.e., enabling specific instance-based backdoor). In this paper, we advance the adversarial objectives of the availability attacks to a per-class basis, which we refer to as class-oriented poisoning attacks. We demonstrate that the proposed attack is capable of forcing the corrupted model to predict in two specific ways: (i) classify unseen new images to a targeted "supplanter" class, and (ii) misclassify images from a "victim" class while maintaining the classification accuracy on other non-victim classes. To maximize the adversarial effect as well as reduce the computational complexity of poisoned data generation, we propose a gradient-based framework that crafts poisoning images with carefully manipulated feature information for each scenario. Using newly defined metrics at the class level, we demonstrate the effectiveness of the proposed class-oriented poisoning attacks on various models (e.g., LeNet-5, Vgg-9, and ResNet-50) over a wide range of datasets (e.g., MNIST, CIFAR-10, and ImageNet-ILSVRC2012) in an end-to-end training setting.

Submitted 11 October, 2021; v1 submitted 31 July, 2020; originally announced August 2020.

Comments: 14 pages, 9 figures, accepted by Winter Conference on Applications of Computer Vision (WACV) 2022

arXiv:1910.14158 [pdf, other]

doi 10.1063/1.5126713

Understanding Thermal Annealing of Artificial Spin Ice

Authors: Xiaoyu Zhang, Yuyang Lao, Joseph Sklenar, Nicholas S. Bingham, Joseph T. Batley, Justin D. Watts, Cristiano Nisoli, Chris Leighton, Peter Schiffer

Abstract: We have performed a detailed study of thermal annealing of the moment configuration in artificial spin ice. Permalloy (Ni$_{80}$Fe$_{20}$) artificial spin ice samples were examined in the prototypical square ice geometry, studying annealing as a function of island thickness, island shape, and annealing temperature and duration. We also measured the Curie temperature as a function of film thickness, finding that thickness has a strong effect on the Curie temperature in regimes of relevance to many studies of the dynamics of artificial spin ice systems. Increasing the interaction energy between island moments and reducing the energy barrier to flipping the island moments allow the system to more closely approach the collective low energy state of the moments upon annealing, suggesting new channels for understanding the thermalization processes in these important model systems.

Submitted 30 October, 2019; originally announced October 2019.

Comments: The following article has been accepted by APL Materials. After it is published, it will be found at Link

arXiv:1903.11688 [pdf, other]

Rallying Adversarial Techniques against Deep Learning for Network Security

Authors: Joseph Clements, Yuzhe Yang, Ankur Sharma, Hongxin Hu, Yingjie Lao

Abstract: Recent advances in artificial intelligence and the increasing need for powerful defensive measures in the domain of network security have led to the adoption of deep learning approaches for use in network intrusion detection systems. These methods have achieved superior performance against conventional network attacks, which enables the deployment of practical security systems to unique and dynamic sectors. Adversarial machine learning, unfortunately, has recently shown that deep learning models are inherently vulnerable to adversarial modifications on their input data. Because of this susceptibility, the deep learning models deployed to power a network defense could in fact be the weakest entry point for compromising a network system. In this paper, we show that by modifying on average as little as 1.38 of the input features, an adversary can generate malicious inputs which effectively fool a deep learning based NIDS. Therefore, when designing such systems, it is crucial to consider the performance from not only the conventional network security perspective but also the adversarial machine learning domain.

Submitted 24 October, 2021; v1 submitted 27 March, 2019; originally announced March 2019.

Comments: accepted by IEEE Symposium Series on Computational Intelligence (IEEE SSCI 2021)

arXiv:1810.10121 [pdf, other]

nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data

Authors: Fabian Boemer, Yixing Lao, Rosario Cammarota, Casimir Wierzynski

Abstract: Homomorphic encryption (HE)---the ability to perform computation on encrypted data---is an attractive remedy to increasing concerns about data privacy in deep learning (DL). However, building DL models that operate on ciphertext is currently labor-intensive and requires simultaneous expertise in DL, cryptography, and software engineering. DL frameworks and recent advances in graph compilers have greatly accelerated the training and deployment of DL models to various computing platforms. We introduce nGraph-HE, an extension of nGraph, Intel's DL graph compiler, which enables deployment of trained models with popular frameworks such as TensorFlow while simply treating HE as another hardware target. Our graph-compiler approach enables HE-aware optimizations-- implemented at compile-time, such as constant folding and HE-SIMD packing, and at run-time, such as special value plaintext bypass. Furthermore, nGraph-HE integrates with DL frameworks such as TensorFlow, enabling data scientists to benchmark DL models with minimal overhead.

Submitted 2 April, 2019; v1 submitted 23 October, 2018; originally announced October 2018.

arXiv:1806.05768 [pdf, other]

Hardware Trojan Attacks on Neural Networks

Authors: Joseph Clements, Yingjie Lao

Abstract: With the rising popularity of machine learning and the ever increasing demand for computational power, there is a growing need for hardware optimized implementations of neural networks and other machine learning models. As the technology evolves, it is also plausible that machine learning or artificial intelligence will soon become consumer electronic products and military equipment, in the form of well-trained models. Unfortunately, the modern fabless business model of manufacturing hardware, while economic, leads to deficiencies in security through the supply chain. In this paper, we illuminate these security issues by introducing hardware Trojan attacks on neural networks, expanding the current taxonomy of neural network security to incorporate attacks of this nature. To aid in this, we develop a novel framework for inserting malicious hardware Trojans in the implementation of a neural network classifier. We evaluate the capabilities of the adversary in this setting by implementing the attack algorithm on convolutional neural networks while controlling a variety of parameters available to the adversary. Our experimental results show that the proposed algorithm could effectively classify a selected input trigger as a specified class on the MNIST dataset by injecting hardware Trojans into $0.03\%$, on average, of neurons in the 5th hidden layer of arbitrary 7-layer convolutional neural networks, while undetectable under the test data. Finally, we discuss the potential defenses to protect neural networks against hardware Trojan attacks.

Submitted 14 June, 2018; originally announced June 2018.

arXiv:1804.10911 [pdf, ps, other]

A Tree Search Algorithm for Sequence Labeling

Authors: Yadi Lao, Jun Xu, Yanyan Lan, Jiafeng Guo, Sheng Gao, Xueqi Cheng

Abstract: In this paper we propose a novel reinforcement learning based model for sequence tagging, referred to as MM-Tag. Inspired by the success and methodology of AlphaGo Zero, MM-Tag formalizes the problem of sequence tagging with a Monte Carlo tree search (MCTS) enhanced Markov decision process (MDP) model, in which the time steps correspond to the positions of words in a sentence from left to right, and each action corresponds to assigning a tag to a word. Two long short-term memory networks (LSTM) are used to summarize the past tag assignments and words in the sentence. Based on the outputs of LSTMs, the policy for guiding the tag assignment and the value for predicting the tagging accuracy of the whole sentence are produced. The policy and value are then strengthened with MCTS, which takes the produced raw policy and value as inputs, simulates and evaluates the possible tag assignments at the subsequent positions, and outputs a better search policy for assigning tags. A reinforcement learning algorithm is proposed to train the model parameters. Our work is the first to apply the MCTS enhanced MDP model to the sequence tagging task. We show that MM-Tag can accurately predict the tags thanks to the exploratory decision making mechanism introduced by MCTS. Experimental results on a chunking benchmark showed that MM-Tag outperformed the state-of-the-art sequence tagging baselines including CRF and CRF with LSTM.

Submitted 17 May, 2018; v1 submitted 29 April, 2018; originally announced April 2018.

arXiv:1801.08058 [pdf, other]

Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning

Authors: Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar, Yixing Lao, Christopher R. Lishka, Jaikrishnan Menon, Jennifer Myers, Sandeep Aswath Narayana, Adam Procter, Tristan J. Webb

Abstract: The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires $\mathcal{O}(fp)$ effort; where $f$ is the number of frameworks and $p$ is the number of platforms. While optimized kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our experience creating neon (a fast deep learning library on GPUs), we developed Intel nGraph, a soon to be open-sourced C++ library to simplify the realization of optimized deep learning performance across frameworks and hardware platforms. Initially-supported frameworks include TensorFlow, MXNet, and Intel neon framework. Initial backends are Intel Architecture CPUs (CPU), the Intel(R) Nervana Neural Network Processor(R) (NNP), and NVIDIA GPUs. Currently supported compiler optimizations include efficient memory management and data layout abstraction. In this paper, we describe our overall architecture and its core components. In the future, we envision extending nGraph API support to a wider range of frameworks, hardware (including FPGAs and ASICs), and compiler optimizations (training versus inference optimizations, multi-node and multi-device scaling via efficient sub-graph partitioning, and HW-specific compounding of operations).

Submitted 29 January, 2018; v1 submitted 24 January, 2018; originally announced January 2018.

arXiv:1508.06330 [pdf]

doi 10.1103/PhysRevB.92.104417

Direct Visualization of Memory Effects in Artificial Spin Ice

Authors: Ian Gilbert, Gia-Wei Chern, Bryce Fore, Yuyang Lao, Sheng Zhang, Cristiano Nisoli, Peter Schiffer

Abstract: We experimentally demonstrate that arrays of interacting nanoscale ferromagnetic islands, known as artificial spin ice, develop reproducible microstates upon cycling an applied magnetic field. The onset of this memory effect is determined by the strength of the applied field relative to the array coercivity. Specifically, when the applied field strength is almost exactly equal to the array coercivity, several training cycles are required before the array achieves a nearly completely repeatable microstate, whereas when the applied field strength is stronger or weaker than the array coercivity, a repeatable microstate is achieved after the first minor loop. We show through experiment and simulation that this memory exhibited by artificial spin ice is due to a ratchet effect on interacting, magnetically-charged defects in the island moment configuration and to the complexity of the network of strings of reversed moments that forms during magnetization reversal.

Submitted 25 August, 2015; originally announced August 2015.

Comments: 19 pages, 8 figures

arXiv:1410.1780 [pdf, ps, other]

Optimization of a model for the CuO planes in La2CuO4

Authors: Yoandri Vielza de la Cruz, Alejandro Cabo Montes de Oca

Abstract: The results of a previous work, which considered the effect of hole doping on a simple model of the CuO2 planes in La2CuO4, are extended. The parameters are adjusted with the objective to fix the known values of the gap of 2 eV for this material and its dielectric constant of 21. We find again indications of a "hidden" phase transition beneath the superconductor dome. The transition is a second order one, and is associated with an energetic coincidence of a ground insulator state (AFA) with an excited paramagnetic state showing a pseudogap (PPG), at a critical point of the hole concentration around xc=0.2. The evolution as a function of doping of the band structures and the Fermi surface of the system in the phases AFA and PPG is shown. In the zone of low doping, the holes begin to occupy the states located at the middle of the sides of the Brillouin zone, which in the AFA states have the strongest antiferromagnetic character. Around the critical doping the results show that in both phases, the Fermi surfaces and the energy spectrum of the filled electronic states tend to coincide.

Submitted 7 October, 2014; originally announced October 2014.

Comments: 8 pages, in Spanish, 20 figures

arXiv:1404.0150 [pdf, ps, other]

doi 10.1088/0953-8984/27/8/085601

Replica exchange molecular dynamics optimization of tensor network states for quantum many-body systems

Authors: Wenyuan Liu, Chao Wang, Yanbin Li, Yuyang Lao, Yongjian Han, Guang-Can Guo, Lixin He

Abstract: The tensor network states (TNS) methods combined with Monte Carlo (MC) techniques have proved to be a powerful algorithm for simulating quantum many-body systems. However, because the ground state energy is a highly non-linear function of the tensors, it is easy to get stuck in local minima when optimizing the TNS of the simulated physical systems. To overcome this difficulty, we introduce a replica-exchange molecular dynamics optimization algorithm to obtain the TNS ground state, based on the MC sampling techniques, by mapping the energy function of the TNS to that of a classical dynamical system. The method is expected to effectively avoid local minima. We make benchmark tests on a 1D Hubbard model based on matrix product states (MPS) and a Heisenberg $J_1$-$J_2$ model on square lattice based on string bond states (SBS). The results show that the optimization method is robust and efficient compared to the existing results.

Submitted 1 April, 2014; originally announced April 2014.

arXiv:1311.0676 [pdf, ps, other]

doi 10.1103/PhysRevB.89.094426

Non-reciprocal Oersted field contribution to the current-induced frequency shift of magnetostatic surface waves

Authors: Mohammad Haidar, Matthieu Bailleul, Mikhail Kostylev, Yuyan Lao

Abstract: The influence of an electrical current on the propagation of magnetostatic surface waves is investigated in a relatively thick (40 nm) permalloy film both experimentally and theoretically. Contrary to previously studied thinner films where the dominating effect is the current-induced spin-wave Doppler shift, the magnetic field generated by the current (Oersted field) is found to induce a strong non-reciprocal frequency shift which overcompensates the Doppler shift. The measured current induced frequency shift is in agreement with the developed theory. The theory relates the sign of the frequency shift to the spin wave modal profiles. The good agreement between the experiment and the theory confirms a recent prediction of a counter-intuitive mode localization for magnetostatic surface waves in the dipole-exchange regime.

Submitted 4 November, 2013; originally announced November 2013.

Comments: Submitted to Phys. Rev. B

Showing 1–41 of 41 results for author: Lao, Y