Search | arXiv e-print repository

Leveraging Pre-trained Models for FF-to-FFPE Histopathological Image Translation

Authors: Qilai Zhang, Jiawen Li, Peiran Liao, Jiali Hu, Tian Guan, Anjia Han, Yonghong He

Abstract: The two primary types of Hematoxylin and Eosin (H&E) slides in histopathology are Formalin-Fixed Paraffin-Embedded (FFPE) and Fresh Frozen (FF). FFPE slides offer high quality histopathological images but require a labor-intensive acquisition process. In contrast, FF slides can be prepared quickly, but the image quality is relatively poor. Our task is to translate FF images into FFPE style, thereb… ▽ More The two primary types of Hematoxylin and Eosin (H&E) slides in histopathology are Formalin-Fixed Paraffin-Embedded (FFPE) and Fresh Frozen (FF). FFPE slides offer high quality histopathological images but require a labor-intensive acquisition process. In contrast, FF slides can be prepared quickly, but the image quality is relatively poor. Our task is to translate FF images into FFPE style, thereby improving the image quality for diagnostic purposes. In this paper, we propose Diffusion-FFPE, a method for FF-to-FFPE histopathological image translation using a pre-trained diffusion model. Specifically, we employ a one-step diffusion model as the generator and fine-tune it with LoRA adapters using adversarial learning objectives. To ensure that the model effectively captures both global structural information and local details, we propose a multi-scale feature fusion (MFF) module. This module utilizes two VAE encoders to extract features of varying image sizes and performs feature fusion before feeding them into the UNet. Furthermore, we utilize a pre-trained vision-language model for histopathology as the backbone for the discriminator to further improve performance We conducted FF-to-FFPE translation experiments on the TCGA-NSCLC datasets, and our method achieved better performance compared to other methods. The code and models are released at https://github.com/QilaiZhang/Diffusion-FFPE. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17145 [pdf, other]

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

Authors: Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

Abstract: Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only c… ▽ More Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only consider sequential pipeline stages and thus ignore the topology of a DNN, resulting in missed model-parallel opportunities. This paper presents graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes existing sequential pipeline parallelism and preserves the inherent topology of a DNN to enable concurrent execution of computationally-independent operators, resulting in reduced memory requirement and improved GPU performance. In addition, we develop GraphPipe, a distributed system that exploits GPP strategies to enable performant and scalable DNN training. GraphPipe partitions a DNN into a graph of stages, optimizes micro-batch schedules for these stages, and parallelizes DNN training using the discovered GPP strategies. Evaluation on a variety of DNNs shows that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6X. GraphPipe also reduces the search time by 9-21X compared to PipeDream and Piper. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.00714 [pdf, other]

A Survey of Deep Learning Based Radar and Vision Fusion for 3D Object Detection in Autonomous Driving

Authors: Di Wu, Feng Yang, Benlian Xu, Pan Liao, Bo Liu

Abstract: With the rapid advancement of autonomous driving technology, there is a growing need for enhanced safety and efficiency in the automatic environmental perception of vehicles during their operation. In modern vehicle setups, cameras and mmWave radar (radar), being the most extensively employed sensors, demonstrate complementary characteristics, inherently rendering them conducive to fusion and faci… ▽ More With the rapid advancement of autonomous driving technology, there is a growing need for enhanced safety and efficiency in the automatic environmental perception of vehicles during their operation. In modern vehicle setups, cameras and mmWave radar (radar), being the most extensively employed sensors, demonstrate complementary characteristics, inherently rendering them conducive to fusion and facilitating the achievement of both robust performance and cost-effectiveness. This paper focuses on a comprehensive survey of radar-vision (RV) fusion based on deep learning methods for 3D object detection in autonomous driving. We offer a comprehensive overview of each RV fusion category, specifically those employing region of interest (ROI) fusion and end-to-end fusion strategies. As the most promising fusion strategy at present, we provide a deeper classification of end-to-end fusion methods, including those 3D bounding box prediction based and BEV based approaches. Moreover, aligning with recent advancements, we delineate the latest information on 4D radar and its cutting-edge applications in autonomous vehicles (AVs). Finally, we present the possible future trends of RV fusion and summarize this paper. △ Less

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.20607 [pdf, other]

Textual Inversion and Self-supervised Refinement for Radiology Report Generation

Authors: Yuanjiang Luo, Hongxiang Li, Xuan Wu, Meng Cao, Xiaoshuang Huang, Zhihong Zhu, Peixi Liao, Hu Chen, Yi Zhang

Abstract: Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we proposed Textual Inversion and Self-supervised Refinement (TISR) to address the above two issues. Specific… ▽ More Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we proposed Textual Inversion and Self-supervised Refinement (TISR) to address the above two issues. Specifically, textual inversion can project text and image into the same space by representing images as pseudo words to eliminate the cross-modeling gap. Subsequently, self-supervised refinement refines these pseudo words through contrastive loss computation between images and texts, enhancing the fidelity of generated reports to images. Notably, TISR is orthogonal to most existing methods, plug-and-play. We conduct experiments on two widely-used public datasets and achieve significant improvements on various baselines, which demonstrates the effectiveness and generalization of TISR. The code will be available soon. △ Less

Submitted 6 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: This paper has been early accepted by MICCAI 2024!

arXiv:2405.15176 [pdf, other]

MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method

Authors: Pan Liao, Feng Yang, Di Wu, Liu Bo

Abstract: Monocular vision-based 3D object detection is crucial in various sectors, yet existing methods face significant challenges in terms of accuracy and computational efficiency. Building on the successful strategies in 2D detection and depth estimation, we propose MonoDETRNext, which seeks to optimally balance precision and processing speed. Our methodology includes the development of an efficient hyb… ▽ More Monocular vision-based 3D object detection is crucial in various sectors, yet existing methods face significant challenges in terms of accuracy and computational efficiency. Building on the successful strategies in 2D detection and depth estimation, we propose MonoDETRNext, which seeks to optimally balance precision and processing speed. Our methodology includes the development of an efficient hybrid visual encoder, enhancement of depth prediction mechanisms, and introduction of an innovative query generation strategy, augmented by an advanced depth predictor. Building on MonoDETR, MonoDETRNext introduces two variants: MonoDETRNext-F, which emphasizes speed, and MonoDETRNext-A, which focuses on precision. We posit that MonoDETRNext establishes a new benchmark in monocular 3D object detection and opens avenues for future research. We conducted an exhaustive evaluation demonstrating the model's superior performance against existing solutions. Notably, MonoDETRNext-A demonstrated a 4.60% improvement in the AP3D metric on the KITTI test benchmark over MonoDETR, while MonoDETRNext-F showed a 2.21% increase. Additionally, the computational efficiency of MonoDETRNext-F slightly exceeds that of its predecessor. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2403.09070 [pdf, other]

Analytical Heterogeneous Die-to-Die 3D Placement with Macros

Authors: Yuxuan Zhao, Peiyu Liao, Siting Liu, Jiaxi Jiang, Yibo Lin, Bei Yu

Abstract: This paper presents an innovative approach to 3D mixed-size placement in heterogeneous face-to-face (F2F) bonded 3D ICs. We propose an analytical framework that utilizes a dedicated density model and a bistratal wirelength model, effectively handling macros and standard cells in a 3D solution space. A novel 3D preconditioner is developed to resolve the topological and physical gap between macros a… ▽ More This paper presents an innovative approach to 3D mixed-size placement in heterogeneous face-to-face (F2F) bonded 3D ICs. We propose an analytical framework that utilizes a dedicated density model and a bistratal wirelength model, effectively handling macros and standard cells in a 3D solution space. A novel 3D preconditioner is developed to resolve the topological and physical gap between macros and standard cells. Additionally, we propose a mixed-integer linear programming (MILP) formulation for macro rotation to optimize wirelength. Our framework is implemented with full-scale GPU acceleration, leveraging an adaptive 3D density accumulation algorithm and an incremental wirelength gradient algorithm. Experimental results on ICCAD 2023 contest benchmarks demonstrate that our framework can achieve 5.9% quality score improvement compared to the first-place winner with 4.0x runtime speedup. △ Less

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2312.10613 [pdf, other]

p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models

Authors: Haoyuan Wu, Xinyun Zhang, Peng Xu, Peiyu Liao, Xufeng Yao, Bei Yu

Abstract: Vision-Language models (VLMs) pre-trained on large corpora have demonstrated notable success across a range of downstream tasks. In light of the rapidly increasing size of pre-trained VLMs, parameter-efficient transfer learning (PETL) has garnered attention as a viable alternative to full fine-tuning. One such approach is the adapter, which introduces a few trainable parameters into the pre-traine… ▽ More Vision-Language models (VLMs) pre-trained on large corpora have demonstrated notable success across a range of downstream tasks. In light of the rapidly increasing size of pre-trained VLMs, parameter-efficient transfer learning (PETL) has garnered attention as a viable alternative to full fine-tuning. One such approach is the adapter, which introduces a few trainable parameters into the pre-trained models while preserving the original parameters during adaptation. In this paper, we present a novel modeling framework that recasts adapter tuning after attention as a graph message passing process on attention graphs, where the projected query and value features and attention matrix constitute the node features and the graph adjacency matrix, respectively. Within this framework, tuning adapters in VLMs necessitates handling heterophilic graphs, owing to the disparity between the projected query and value space. To address this challenge, we propose a new adapter architecture, $p$-adapter, which employs $p$-Laplacian message passing in Graph Neural Networks (GNNs). Specifically, the attention weights are re-normalized based on the features, and the features are then aggregated using the calibrated attention matrix, enabling the dynamic exploitation of information with varying frequencies in the heterophilic attention graphs. We conduct extensive experiments on different pre-trained VLMs and multi-modal tasks, including visual question answering, visual entailment, and image captioning. The experimental results validate our method's significant superiority over other PETL methods. △ Less

Submitted 17 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI24. The first two authors contributed equally to this paper

arXiv:2310.07424 [pdf, other]

Analytical Die-to-Die 3D Placement with Bistratal Wirelength Model and GPU Acceleration

Authors: Peiyu Liao, Yuxuan Zhao, Dawei Guo, Yibo Lin, Bei Yu

Abstract: In this paper, we present a new analytical 3D placement framework with a bistratal wirelength model for F2F-bonded 3D ICs with heterogeneous technology nodes based on the electrostatic-based density model. The proposed framework, enabled GPU-acceleration, is capable of efficiently determining node partitioning and locations simultaneously, leveraging the dedicated 3D wirelength model and density m… ▽ More In this paper, we present a new analytical 3D placement framework with a bistratal wirelength model for F2F-bonded 3D ICs with heterogeneous technology nodes based on the electrostatic-based density model. The proposed framework, enabled GPU-acceleration, is capable of efficiently determining node partitioning and locations simultaneously, leveraging the dedicated 3D wirelength model and density model. The experimental results on ICCAD 2022 contest benchmarks demonstrate that our proposed 3D placement framework can achieve up to 6.1% wirelength improvement and 4.1% on average compared to the first-place winner with much fewer vertical interconnections and up to 9.8x runtime speedup. Notably, the proposed framework also outperforms the state-of-the-art 3D analytical placer by up to 3.3% wirelength improvement and 2.1% on average with up to 8.8x acceleration on large cases using GPUs. △ Less

Submitted 11 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2308.14763 [pdf, other]

VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired

Authors: Jia-Jyu Su, Pang-Chen Liao, Yen-Ting Lin, Wu-Hao Li, Guan-Ting Liou, Cheng-Che Kao, Wei-Cheng Chen, Jen-Chieh Chiang, Wen-Yang Chang, Pin-Han Lin, Chen-Yu Chiang

Abstract: Services of personalized TTS systems for the Mandarin-speaking speech impaired are rarely mentioned. Taiwan started the VoiceBanking project in 2020, aiming to build a complete set of services to deliver personalized Mandarin TTS systems to amyotrophic lateral sclerosis patients. This paper reports the corpus design, corpus recording, data purging and correction for the corpus, and evaluations of… ▽ More Services of personalized TTS systems for the Mandarin-speaking speech impaired are rarely mentioned. Taiwan started the VoiceBanking project in 2020, aiming to build a complete set of services to deliver personalized Mandarin TTS systems to amyotrophic lateral sclerosis patients. This paper reports the corpus design, corpus recording, data purging and correction for the corpus, and evaluations of the developed personalized TTS systems, for the VoiceBanking project. The developed corpus is named after the VoiceBank-2023 speech corpus because of its release year. The corpus contains 29.78 hours of utterances with prompts of short paragraphs and common phrases spoken by 111 native Mandarin speakers. The corpus is labeled with information about gender, degree of speech impairment, types of users, transcription, SNRs, and speaking rates. The VoiceBank-2023 is available by request for non-commercial use and welcomes all parties to join the VoiceBanking project to improve the services for the speech impaired. △ Less

Submitted 27 August, 2023; originally announced August 2023.

Comments: submitted to 26th International Conference of the ORIENTAL-COCOSDA

arXiv:2308.06791 [pdf, other]

PV-SSD: A Multi-Modal Point Cloud Feature Fusion Method for Projection Features and Variable Receptive Field Voxel Features

Authors: Yongxin Shao, Aihong Tan, Zhetao Sun, Enhui Zheng, Tianhong Yan, Peng Liao

Abstract: LiDAR-based 3D object detection and classification is crucial for autonomous driving. However, real-time inference from extremely sparse 3D data is a formidable challenge. To address this problem, a typical class of approaches transforms the point cloud cast into a regular data representation (voxels or projection maps). Then, it performs feature extraction with convolutional neural networks. Howe… ▽ More LiDAR-based 3D object detection and classification is crucial for autonomous driving. However, real-time inference from extremely sparse 3D data is a formidable challenge. To address this problem, a typical class of approaches transforms the point cloud cast into a regular data representation (voxels or projection maps). Then, it performs feature extraction with convolutional neural networks. However, such methods often result in a certain degree of information loss due to down-sampling or over-compression of feature information. This paper proposes a multi-modal point cloud feature fusion method for projection features and variable receptive field voxel features (PV-SSD) based on projection and variable voxelization to solve the information loss problem. We design a two-branch feature extraction structure with a 2D convolutional neural network to extract the point cloud's projection features in bird's-eye view to focus on the correlation between local features. A voxel feature extraction branch is used to extract local fine-grained features. Meanwhile, we propose a voxel feature extraction method with variable sensory fields to reduce the information loss of voxel branches due to downsampling. It avoids missing critical point information by selecting more useful feature points based on feature point weights for the detection task. In addition, we propose a multi-modal feature fusion module for point clouds. To validate the effectiveness of our method, we tested it on the KITTI dataset and ONCE dataset. △ Less

Submitted 13 April, 2024; v1 submitted 13 August, 2023; originally announced August 2023.

arXiv:2308.03016 [pdf, other]

Sha** a Smarter Electromagnetic Landscape: IAB, NCR, and RIS in 5G Standard and Future 6G

Authors: Chao-Kai Wen, Lung-Sheng Tsai, Arman Shojaeifard, Pei-Kai Liao, Kai-Kit Wong, Chan-Byoung Chae

Abstract: The main objective of 5G and beyond networks is to provide an optimal user experience in terms of throughput and reliability, irrespective of location and time. To achieve this, traditional fixed macro base station deployments are being replaced by more innovative and flexible solutions, such as wireless backhaul and relays. This article focuses on the evolution and standardization of these advanc… ▽ More The main objective of 5G and beyond networks is to provide an optimal user experience in terms of throughput and reliability, irrespective of location and time. To achieve this, traditional fixed macro base station deployments are being replaced by more innovative and flexible solutions, such as wireless backhaul and relays. This article focuses on the evolution and standardization of these advancements, which are sha** the electromagnetic landscape. Specifically, we explore Integrated Access and Backhaul (IAB) nodes, which offer a cost-efficient and agile alternative to fiber backhaul. We also discuss Network-Controlled Repeaters (NCRs) and the emergence of Reconfigurable Intelligent Surfaces (RIS) actively adapting the wireless environment. The article provides an overview of the 5G features and ongoing developments in 3GPP Release 18 related to these intelligent EM entities, highlighting the expected evolution of future wireless networks in terms of architecture, operations, and control signals. △ Less

Submitted 18 January, 2024; v1 submitted 6 August, 2023; originally announced August 2023.

Comments: 8 pages, 5 figures, 1 table. This work has been accepted to publish in IEEE Communications Standards Magazine

arXiv:2305.12308 [pdf, other]

MIMO Evolution toward 6G: End-User-Centric Collaborative MIMO

Authors: Lung-Sheng Tsai, Shang-Ling Shih, Pei-Kai Liao, Chao-Kai Wen

Abstract: In 6G, the trend of transitioning from massive antenna elements to even more massive ones is continued. However, installing additional antennas in the limited space of user equipment (UE) is challenging, resulting in limited capacity scaling gain for end users, despite network side support for increasing numbers of antennas. To address this issue, we propose an end-user-centric collaborative MIMO… ▽ More In 6G, the trend of transitioning from massive antenna elements to even more massive ones is continued. However, installing additional antennas in the limited space of user equipment (UE) is challenging, resulting in limited capacity scaling gain for end users, despite network side support for increasing numbers of antennas. To address this issue, we propose an end-user-centric collaborative MIMO (UE-CoMIMO) framework that groups several fixed or portable devices to provide a virtual abundance of antennas. This article outlines how advanced L1 relays and conventional relays enable device collaboration to offer diversity, rank, and localization enhancements. We demonstrate through system-level simulations how the UE-CoMIMO approaches lead to significant performance gains. Lastly, we discuss necessary research efforts to make UE-CoMIMO available for 6G and future research directions. △ Less

Submitted 14 November, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

Comments: 7 pages, 5 figures, 1 table. This work has been accepted in IEEE Communications Magazine

arXiv:2305.11407 [pdf, other]

LATTE: Label-efficient Incident Phenoty** from Longitudinal Electronic Health Records

Authors: Jun Wen, Jue Hou, Clara-Lea Bonzel, Yihan Zhao, Victor M. Castro, Vivian S. Gainer, Dana Weisenfeld, Tianrun Cai, Yuk-Lam Ho, Vidul A. Panickan, Lauren Costa, Chuan Hong, J. Michael Gaziano, Katherine P. Liao, Junwei Lu, Kelly Cho, Tianxi Cai

Abstract: Electronic health record (EHR) data are increasingly used to support real-world evidence (RWE) studies. Yet its ability to generate reliable RWE is limited by the lack of readily available precise information on the timing of clinical events such as the onset time of heart failure. We propose a LAbel-efficienT incidenT phEnoty** (LATTE) algorithm to accurately annotate the timing of clinical eve… ▽ More Electronic health record (EHR) data are increasingly used to support real-world evidence (RWE) studies. Yet its ability to generate reliable RWE is limited by the lack of readily available precise information on the timing of clinical events such as the onset time of heart failure. We propose a LAbel-efficienT incidenT phEnoty** (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embedding vectors from large-scale EHR data as prior knowledge, LATTE selects predictive EHR features in a concept re-weighting module by mining their relationship to the target event and compresses their information into longitudinal visit embeddings through a visit attention learning network. LATTE employs a recurrent neural network to capture the sequential dependency between the target event and visit embeddings before/after it. To improve label efficiency, LATTE constructs highly informative longitudinal silver-standard labels from large-scale unlabeled patients to perform unsupervised pre-training and semi-supervised joint training. Finally, LATTE enhances cross-site portability via contrastive representation learning. LATTE is evaluated on three analyses: the onset of type-2 diabetes, heart failure, and the onset and relapses of multiple sclerosis. We use various evaluation metrics present in the literature including the $ABC_{gain}$, the proportion of reduction in the area between the observed event indicator and the predicted cumulative incidences in reference to the prediction per incident prevalence. LATTE consistently achieves substantial improvement over benchmark methods such as SAMGEP and RETAIN in all settings. △ Less

Submitted 18 May, 2023; originally announced May 2023.

Comments: ERHs data

arXiv:2304.05365 [pdf, other]

Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling

Authors: Susobhan Ghosh, Raphael Kim, Prasidh Chhabria, Raaz Dwivedi, Predrag Klasnja, Peng Liao, Kelly Zhang, Susan Murphy

Abstract: There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this pro… ▽ More There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an ``optimized'' intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm stochasticity. We illustrate our methodology with a case study by analyzing the data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users as well as within specific users in the study. △ Less

Submitted 7 August, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: The first two authors contributed equally

arXiv:2211.12420 [pdf]

doi 10.1111/mice.13078

Brain informed transfer learning for categorizing construction hazards

Authors: Xiaoshan Zhou, Pin-Chao Liao

Abstract: A transfer learning paradigm is proposed for "knowledge" transfer between the human brain and convolutional neural network (CNN) for a construction hazard categorization task. Participants' brain activities are recorded using electroencephalogram (EEG) measurements when viewing the same images (target dataset) as the CNN. The CNN is pretrained on the EEG data and then fine-tuned on the constructio… ▽ More A transfer learning paradigm is proposed for "knowledge" transfer between the human brain and convolutional neural network (CNN) for a construction hazard categorization task. Participants' brain activities are recorded using electroencephalogram (EEG) measurements when viewing the same images (target dataset) as the CNN. The CNN is pretrained on the EEG data and then fine-tuned on the construction scene images. The results reveal that the EEG-pretrained CNN achieves a 9 % higher accuracy compared with a network with same architecture but randomly initialized parameters on a three-class classification task. Brain activity from the left frontal cortex exhibits the highest performance gains, thus indicating high-level cognitive processing during hazard recognition. This work is a step toward improving machine learning algorithms by learning from human-brain signals recorded via a commercially available brain-computer interface. More generalized visual recognition systems can be effectively developed based on this approach of "keep human in the loop". △ Less

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.10713 [pdf]

A privacy-preserving data storage and service framework based on deep learning and blockchain for construction workers' wearable IoT sensors

Authors: Xiaoshan Zhou, Pin-Chao Liao

Abstract: Classifying brain signals collected by wearable Internet of Things (IoT) sensors, especially brain-computer interfaces (BCIs), is one of the fastest-growing areas of research. However, research has mostly ignored the secure storage and privacy protection issues of collected personal neurophysiological data. Therefore, in this article, we try to bridge this gap and propose a secure privacy-preservi… ▽ More Classifying brain signals collected by wearable Internet of Things (IoT) sensors, especially brain-computer interfaces (BCIs), is one of the fastest-growing areas of research. However, research has mostly ignored the secure storage and privacy protection issues of collected personal neurophysiological data. Therefore, in this article, we try to bridge this gap and propose a secure privacy-preserving protocol for implementing BCI applications. We first transformed brain signals into images and used generative adversarial network to generate synthetic signals to protect data privacy. Subsequently, we applied the paradigm of transfer learning for signal classification. The proposed method was evaluated by a case study and results indicate that real electroencephalogram data augmented with artificially generated samples provide superior classification performance. In addition, we proposed a blockchain-based scheme and developed a prototype on Ethereum, which aims to make storing, querying and sharing personal neurophysiological data and analysis reports secure and privacy-aware. The rights of three main transaction bodies - construction workers, BCI service providers and project managers - are described and the advantages of the proposed system are discussed. We believe this paper provides a well-rounded solution to safeguard private data against cyber-attacks, level the playing field for BCI application developers, and to the end improve professional well-being in the industry. △ Less

Submitted 19 November, 2022; originally announced November 2022.

arXiv:2211.06132 [pdf]

doi 10.1061/JCEMD4.COENG-1335

Weighing votes in human-machine collaboration for hazard recognition: Inferring hazard perceptual threshold and decision confidence from electroencephalogram wavelets

Authors: Xiaoshan Zhou, Pin-Chao Liao

Abstract: Purpose: Human-machine collaboration is a promising strategy to improve hazard inspection. However, research on the effective integration of opinions from humans with machines for optimal group decision making is lacking. Hence, considering the benefits of a brain-computer interface (BCI) to enable intuitive commutation, this study proposes a novel method to predict human hazard response choices a… ▽ More Purpose: Human-machine collaboration is a promising strategy to improve hazard inspection. However, research on the effective integration of opinions from humans with machines for optimal group decision making is lacking. Hence, considering the benefits of a brain-computer interface (BCI) to enable intuitive commutation, this study proposes a novel method to predict human hazard response choices and decision confidence from brain activities for a superior confidence-weighted voting strategy. Methodology: First, we developed a Bayesian inference-based algorithm to ascertain the decision threshold above which a hazard is reported from human brain signals. This method was tested empirically with electroencephalogram (EEG) data collected in a laboratory setting and cross-validated using behavioral indices of the signal detection theory. Subsequently, based on numerical simulations, the decision criteria for low-, medium-, and high-confidence level differentiations characterized by parietal alpha-band EEG power were determined. Findings : The investigated hazard recognition task was described as a process of probabilistic inference involving a decision uncertainty evaluation. The results demonstrated the feasibility of EEG measurements in observing human internal representations of hazard discrimination. Moreover, the optimal criteria to differentiate between low-, medium-, and high-confidence levels were obtained by benchmarking against an optimal Bayesian observer. Originality: This research demonstrates the potential of a BCI as an effective channel for telecommunication, laying the foundation for the design of future hazard detection techniques in the collaborative human-machine systems research field. △ Less

Submitted 11 November, 2022; originally announced November 2022.

arXiv:2211.05931 [pdf]

doi 10.3390/su15064812

EEG-based performance-driven adaptive automated hazard alerting system in security surveillance support

Authors: Xiaoshan Zhou, Pin-Chao Liao

Abstract: Computer-vision technologies have emerged to assist security surveillance. However, automation alert/alarm systems often apply a low-beta threshold to avoid misses and generates excessive false alarms. This study proposed an adaptive hazard diagnosis and alarm system with adjustable alert threshold levels based on environmental scenarios and operator's hazard recognition performance. We recorded e… ▽ More Computer-vision technologies have emerged to assist security surveillance. However, automation alert/alarm systems often apply a low-beta threshold to avoid misses and generates excessive false alarms. This study proposed an adaptive hazard diagnosis and alarm system with adjustable alert threshold levels based on environmental scenarios and operator's hazard recognition performance. We recorded electroencephalogram (EEG) data during hazard recognition tasks. The linear ballistic accumulator model was used to decompose the response time into several psychological subcomponents, which were further estimated by a Markov chain Monte Carlo algorithm and compared among different types of hazardous scenarios. Participants were most cautious about falling hazards, followed by electricity hazards, and had the least conservative attitude toward structural hazards. Participants were classified into three performance-level subgroups using a latent profile analysis based on task accuracy. We applied the transfer learning paradigm to classify subgroups based on their time-frequency representations of EEG data. Additionally, two continual learning strategies were investigated to ensure a robust adaptation of the model to predict participants' performance levels in different hazardous scenarios. These findings can be leveraged in real-world brain-computer interface applications, which will provide human trust in automation and promote the successful implementation of alarm technologies. △ Less

Submitted 9 November, 2022; originally announced November 2022.

arXiv:2210.12953 [pdf, other]

Implementation of Trained Factorization Machine Recommendation System on Quantum Annealer

Authors: Chen-Yu Liu, Hsin-Yu Wang, Pei-Yen Liao, Ching-Jui Lai, Min-Hsiu Hsieh

Abstract: Factorization Machine (FM) is the most commonly used model to build a recommendation system since it can incorporate side information to improve performance. However, producing item suggestions for a given user with a trained FM is time-consuming. It requires a run-time of $O((N_m \log N_m)^2)$, where $N_m$ is the number of items in the dataset. To address this problem, we propose a quadratic unco… ▽ More Factorization Machine (FM) is the most commonly used model to build a recommendation system since it can incorporate side information to improve performance. However, producing item suggestions for a given user with a trained FM is time-consuming. It requires a run-time of $O((N_m \log N_m)^2)$, where $N_m$ is the number of items in the dataset. To address this problem, we propose a quadratic unconstrained binary optimization (QUBO) scheme to combine with FM and apply quantum annealing (QA) computation. Compared to classical methods, this hybrid algorithm provides a faster than quadratic speedup in finding good user suggestions. We then demonstrate the aforementioned computational advantage on current NISQ hardware by experimenting with a real example on a D-Wave annealer. △ Less

Submitted 8 November, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: 9 pages, 3 figures

arXiv:2206.11404 [pdf, other]

The ArtBench Dataset: Benchmarking Generative Models with Artworks

Authors: Peiyuan Liao, Xiuyu Li, Xihui Liu, Kurt Keutzer

Abstract: We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation. It comprises 60,000 images of artwork from 10 distinctive artistic styles, with 5,000 training images and 1,000 testing images per style. ArtBench-10 has several advantages over previous artwork datasets. Firstly, it is class-balanced while most previou… ▽ More We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation. It comprises 60,000 images of artwork from 10 distinctive artistic styles, with 5,000 training images and 1,000 testing images per style. ArtBench-10 has several advantages over previous artwork datasets. Firstly, it is class-balanced while most previous artwork datasets suffer from the long tail class distributions. Secondly, the images are of high quality with clean annotations. Thirdly, ArtBench-10 is created with standardized data collection, annotation, filtering, and preprocessing procedures. We provide three versions of the dataset with different resolutions ($32\times32$, $256\times256$, and original image size), formatted in a way that is easy to be incorporated by popular machine learning frameworks. We also conduct extensive benchmarking experiments using representative image synthesis models with ArtBench-10 and present in-depth analysis. The dataset is available at https://github.com/liaopeiyuan/artbench under a Fair Use license. △ Less

Submitted 22 June, 2022; originally announced June 2022.

Comments: The first two authors contributed equally to this work. The code and data are available at https://github.com/liaopeiyuan/artbench

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting. △ Less

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2202.06670 [pdf, other]

Learning Weakly-Supervised Contrastive Representations

Authors: Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency

Abstract: We argue that a form of the valuable information provided by the auxiliary information is its implied data clustering information. For instance, considering hashtags as auxiliary information, we can hypothesize that an Instagram image will be semantically more similar with the same hashtags. With this intuition, we present a two-stage weakly-supervised contrastive learning approach. The first stag… ▽ More We argue that a form of the valuable information provided by the auxiliary information is its implied data clustering information. For instance, considering hashtags as auxiliary information, we can hypothesize that an Instagram image will be semantically more similar with the same hashtags. With this intuition, we present a two-stage weakly-supervised contrastive learning approach. The first stage is to cluster data according to its auxiliary information. The second stage is to learn similar representations within the same cluster and dissimilar representations for data from different clusters. Our empirical experiments suggest the following three contributions. First, compared to conventional self-supervised representations, the auxiliary-information-infused representations bring the performance closer to the supervised representations, which use direct downstream labels as supervision signals. Second, our approach performs the best in most cases, when comparing our approach with other baseline representation learning methods that also leverage auxiliary data information. Third, we show that our approach also works well with unsupervised constructed clusters (e.g., no auxiliary information), resulting in a strong unsupervised representation learning approach. △ Less

Submitted 18 February, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: Published as ICLR 2022. arXiv admin note: substantial text overlap with arXiv:2106.02869

arXiv:2111.00655 [pdf, other]

doi 10.1145/3559009.3569651

Collage: Seamless Integration of Deep Learning Backends with Automatic Placement

Authors: Byungsoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, Tianqi Chen, Zhihao Jia

Abstract: The strong demand for efficient and performant deployment of Deep Learning (DL) applications prompts the rapid development of a rich DL ecosystem. To keep up with this fast advancement, it is crucial for modern DL frameworks to efficiently integrate a variety of optimized tensor algebra libraries and runtimes as their backends and generate the fastest possible executable using these backends. Howe… ▽ More The strong demand for efficient and performant deployment of Deep Learning (DL) applications prompts the rapid development of a rich DL ecosystem. To keep up with this fast advancement, it is crucial for modern DL frameworks to efficiently integrate a variety of optimized tensor algebra libraries and runtimes as their backends and generate the fastest possible executable using these backends. However, current DL frameworks require significant manual effort and expertise to integrate every new backend while failing to unleash its full potential. Given the fast-evolving nature of the DL ecosystem, this manual approach often slows down continuous innovations across different layers; it prevents hardware vendors from the fast deployment of their cutting-edge libraries, DL framework developers must repeatedly adjust their hand-coded rules to accommodate new versions of libraries, and machine learning practitioners need to wait for the integration of new technologies and often encounter unsatisfactory performance. In this paper, we propose Collage, a DL framework that offers seamless integration of DL backends. Collage provides an expressive backend registration interface that allows users to precisely specify the capability of various backends. By leveraging the specifications of available backends, Collage automatically searches for an optimized backend placement strategy for a given workload and execution environment. Our evaluation shows that Collage outperforms the best existing framework for each hardware by $1.26\times$, $1.43\times$, $1.40\times$ on average on NVIDIA's RTX 2070 GPU, V100 GPU, and Intel's Xeon 8259CL CPU, respectively. Collage has been open-sourced and deployed in Apache TVM. △ Less

Submitted 27 October, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

Comments: Published in PACT 22

arXiv:2106.02869 [pdf, other]

Integrating Auxiliary Information in Self-supervised Learning

Authors: Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency

Abstract: This paper presents to integrate the auxiliary information (e.g., additional attributes for data such as the hashtags for Instagram images) in the self-supervised learning process. We first observe that the auxiliary information may bring us useful information about data structures: for instance, the Instagram images with the same hashtags can be semantically similar. Hence, to leverage the struct… ▽ More This paper presents to integrate the auxiliary information (e.g., additional attributes for data such as the hashtags for Instagram images) in the self-supervised learning process. We first observe that the auxiliary information may bring us useful information about data structures: for instance, the Instagram images with the same hashtags can be semantically similar. Hence, to leverage the structural information from the auxiliary information, we present to construct data clusters according to the auxiliary information. Then, we introduce the Clustering InfoNCE (Cl-InfoNCE) objective that learns similar representations for augmented variants of data from the same cluster and dissimilar representations for data from different clusters. Our approach contributes as follows: 1) Comparing to conventional self-supervised representations, the auxiliary-information-infused self-supervised representations bring the performance closer to the supervised representations; 2) The presented Cl-InfoNCE can also work with unsupervised constructed clusters (e.g., k-means clusters) and outperform strong clustering-based self-supervised learning approaches, such as the Prototypical Contrastive Learning (PCL) method; 3) We show that Cl-InfoNCE may be a better approach to leverage the data clustering information, by comparing it to the baseline approach - learning to predict the clustering assignments with cross-entropy loss. For analysis, we connect the goodness of the learned representations with the statistical relationships: i) the mutual information between the labels and the clusters and ii) the conditional entropy of the clusters given the labels. △ Less

Submitted 5 June, 2021; originally announced June 2021.

arXiv:2011.04185 [pdf, other]

Robust Batch Policy Learning in Markov Decision Processes

Authors: Zhengling Qi, Peng Liao

Abstract: We study the offline data-driven sequential decision making problem in the framework of Markov decision process (MDP). In order to enhance the generalizability and adaptivity of the learned policy, we propose to evaluate each policy by a set of the average rewards with respect to distributions centered at the policy induced stationary distribution. Given a pre-collected dataset of multiple traject… ▽ More We study the offline data-driven sequential decision making problem in the framework of Markov decision process (MDP). In order to enhance the generalizability and adaptivity of the learned policy, we propose to evaluate each policy by a set of the average rewards with respect to distributions centered at the policy induced stationary distribution. Given a pre-collected dataset of multiple trajectories generated by some behavior policy, our goal is to learn a robust policy in a pre-specified policy class that can maximize the smallest value of this set. Leveraging the theory of semi-parametric statistics, we develop a statistically efficient policy learning method for estimating the de ned robust optimal policy. A rate-optimal regret bound up to a logarithmic factor is established in terms of total decision points in the dataset. △ Less

Submitted 9 November, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

arXiv:2010.05350 [pdf, other]

Google Landmark Recognition 2020 Competition Third Place Solution

Authors: Qishen Ha, Bo Liu, Fuxu Liu, Peiyuan Liao

Abstract: We present our third place solution to the Google Landmark Recognition 2020 competition. It is an ensemble of global features only Sub-center ArcFace models. We introduce dynamic margins for ArcFace loss, a family of tune-able margin functions of class size, designed to deal with the extreme imbalance in GLDv2 dataset. Progressive finetuning and careful postprocessing are also key to the solution.… ▽ More We present our third place solution to the Google Landmark Recognition 2020 competition. It is an ensemble of global features only Sub-center ArcFace models. We introduce dynamic margins for ArcFace loss, a family of tune-able margin functions of class size, designed to deal with the extreme imbalance in GLDv2 dataset. Progressive finetuning and careful postprocessing are also key to the solution. Our two submissions scored 0.6344 and 0.6289 on private leaderboard, both ranking third place out of 736 teams. △ Less

Submitted 11 October, 2020; originally announced October 2020.

Comments: 5 pages, 2 figures

arXiv:2009.13504 [pdf, other]

Information Obfuscation of Graph Neural Networks

Authors: Peiyuan Liao, Han Zhao, Keyulu Xu, Tommi Jaakkola, Geoffrey Gordon, Stefanie Jegelka, Ruslan Salakhutdinov

Abstract: While the advent of Graph Neural Networks (GNNs) has greatly improved node and graph representation learning in many applications, the neighborhood aggregation scheme exposes additional vulnerabilities to adversaries seeking to extract node-level information about sensitive attributes. In this paper, we study the problem of protecting sensitive attributes by information obfuscation when learning w… ▽ More While the advent of Graph Neural Networks (GNNs) has greatly improved node and graph representation learning in many applications, the neighborhood aggregation scheme exposes additional vulnerabilities to adversaries seeking to extract node-level information about sensitive attributes. In this paper, we study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data. We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance. Our method creates a strong defense against inference attacks, while only suffering small loss in task performance. Theoretically, we analyze the effectiveness of our framework against a worst-case adversary, and characterize an inherent trade-off between maximizing predictive accuracy and minimizing information leakage. Experiments across multiple datasets from recommender systems, knowledge graphs and quantum chemistry demonstrate that the proposed approach provides a robust defense across various graph structures and tasks, while producing competitive GNN encoders for downstream tasks. △ Less

Submitted 13 June, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

Comments: ICML 2021; Code is available at https://github.com/liaopeiyuan/GAL

arXiv:2008.01571 [pdf, other]

IntelligentPooling: Practical Thompson Sampling for mHealth

Authors: Sabina Tomkins, Peng Liao, Predrag Klasnja, Susan Murphy

Abstract: In mobile health (mHealth) smart devices deliver behavioral treatments repeatedly over time to a user with the goal of hel** the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobi… ▽ More In mobile health (mHealth) smart devices deliver behavioral treatments repeatedly over time to a user with the goal of hel** the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobile healthcare setting. In this work we are concerned with the following challenges: 1) individuals who are in the same context can exhibit differential response to treatments 2) only a limited amount of data is available for learning on any one individual, and 3) non-stationary responses to treatment. To address these challenges we generalize Thompson-Sampling bandit algorithms to develop IntelligentPooling. IntelligentPooling learns personalized treatment policies thus addressing challenge one. To address the second challenge, IntelligentPooling updates each user's degree of personalization while making use of available data on other users to speed up learning. Lastly, IntelligentPooling allows responsivity to vary as a function of a user's time since beginning treatment, thus addressing challenge three. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art. We demonstrate the promise of this approach and its ability to learn from even a small group of users in a live clinical trial. △ Less

Submitted 12 December, 2020; v1 submitted 31 July, 2020; originally announced August 2020.

Comments: arXiv admin note: text overlap with arXiv:2002.09971

arXiv:2002.09971 [pdf, other]

Rapidly Personalizing Mobile Health Treatment Policies with Limited Data

Authors: Sabina Tomkins, Peng Liao, Predrag Klasnja, Serena Yeung, Susan Murphy

Abstract: In mobile health (mHealth), reinforcement learning algorithms that adapt to one's context without learning personalized policies might fail to distinguish between the needs of individuals. Yet the high amount of noise due to the in situ delivery of mHealth interventions can cripple the ability of an algorithm to learn when given access to only a single user's data, making personalization challengi… ▽ More In mobile health (mHealth), reinforcement learning algorithms that adapt to one's context without learning personalized policies might fail to distinguish between the needs of individuals. Yet the high amount of noise due to the in situ delivery of mHealth interventions can cripple the ability of an algorithm to learn when given access to only a single user's data, making personalization challenging. We present IntelligentPooling, which learns personalized policies via an adaptive, principled use of other users' data. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art across all generative models. Additionally, we inspect the behavior of this approach in a live clinical trial, demonstrating its ability to learn from even a small group of users. △ Less

Submitted 23 February, 2020; originally announced February 2020.

arXiv:1912.13088 [pdf, ps, other]

Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health

Authors: Peng Liao, Predrag Klasnja, Susan Murphy

Abstract: Due to the recent advancements in wearables and sensing technology, health scientists are increasingly develo** mobile health (mHealth) interventions. In mHealth interventions, mobile devices are used to deliver treatment to individuals as they go about their daily lives. These treatments are generally designed to impact a near time, proximal outcome such as stress or physical activity. The mHea… ▽ More Due to the recent advancements in wearables and sensing technology, health scientists are increasingly develo** mobile health (mHealth) interventions. In mHealth interventions, mobile devices are used to deliver treatment to individuals as they go about their daily lives. These treatments are generally designed to impact a near time, proximal outcome such as stress or physical activity. The mHealth intervention policies, often called just-in-time adaptive interventions, are decision rules that map an individual's current state (e.g., individual's past behaviors as well as current observations of time, location, social activity, stress and urges to smoke) to a particular treatment at each of many time points. The vast majority of current mHealth interventions deploy expert-derived policies. In this paper, we provide an approach for conducting inference about the performance of one or more such policies using historical data collected under a possibly different policy. Our measure of performance is the average of proximal outcomes over a long time period should the particular mHealth policy be followed. We provide an estimator as well as confidence intervals. This work is motivated by HeartSteps, an mHealth physical activity intervention. △ Less

Submitted 22 July, 2020; v1 submitted 30 December, 2019; originally announced December 2019.

arXiv:1912.11046 [pdf, other]

Improving Abstractive Text Summarization with History Aggregation

Authors: Pengcheng Liao, Chuang Zhang, Xiaojun Chen, Xiaofei Zhou

Abstract: Recent neural sequence to sequence models have provided feasible solutions for abstractive summarization. However, such models are still hard to tackle long text dependency in the summarization task. A high-quality summarization system usually depends on strong encoder which can refine important information from long input texts so that the decoder can generate salient summaries from the encoder's… ▽ More Recent neural sequence to sequence models have provided feasible solutions for abstractive summarization. However, such models are still hard to tackle long text dependency in the summarization task. A high-quality summarization system usually depends on strong encoder which can refine important information from long input texts so that the decoder can generate salient summaries from the encoder's memory. In this paper, we propose an aggregation mechanism based on the Transformer model to address the challenge of long text representation. Our model can review history information to make encoder hold more memory capacity. Empirically, we apply our aggregation mechanism to the Transformer model and experiment on CNN/DailyMail dataset to achieve higher quality summaries compared to several strong baseline models on the ROUGE metrics. △ Less

Submitted 23 December, 2019; originally announced December 2019.

arXiv:1909.03539 [pdf, other]

Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity

Authors: Peng Liao, Kristjan Greenewald, Predrag Klasnja, Susan Murphy

Abstract: With the recent evolution of mobile health technologies, health scientists are increasingly interested in develo** just-in-time adaptive interventions (JITAIs), typically delivered via notification on mobile device and designed to help the user prevent negative health outcomes and promote the adoption and maintenance of healthy behaviors. A JITAI involves a sequence of decision rules (i.e., trea… ▽ More With the recent evolution of mobile health technologies, health scientists are increasingly interested in develo** just-in-time adaptive interventions (JITAIs), typically delivered via notification on mobile device and designed to help the user prevent negative health outcomes and promote the adoption and maintenance of healthy behaviors. A JITAI involves a sequence of decision rules (i.e., treatment policy) that takes the user's current context as input and specifies whether and what type of an intervention should be provided at the moment. In this paper, we develop a Reinforcement Learning (RL) algorithm that continuously learns and improves the treatment policy embedded in the JITAI as the data is being collected from the user. This work is motivated by our collaboration on designing the RL algorithm in HeartSteps V2 based on data from HeartSteps V1. HeartSteps is a physical activity mobile health application. The RL algorithm developed in this paper is being used in HeartSteps V2 to decide, five times per day, whether to deliver a context-tailored activity suggestion. △ Less

Submitted 8 September, 2019; originally announced September 2019.

arXiv:1901.07196 [pdf, other]

CAE-ADMM: Implicit Bitrate Optimization via ADMM-based Pruning in Compressive Autoencoders

Authors: Haimeng Zhao, Peiyuan Liao

Abstract: We introduce ADMM-pruned Compressive AutoEncoder (CAE-ADMM) that uses Alternative Direction Method of Multipliers (ADMM) to optimize the trade-off between distortion and efficiency of lossy image compression. Specifically, ADMM in our method is to promote sparsity to implicitly optimize the bitrate, different from entropy estimators used in the previous research. The experiments on public datasets… ▽ More We introduce ADMM-pruned Compressive AutoEncoder (CAE-ADMM) that uses Alternative Direction Method of Multipliers (ADMM) to optimize the trade-off between distortion and efficiency of lossy image compression. Specifically, ADMM in our method is to promote sparsity to implicitly optimize the bitrate, different from entropy estimators used in the previous research. The experiments on public datasets show that our method outperforms the original CAE and some traditional codecs in terms of SSIM/MS-SSIM metrics, at reasonable inference speed. △ Less

Submitted 8 February, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

Comments: 5 pages, 4 figures, 2 tables, 1 algorithm

arXiv:1901.05560 [pdf, other]

doi 10.1145/3289600.3291033

Off-Policy Evaluation of Probabilistic Identity Data in Lookalike Modeling

Authors: Randell Cotta, Mingyang Hu, Dan Jiang, Peizhou Liao

Abstract: We evaluate the impact of probabilistically-constructed digital identity data collected from Sep. to Dec. 2017 (approx.), in the context of Lookalike-targeted campaigns. The backbone of this study is a large set of probabilistically-constructed "identities", represented as small bags of cookies and mobile ad identifiers with associated metadata, that are likely all owned by the same underlying use… ▽ More We evaluate the impact of probabilistically-constructed digital identity data collected from Sep. to Dec. 2017 (approx.), in the context of Lookalike-targeted campaigns. The backbone of this study is a large set of probabilistically-constructed "identities", represented as small bags of cookies and mobile ad identifiers with associated metadata, that are likely all owned by the same underlying user. The identity data allows to generate "identity-based", rather than "identifier-based", user models, giving a fuller picture of the interests of the users underlying the identifiers. We employ off-policy techniques to evaluate the potential of identity-powered lookalike models without incurring the risk of allowing untested models to direct large amounts of ad spend or the large cost of performing A/B tests. We add to historical work on off-policy evaluation by noting a significant type of "finite-sample bias" that occurs for studies combining modestly-sized datasets and evaluation metrics involving rare events (e.g., conversions). We illustrate this bias using a simulation study that later informs the handling of inverse propensity weights in our analyses on real data. We demonstrate significant lift in identity-powered lookalikes versus an identity-ignorant baseline: on average ~70% lift in conversion rate. This rises to factors of ~(4-32)x for identifiers having little data themselves, but that can be inferred to belong to users with substantial data to aggregate across identifiers. This implies that identity-powered user modeling is especially important in the context of identifiers having very short lifespans (i.e., frequently churned cookies). Our work motivates and informs the use of probabilistically-constructed identities in marketing. It also deepens the canon of examples in which off-policy learning has been employed to evaluate the complex systems of the internet economy. △ Less

Submitted 3 January, 2019; originally announced January 2019.

Comments: Accepted by WSDM 2019

arXiv:1810.13059 [pdf]

doi 10.1109/LSP.2019.2922851

Visual Attention Network for Low Dose CT

Authors: Wenchao Du, Hu Chen, Peixi Liao, Hongyu Yang, Ge Wang, Yi Zhang

Abstract: Noise and artifacts are intrinsic to low dose CT (LDCT) data acquisition, and will significantly affect the imaging performance. Perfect noise removal and image restoration is intractable in the context of LDCT due to the statistical and technical uncertainties. In this paper, we apply the generative adversarial network (GAN) framework with a visual attention mechanism to deal with this problem in… ▽ More Noise and artifacts are intrinsic to low dose CT (LDCT) data acquisition, and will significantly affect the imaging performance. Perfect noise removal and image restoration is intractable in the context of LDCT due to the statistical and technical uncertainties. In this paper, we apply the generative adversarial network (GAN) framework with a visual attention mechanism to deal with this problem in a data-driven/machine learning fashion. Our main idea is to inject visual attention knowledge into the learning process of GAN to provide a powerful prior of the noise distribution. By doing this, both the generator and discriminator networks are empowered with visual attention information so they will not only pay special attention to noisy regions and surrounding structures but also explicitly assess the local consistency of the recovered regions. Our experiments qualitatively and quantitatively demonstrate the effectiveness of the proposed method with clinic CT images. △ Less

Submitted 13 June, 2019; v1 submitted 30 October, 2018; originally announced October 2018.

Comments: 5 pages, 6 figures. To appear on IEEE Signal Processing Letters

arXiv:1809.05020 [pdf, other]

Full Workspace Generation of Serial-link Manipulators by Deep Learning based Jacobian Estimation

Authors: Peiyuan Liao, Jiajun Mao

Abstract: Apart from solving complicated problems that require a certain level of intelligence, fine-tuned deep neural networks can also create fast algorithms for slow, numerical tasks. In this paper, we introduce an improved version of [1]'s work, a fast, deep-learning framework capable of generating the full workspace of serial-link manipulators. The architecture consists of two neural networks: an estim… ▽ More Apart from solving complicated problems that require a certain level of intelligence, fine-tuned deep neural networks can also create fast algorithms for slow, numerical tasks. In this paper, we introduce an improved version of [1]'s work, a fast, deep-learning framework capable of generating the full workspace of serial-link manipulators. The architecture consists of two neural networks: an estimation net that approximates the manipulator Jacobian, and a confidence net that measures the confidence of the approximation. We also introduce M3 (Manipulability Maps of Manipulators), a MATLAB robotics library based on [2](RTB), the datasets generated by which are used by this work. Results have shown that not only are the neural networks significantly faster than numerical inverse kinematics, it also offers superior accuracy when compared to other machine learning alternatives. Implementations of the algorithm (based on Keras[3]), including benchmark evaluation script, are available at https://github.com/liaopeiyuan/Jacobian-Estimation . The M3 Library APIs and datasets are also available at https://github.com/liaopeiyuan/M3 . △ Less

Submitted 13 September, 2018; v1 submitted 31 August, 2018; originally announced September 2018.

Comments: 10 pages, 12 figures

arXiv:1804.08951 [pdf, other]

doi 10.1109/ICCAIRO.2018.00027

Deep Neural Network Based Subspace Learning of Robotic Manipulator Workspace Map**

Authors: Peiyuan Liao

Abstract: The manipulator workspace map** is an important problem in robotics and has attracted significant attention in the community. However, most of the pre-existing algorithms have expensive time complexity due to the reliance on sophisticated kinematic equations. To solve this problem, this paper introduces subspace learning (SL), a variant of subspace embedding, where a set of robot and scope param… ▽ More The manipulator workspace map** is an important problem in robotics and has attracted significant attention in the community. However, most of the pre-existing algorithms have expensive time complexity due to the reliance on sophisticated kinematic equations. To solve this problem, this paper introduces subspace learning (SL), a variant of subspace embedding, where a set of robot and scope parameters is mapped to the corresponding workspace by a deep neural network (DNN). Trained on a large dataset of around $\mathbf{6\times 10^4}$ samples obtained from a MATLAB$^\circledR$ implementation of a classical method and sampling of designed uniform distributions, the experiments demonstrate that the embedding significantly reduces run-time from $\mathbf{5.23 \times 10^3}$ s of traditional discretization method to $\mathbf{0.224}$ s, with high accuracies (average F-measure is $\mathbf{0.9665}$ with batch gradient descent and resilient backpropagation). △ Less

Submitted 24 April, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

Comments: 12 pages, 12 figures, accepted for presentation at ICCAIRO 2018

arXiv:1801.06044 [pdf]

doi 10.1016/j.procs.2017.11.383

Analysis of the Relation between Artificial Intelligence and the Internet from the Perspective of Brain Science

Authors: Feng Liu, Yong Shi, Peijia Lia

Abstract: Artificial intelligence (AI) like deep learning, cloud AI computation has been advancing at a rapid pace since 2014. There is no doubt that the prosperity of AI is inseparable with the development of the Internet. However, there has been little attention to the link between AI and the internet. This paper explores them with brain insights mainly from four views:1) How is the general relation betwe… ▽ More Artificial intelligence (AI) like deep learning, cloud AI computation has been advancing at a rapid pace since 2014. There is no doubt that the prosperity of AI is inseparable with the development of the Internet. However, there has been little attention to the link between AI and the internet. This paper explores them with brain insights mainly from four views:1) How is the general relation between artificial intelligence and Internet of Things, cloud computing, big data and Industrial Internet from the perspective of brain science. 2) Construction of a new AI system model with the Internet and brain science. △ Less

Submitted 2 January, 2018; originally announced January 2018.

Comments: 6 pages,3 figures

arXiv:1708.04001 [pdf, other]

Group-driven Reinforcement Learning for Personalized mHealth Intervention

Authors: Feiyun Zhu, Jun Guo, Zheng Xu, Peng Liao, Junzhou Huang

Abstract: Due to the popularity of smartphones and wearable devices nowadays, mobile health (mHealth) technologies are promising to bring positive and wide impacts on people's health. State-of-the-art decision-making methods for mHealth rely on some ideal assumptions. Those methods either assume that the users are completely homogenous or completely heterogeneous. However, in reality, a user might be simila… ▽ More Due to the popularity of smartphones and wearable devices nowadays, mobile health (mHealth) technologies are promising to bring positive and wide impacts on people's health. State-of-the-art decision-making methods for mHealth rely on some ideal assumptions. Those methods either assume that the users are completely homogenous or completely heterogeneous. However, in reality, a user might be similar with some, but not all, users. In this paper, we propose a novel group-driven reinforcement learning method for the mHealth. We aim to understand how to share information among similar users to better convert the limited user information into sharper learned RL policies. Specifically, we employ the K-means clustering method to group users based on their trajectory information similarity and learn a shared RL policy for each group. Extensive experiment results have shown that our method can achieve clear gains over the state-of-the-art RL methods for mHealth. △ Less

Submitted 13 August, 2017; originally announced August 2017.

arXiv:1707.09636 [pdf]

LEARN: Learned Experts' Assessment-based Reconstruction Network for Sparse-data CT

Authors: Hu Chen, Yi Zhang, Yun** Chen, Junfeng Zhang, Weihua Zhang, Huaiqiaing Sun, Yang Lv, Peixi Liao, Jiliu Zhou, Ge Wang

Abstract: Compressive sensing (CS) has proved effective for tomographic reconstruction from sparsely collected data or under-sampled measurements, which are practically important for few-view CT, tomosynthesis, interior tomography, and so on. To perform sparse-data CT, the iterative reconstruction commonly use regularizers in the CS framework. Currently, how to choose the parameters adaptively for regulariz… ▽ More Compressive sensing (CS) has proved effective for tomographic reconstruction from sparsely collected data or under-sampled measurements, which are practically important for few-view CT, tomosynthesis, interior tomography, and so on. To perform sparse-data CT, the iterative reconstruction commonly use regularizers in the CS framework. Currently, how to choose the parameters adaptively for regularization is a major open problem. In this paper, inspired by the idea of machine learning especially deep learning, we unfold a state-of-the-art "fields of experts" based iterative reconstruction scheme up to a number of iterations for data-driven training, construct a Learned Experts' Assessment-based Reconstruction Network ("LEARN") for sparse-data CT, and demonstrate the feasibility and merits of our LEARN network. The experimental results with our proposed LEARN network produces a competitive performance with the well-known Mayo Clinic Low-Dose Challenge Dataset relative to several state-of-the-art methods, in terms of artifact reduction, feature preservation, and computational speed. This is consistent to our insight that because all the regularization terms and parameters used in the iterative reconstruction are now learned from the training data, our LEARN network utilizes application-oriented knowledge more effectively and recovers underlying images more favorably than competing algorithms. Also, the number of layers in the LEARN network is only 12, reducing the computational complexity of typical iterative algorithms by orders of magnitude. △ Less

Submitted 10 February, 2018; v1 submitted 30 July, 2017; originally announced July 2017.

Comments: 18 pages, 15 figures, accepted by IEEE TMI

arXiv:1704.04866 [pdf, ps, other]

Effective Warm Start for the Online Actor-Critic Reinforcement Learning based mHealth Intervention

Authors: Feiyun Zhu, Peng Liao

Abstract: Online reinforcement learning (RL) is increasingly popular for the personalized mobile health (mHealth) intervention. It is able to personalize the type and dose of interventions according to user's ongoing statuses and changing needs. However, at the beginning of online learning, there are usually too few samples to support the RL updating, which leads to poor performances. A delay in good perfor… ▽ More Online reinforcement learning (RL) is increasingly popular for the personalized mobile health (mHealth) intervention. It is able to personalize the type and dose of interventions according to user's ongoing statuses and changing needs. However, at the beginning of online learning, there are usually too few samples to support the RL updating, which leads to poor performances. A delay in good performance of the online learning algorithms can be especially detrimental in the mHealth, where users tend to quickly disengage with the mHealth app. To address this problem, we propose a new online RL methodology that focuses on an effective warm start. The main idea is to make full use of the data accumulated and the decision rule achieved in a former study. As a result, we can greatly enrich the data size at the beginning of online learning in our method. Such case accelerates the online learning process for new users to achieve good performances not only at the beginning of online learning but also through the whole online learning process. Besides, we use the decision rules achieved in a previous study to initialize the parameter in our online RL model for new users. It provides a good initialization for the proposed online RL algorithm. Experiment results show that promising improvements have been achieved by our method compared with the state-of-the-art method. △ Less

Submitted 21 May, 2017; v1 submitted 17 April, 2017; originally announced April 2017.

arXiv:1703.10039 [pdf, other]

Cohesion-based Online Actor-Critic Reinforcement Learning for mHealth Intervention

Authors: Feiyun Zhu, Peng Liao, Xinliang Zhu, Yaowen Yao, Junzhou Huang

Abstract: In the wake of the vast population of smart device users worldwide, mobile health (mHealth) technologies are hopeful to generate positive and wide influence on people's health. They are able to provide flexible, affordable and portable health guides to device users. Current online decision-making methods for mHealth assume that the users are completely heterogeneous. They share no information amon… ▽ More In the wake of the vast population of smart device users worldwide, mobile health (mHealth) technologies are hopeful to generate positive and wide influence on people's health. They are able to provide flexible, affordable and portable health guides to device users. Current online decision-making methods for mHealth assume that the users are completely heterogeneous. They share no information among users and learn a separate policy for each user. However, data for each user is very limited in size to support the separate online learning, leading to unstable policies that contain lots of variances. Besides, we find the truth that a user may be similar with some, but not all, users, and connected users tend to have similar behaviors. In this paper, we propose a network cohesion constrained (actor-critic) Reinforcement Learning (RL) method for mHealth. The goal is to explore how to share information among similar users to better convert the limited user information into sharper learned policies. To the best of our knowledge, this is the first online actor-critic RL for mHealth and first network cohesion constrained (actor-critic) RL method in all applications. The network cohesion is important to derive effective policies. We come up with a novel method to learn the network by using the warm start trajectory, which directly reflects the users' property. The optimization of our model is difficult and very different from the general supervised learning due to the indirect observation of values. As a contribution, we propose two algorithms for the proposed online RLs. Apart from mHealth, the proposed methods can be easily applied or adapted to other health-related tasks. Extensive experiment results on the HeartSteps dataset demonstrates that in a variety of parameter settings, the proposed two methods obtain obvious improvements over the state-of-the-art methods. △ Less

Submitted 23 August, 2017; v1 submitted 25 March, 2017; originally announced March 2017.

arXiv:1702.00288 [pdf]

doi 10.1109/TMI.2017.2715284

Low-Dose CT with a Residual Encoder-Decoder Convolutional Neural Network (RED-CNN)

Authors: Hu Chen, Yi Zhang, Mannudeep K. Kalra, Feng Lin, Yang Chen, Peixi Liao, Jiliu Zhou, Ge Wang

Abstract: Given the potential X-ray radiation risk to the patient, low-dose CT has attracted a considerable interest in the medical imaging field. The current main stream low-dose CT methods include vendor-specific sinogram domain filtration and iterative reconstruction, but they need to access original raw data whose formats are not transparent to most users. Due to the difficulty of modeling the statistic… ▽ More Given the potential X-ray radiation risk to the patient, low-dose CT has attracted a considerable interest in the medical imaging field. The current main stream low-dose CT methods include vendor-specific sinogram domain filtration and iterative reconstruction, but they need to access original raw data whose formats are not transparent to most users. Due to the difficulty of modeling the statistical characteristics in the image domain, the existing methods for directly processing reconstructed images cannot eliminate image noise very well while kee** structural details. Inspired by the idea of deep learning, here we combine the autoencoder, the deconvolution network, and shortcut connections into the residual encoder-decoder convolutional neural network (RED-CNN) for low-dose CT imaging. After patch-based training, the proposed RED-CNN achieves a competitive performance relative to the-state-of-art methods in both simulated and clinical cases. Especially, our method has been favorably evaluated in terms of noise suppression, structural preservation and lesion detection. △ Less

Submitted 11 June, 2017; v1 submitted 1 February, 2017; originally announced February 2017.

Comments: Accepted by IEEE TMI

arXiv:1610.00321 [pdf]

Low-dose CT denoising with convolutional neural network

Authors: Hu Chen, Yi Zhang, Weihua Zhang, Peixi Liao, Ke Li, Jiliu Zhou, Ge Wang

Abstract: To reduce the potential radiation risk, low-dose CT has attracted much attention. However, simply lowering the radiation dose will lead to significant deterioration of the image quality. In this paper, we propose a noise reduction method for low-dose CT via deep neural network without accessing original projection data. A deep convolutional neural network is trained to transform low-dose CT images… ▽ More To reduce the potential radiation risk, low-dose CT has attracted much attention. However, simply lowering the radiation dose will lead to significant deterioration of the image quality. In this paper, we propose a noise reduction method for low-dose CT via deep neural network without accessing original projection data. A deep convolutional neural network is trained to transform low-dose CT images towards normal-dose CT images, patch by patch. Visual and quantitative evaluation demonstrates a competing performance of the proposed method. △ Less

Submitted 2 October, 2016; originally announced October 2016.

Comments: arXiv admin note: substantial text overlap with arXiv:1609.08508

Showing 1–44 of 44 results for author: Liao, P