Search | arXiv e-print repository

arXiv:2403.19949 [pdf, other]

FairCLIP: Harnessing Fairness in Vision-Language Learning

Authors: Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang

Abstract: Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair… ▽ More Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair vision-language medical dataset Harvard-FairVLMed that provides detailed demographic attributes, ground-truth labels, and clinical notes to facilitate an in-depth examination of fairness within VL foundation models. Using Harvard-FairVLMed, we conduct a comprehensive fairness analysis of two widely-used VL models (CLIP and BLIP2), pre-trained on both natural and medical domains, across four different protected attributes. Our results highlight significant biases in all VL models, with Asian, Male, Non-Hispanic, and Spanish being the preferred subgroups across the protected attributes of race, gender, ethnicity, and language, respectively. In order to alleviate these biases, we propose FairCLIP, an optimal-transport-based approach that achieves a favorable trade-off between performance and fairness by reducing the Sinkhorn distance between the overall sample distribution and the distributions corresponding to each demographic group. As the first VL dataset of its kind, Harvard-FairVLMed holds the potential to catalyze advancements in the development of machine learning models that are both ethically aware and clinically effective. Our dataset and code are available at https://ophai.hms.harvard.edu/datasets/harvard-fairvlmed10k. △ Less

Submitted 5 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2312.08656 [pdf, other]

MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training

Authors: Hongwu Peng, Xi Xie, Kaustubh Shivdikar, MD Amit Hasan, Jiahui Zhao, Shaoyi Huang, Omer Khan, David Kaeli, Caiwen Ding

Abstract: In the acceleration of deep neural network training, the GPU has become the mainstream platform. GPUs face substantial challenges on GNNs, such as workload imbalance and memory access irregularities, leading to underutilized hardware. Existing solutions such as PyG, DGL with cuSPARSE, and GNNAdvisor frameworks partially address these challenges but memory traffic is still significant. We argue t… ▽ More In the acceleration of deep neural network training, the GPU has become the mainstream platform. GPUs face substantial challenges on GNNs, such as workload imbalance and memory access irregularities, leading to underutilized hardware. Existing solutions such as PyG, DGL with cuSPARSE, and GNNAdvisor frameworks partially address these challenges but memory traffic is still significant. We argue that drastic performance improvements can only be achieved by the vertical optimization of algorithm and system innovations, rather than treating the speedup optimization as an "after-thought" (i.e., (i) given a GNN algorithm, designing an accelerator, or (ii) given hardware, mainly optimizing the GNN algorithm). In this paper, we present MaxK-GNN, an advanced high-performance GPU training system integrating algorithm and system innovation. (i) We introduce the MaxK nonlinearity and provide a theoretical analysis of MaxK nonlinearity as a universal approximator, and present the Compressed Balanced Sparse Row (CBSR) format, designed to store the data and index of the feature matrix after nonlinearity; (ii) We design a coalescing enhanced forward computation with row-wise product-based SpGEMM Kernel using CBSR for input feature matrix fetching and strategic placement of a sparse output accumulation buffer in shared memory; (iii) We develop an optimized backward computation with outer product-based and SSpMM Kernel. We conduct extensive evaluations of MaxK-GNN and report the end-to-end system run-time. Experiments show that MaxK-GNN system could approach the theoretical speedup limit according to Amdahl's law. We achieve comparable accuracy to SOTA GNNs, but at a significantly increased speed: 3.22/4.24 times speedup (vs. theoretical limits, 5.52/7.27 times) on Reddit compared to DGL and GNNAdvisor implementations. △ Less

Submitted 18 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: ASPLOS 2024 accepted publication

ACM Class: I.2; C.5

arXiv:2311.09762 [pdf, other]

Graph Elicitation for Guiding Multi-Step Reasoning in Large Language Models

Authors: **young Park, Ameen Patel, Omar Zia Khan, Hyunwoo J. Kim, Joo-Kyung Kim

Abstract: Chain-of-Thought (CoT) prompting along with sub-question generation and answering has enhanced multi-step reasoning capabilities of Large Language Models (LLMs). However, prompting the LLMs to directly generate sub-questions is suboptimal since they sometimes generate redundant or irrelevant questions. To deal with them, we propose a GE-Reasoning method, which directs LLMs to generate proper sub-q… ▽ More Chain-of-Thought (CoT) prompting along with sub-question generation and answering has enhanced multi-step reasoning capabilities of Large Language Models (LLMs). However, prompting the LLMs to directly generate sub-questions is suboptimal since they sometimes generate redundant or irrelevant questions. To deal with them, we propose a GE-Reasoning method, which directs LLMs to generate proper sub-questions and corresponding answers. Concretely, given an input question, we first prompt the LLM to generate knowledge triplets, forming a graph representation of the question. Unlike conventional knowledge triplets, our approach allows variables as head or tail entities, effectively representing a question as knowledge triplets. Second, for each triplet, the LLM generates a corresponding sub-question and answer along with using knowledge retrieval. If the prediction confidence exceeds a threshold, the sub-question and prediction are incorporated into the prompt for subsequent processing. This approach encourages that sub-questions are grounded in the extracted knowledge triplets, reducing redundancy and irrelevance. Our experiments demonstrate that our approach outperforms previous CoT prompting methods and their variants on multi-hop question answering benchmark datasets. △ Less

Submitted 22 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: Preprint

arXiv:2311.06573 [pdf, other]

A Generalized Space-Efficient Algorithm for Quantum Bit String Comparators

Authors: Khuram Shahzad, Omar Usman Khan

Abstract: Quantum Bit String Comparators (QBSC) operate on two sequences of n-qubits, enabling the determination of their relationships, such as equality, greater than, or less than. This is analogous to the way conditional statements are used in programming languages. Consequently, QBSCs play a crucial role in various algorithms that can be executed or adapted for quantum computers. The development of effi… ▽ More Quantum Bit String Comparators (QBSC) operate on two sequences of n-qubits, enabling the determination of their relationships, such as equality, greater than, or less than. This is analogous to the way conditional statements are used in programming languages. Consequently, QBSCs play a crucial role in various algorithms that can be executed or adapted for quantum computers. The development of efficient and generalized comparators for any $n$-qubit length has long posed a challenge, as they have a high-cost footprint and lead to quantum delays. Comparators that are efficient are associated with inputs of fixed length. As a result, comparators without a generalized circuit cannot be employed at a higher level, though they are well-suited for problems with limited size requirements. In this paper, we introduce a generalized design for the comparison of two $n$-qubit logic states using just two ancillary bits. The design is examined on the basis of qubit requirements, ancillary bit usage, quantum cost, quantum delay, gate operations, and circuit complexity, and is tested comprehensively on various input lengths. The work allows for sufficient flexibility in the design of quantum algorithms, which can accelerate quantum algorithm development. △ Less

Submitted 14 November, 2023; v1 submitted 11 November, 2023; originally announced November 2023.

arXiv:2310.20081 [pdf, other]

Integrating Summarization and Retrieval for Enhanced Personalization via Large Language Models

Authors: Chris Richardson, Yao Zhang, Kellen Gillespie, Sudipta Kar, Arshdeep Singh, Zeynab Raeesy, Omar Zia Khan, Abhinav Sethy

Abstract: Personalization, the ability to tailor a system to individual users, is an essential factor in user experience with natural language processing (NLP) systems. With the emergence of Large Language Models (LLMs), a key question is how to leverage these models to better personalize user experiences. To personalize a language model's output, a straightforward approach is to incorporate past user data… ▽ More Personalization, the ability to tailor a system to individual users, is an essential factor in user experience with natural language processing (NLP) systems. With the emergence of Large Language Models (LLMs), a key question is how to leverage these models to better personalize user experiences. To personalize a language model's output, a straightforward approach is to incorporate past user data into the language model prompt, but this approach can result in lengthy inputs exceeding limitations on input length and incurring latency and cost issues. Existing approaches tackle such challenges by selectively extracting relevant user data (i.e. selective retrieval) to construct a prompt for downstream tasks. However, retrieval-based methods are limited by potential information loss, lack of more profound user understanding, and cold-start challenges. To overcome these limitations, we propose a novel summary-augmented approach by extending retrieval-augmented personalization with task-aware user summaries generated by LLMs. The summaries can be generated and stored offline, enabling real-world systems with runtime constraints like voice assistants to leverage the power of LLMs. Experiments show our method with 75% less of retrieved user data is on-par or outperforms retrieval augmentation on most tasks in the LaMP personalization benchmark. We demonstrate that offline summarization via LLMs and runtime retrieval enables better performance for personalization on a range of tasks under practical constraints. △ Less

Submitted 30 October, 2023; originally announced October 2023.

Comments: 4 pages, International Workshop on Personalized Generative AI (@CIKM 2023)

ACM Class: I.2.7; H.3.3

arXiv:2310.04551 [pdf, other]

MeSa: Masked, Geometric, and Supervised Pre-training for Monocular Depth Estimation

Authors: Muhammad Osama Khan, Junbang Liang, Chun-Kai Wang, Shan Yang, Yu Lou

Abstract: Pre-training has been an important ingredient in develo** strong monocular depth estimation models in recent years. For instance, self-supervised learning (SSL) is particularly effective by alleviating the need for large datasets with dense ground-truth depth maps. However, despite these improvements, our study reveals that the later layers of the SOTA SSL method are actually suboptimal. By exam… ▽ More Pre-training has been an important ingredient in develo** strong monocular depth estimation models in recent years. For instance, self-supervised learning (SSL) is particularly effective by alleviating the need for large datasets with dense ground-truth depth maps. However, despite these improvements, our study reveals that the later layers of the SOTA SSL method are actually suboptimal. By examining the layer-wise representations, we demonstrate significant changes in these later layers during fine-tuning, indicating the ineffectiveness of their pre-trained features for depth estimation. To address these limitations, we propose MeSa, a comprehensive framework that leverages the complementary strengths of masked, geometric, and supervised pre-training. Hence, MeSa benefits from not only general-purpose representations learnt via masked pre training but also specialized depth-specific features acquired via geometric and supervised pre-training. Our CKA layer-wise analysis confirms that our pre-training strategy indeed produces improved representations for the later layers, overcoming the drawbacks of the SOTA SSL method. Furthermore, via experiments on the NYUv2 and IBims-1 datasets, we demonstrate that these enhanced representations translate to performance improvements in both the in-distribution and out-of-distribution settings. We also investigate the influence of the pre-training dataset and demonstrate the efficacy of pre-training on LSUN, which yields significantly better pre-trained representations. Overall, our approach surpasses the masked pre-training SSL method by a substantial margin of 17.1% on the RMSE. Moreover, even without utilizing any recently proposed techniques, MeSa also outperforms the most recent methods and establishes a new state-of-the-art for monocular depth estimation on the challenging NYUv2 dataset. △ Less

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2310.02492 [pdf, other]

FairVision: Equitable Deep Learning for Eye Disease Screening via Fair Identity Scaling

Authors: Yan Luo, Muhammad Osama Khan, Yu Tian, Min Shi, Zehao Dou, Tobias Elze, Yi Fang, Mengyu Wang

Abstract: Equity in AI for healthcare is crucial due to its direct impact on human well-being. Despite advancements in 2D medical imaging fairness, the fairness of 3D models remains underexplored, hindered by the small sizes of 3D fairness datasets. Since 3D imaging surpasses 2D imaging in SOTA clinical care, it is critical to understand the fairness of these 3D models. To address this research gap, we cond… ▽ More Equity in AI for healthcare is crucial due to its direct impact on human well-being. Despite advancements in 2D medical imaging fairness, the fairness of 3D models remains underexplored, hindered by the small sizes of 3D fairness datasets. Since 3D imaging surpasses 2D imaging in SOTA clinical care, it is critical to understand the fairness of these 3D models. To address this research gap, we conduct the first comprehensive study on the fairness of 3D medical imaging models across multiple protected attributes. Our investigation spans both 2D and 3D models and evaluates fairness across five architectures on three common eye diseases, revealing significant biases across race, gender, and ethnicity. To alleviate these biases, we propose a novel fair identity scaling (FIS) method that improves both overall performance and fairness, outperforming various SOTA fairness methods. Moreover, we release Harvard-FairVision, the first large-scale medical fairness dataset with 30,000 subjects featuring both 2D and 3D imaging data and six demographic identity attributes. Harvard-FairVision provides labels for three major eye disorders affecting about 380 million people worldwide, serving as a valuable resource for both 2D and 3D fairness learning. Our code and dataset are publicly accessible at \url{https://ophai.hms.harvard.edu/datasets/harvard-fairvision30k}. △ Less

Submitted 12 April, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

arXiv:2308.11825 [pdf, other]

Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks

Authors: Xi Xie, Hongwu Peng, Amit Hasan, Shaoyi Huang, Jiahui Zhao, Haowen Fang, Wei Zhang, Tong Geng, Omer Khan, Caiwen Ding

Abstract: Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph data across various domains, yet their acceleration on mainstream GPUs is challenged by workload imbalance and memory access irregularity. To address these challenges, we present Accel-GCN, a GPU accelerator architecture for GCNs. The design of Accel-GCN encompasses: (i) a lightweight degree sorting stage t… ▽ More Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph data across various domains, yet their acceleration on mainstream GPUs is challenged by workload imbalance and memory access irregularity. To address these challenges, we present Accel-GCN, a GPU accelerator architecture for GCNs. The design of Accel-GCN encompasses: (i) a lightweight degree sorting stage to group nodes with similar degree; (ii) a block-level partition strategy that dynamically adjusts warp workload sizes, enhancing shared memory locality and workload balance, and reducing metadata overhead compared to designs like GNNAdvisor; (iii) a combined warp strategy that improves memory coalescing and computational parallelism in the column dimension of dense matrices. Utilizing these principles, we formulated a kernel for sparse matrix multiplication (SpMM) in GCNs that employs block-level partitioning and combined warp strategy. This approach augments performance and multi-level memory efficiency and optimizes memory bandwidth by exploiting memory coalescing and alignment. Evaluation of Accel-GCN across 18 benchmark graphs reveals that it outperforms cuSPARSE, GNNAdvisor, and graph-BLAST by factors of 1.17 times, 1.86 times, and 2.94 times respectively. The results underscore Accel-GCN as an effective solution for enhancing GCN computational efficiency. △ Less

Submitted 22 August, 2023; originally announced August 2023.

Comments: ICCAD 2023 accepted publication

ACM Class: I.2; B.6; C.3

arXiv:2307.10915 [pdf, other]

Revisiting Fine-Tuning Strategies for Self-supervised Medical Imaging Analysis

Authors: Muhammad Osama Khan, Yi Fang

Abstract: Despite the rapid progress in self-supervised learning (SSL), end-to-end fine-tuning still remains the dominant fine-tuning strategy for medical imaging analysis. However, it remains unclear whether this approach is truly optimal for effectively utilizing the pre-trained knowledge, especially considering the diverse categories of SSL that capture different types of features. In this paper, we pres… ▽ More Despite the rapid progress in self-supervised learning (SSL), end-to-end fine-tuning still remains the dominant fine-tuning strategy for medical imaging analysis. However, it remains unclear whether this approach is truly optimal for effectively utilizing the pre-trained knowledge, especially considering the diverse categories of SSL that capture different types of features. In this paper, we present the first comprehensive study that discovers effective fine-tuning strategies for self-supervised learning in medical imaging. After develo** strong contrastive and restorative SSL baselines that outperform SOTA methods across four diverse downstream tasks, we conduct an extensive fine-tuning analysis across multiple pre-training and fine-tuning datasets, as well as various fine-tuning dataset sizes. Contrary to the conventional wisdom of fine-tuning only the last few layers of a pre-trained network, we show that fine-tuning intermediate layers is more effective, with fine-tuning the second quarter (25-50%) of the network being optimal for contrastive SSL whereas fine-tuning the third quarter (50-75%) of the network being optimal for restorative SSL. Compared to the de-facto standard of end-to-end fine-tuning, our best fine-tuning strategy, which fine-tunes a shallower network consisting of the first three quarters (0-75%) of the pre-trained network, yields improvements of as much as 5.48%. Additionally, using these insights, we propose a simple yet effective method to leverage the complementary strengths of multiple SSL models, resulting in enhancements of up to 3.57% compared to using the best model alone. Hence, our fine-tuning strategies not only enhance the performance of individual SSL models, but also enable effective utilization of the complementary strengths offered by multiple SSL models, leading to significant improvements in self-supervised medical imaging analysis. △ Less

Submitted 16 November, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 18 pages

arXiv:2305.14218 [pdf, other]

DUBLIN -- Document Understanding By Language-Image Network

Authors: Kriti Aggarwal, Aditi Khandelwal, Kumar Tanmay, Owais Mohammed Khan, Qiang Liu, Monojit Choudhury, Hardik Hansrajbhai Chauhan, Subhojit Som, Vishrav Chaudhary, Saurabh Tiwary

Abstract: Visual document understanding is a complex task that involves analyzing both the text and the visual elements in document images. Existing models often rely on manual feature engineering or domain-specific pipelines, which limit their generalization ability across different document types and languages. In this paper, we propose DUBLIN, which is pretrained on web pages using three novel objectives… ▽ More Visual document understanding is a complex task that involves analyzing both the text and the visual elements in document images. Existing models often rely on manual feature engineering or domain-specific pipelines, which limit their generalization ability across different document types and languages. In this paper, we propose DUBLIN, which is pretrained on web pages using three novel objectives: Masked Document Text Generation Task, Bounding Box Task, and Rendered Question Answering Task, that leverage both the spatial and semantic information in the document images. Our model achieves competitive or state-of-the-art results on several benchmarks, such as Web-Based Structural Reading Comprehension, Document Visual Question Answering, Key Information Extraction, Diagram Understanding, and Table Question Answering. In particular, we show that DUBLIN is the first pixel-based model to achieve an EM of 77.75 and F1 of 84.25 on the WebSRC dataset. We also show that our model outperforms the current pixel-based SOTA models on DocVQA, InfographicsVQA, OCR-VQA and AI2D datasets by 4.6%, 6.5%, 2.6% and 21%, respectively. We also achieve competitive performance on RVL-CDIP document classification. Moreover, we create new baselines for text-based datasets by rendering them as document images to promote research in this direction. △ Less

Submitted 27 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

ACM Class: F.2.2; I.2.7

arXiv:2302.10978 [pdf, other]

Learning to Retrieve Engaging Follow-Up Queries

Authors: Christopher Richardson, Sudipta Kar, Anjishnu Kumar, Anand Ramachandran, Omar Zia Khan, Zeynab Raeesy, Abhinav Sethy

Abstract: Open domain conversational agents can answer a broad range of targeted queries. However, the sequential nature of interaction with these systems makes knowledge exploration a lengthy task which burdens the user with asking a chain of well phrased questions. In this paper, we present a retrieval based system and associated dataset for predicting the next questions that the user might have. Such a s… ▽ More Open domain conversational agents can answer a broad range of targeted queries. However, the sequential nature of interaction with these systems makes knowledge exploration a lengthy task which burdens the user with asking a chain of well phrased questions. In this paper, we present a retrieval based system and associated dataset for predicting the next questions that the user might have. Such a system can proactively assist users in knowledge exploration leading to a more engaging dialog. The retrieval system is trained on a dataset which contains ~14K multi-turn information-seeking conversations with a valid follow-up question and a set of invalid candidates. The invalid candidates are generated to simulate various syntactic and semantic confounders such as paraphrases, partial entity match, irrelevant entity, and ASR errors. We use confounder specific techniques to simulate these negative examples on the OR-QuAC dataset and develop a dataset called the Follow-up Query Bank (FQ-Bank). Then, we train ranking models on FQ-Bank and present results comparing supervised and unsupervised approaches. The results suggest that we can retrieve the valid follow-ups by ranking them in higher positions compared to confounders, but further knowledge grounding can improve ranking performance. △ Less

Submitted 21 February, 2023; originally announced February 2023.

Comments: EACL 2023

arXiv:2210.04114 [pdf, other]

Towards Real-Time Temporal Graph Learning

Authors: Deniz Gurevin, Mohsin Shan, Tong Geng, Weiwen Jiang, Caiwen Ding, Omer Khan

Abstract: In recent years, graph representation learning has gained significant popularity, which aims to generate node embeddings that capture features of graphs. One of the methods to achieve this is employing a technique called random walks that captures node sequences in a graph and then learns embeddings for each node using a natural language processing technique called Word2Vec. These embeddings are t… ▽ More In recent years, graph representation learning has gained significant popularity, which aims to generate node embeddings that capture features of graphs. One of the methods to achieve this is employing a technique called random walks that captures node sequences in a graph and then learns embeddings for each node using a natural language processing technique called Word2Vec. These embeddings are then used for deep learning on graph data for classification tasks, such as link prediction or node classification. Prior work operates on pre-collected temporal graph data and is not designed to handle updates on a graph in real-time. Real world graphs change dynamically and their entire temporal updates are not available upfront. In this paper, we propose an end-to-end graph learning pipeline that performs temporal graph construction, creates low-dimensional node embeddings, and trains multi-layer neural network models in an online setting. The training of the neural network models is identified as the main performance bottleneck as it performs repeated matrix operations on many sequentially connected low-dimensional kernels. We propose to unlock fine-grain parallelism in these low-dimensional kernels to boost performance of model training. △ Less

Submitted 11 October, 2022; v1 submitted 8 October, 2022; originally announced October 2022.

arXiv:2209.04766 [pdf, other]

Towards Sparsification of Graph Neural Networks

Authors: Hongwu Peng, Deniz Gurevin, Shaoyi Huang, Tong Geng, Weiwen Jiang, Omer Khan, Caiwen Ding

Abstract: As real-world graphs expand in size, larger GNN models with billions of parameters are deployed. High parameter count in such models makes training and inference on graphs expensive and challenging. To reduce the computational and memory costs of GNNs, optimization methods such as pruning the redundant nodes and edges in input graphs have been commonly adopted. However, model compression, which di… ▽ More As real-world graphs expand in size, larger GNN models with billions of parameters are deployed. High parameter count in such models makes training and inference on graphs expensive and challenging. To reduce the computational and memory costs of GNNs, optimization methods such as pruning the redundant nodes and edges in input graphs have been commonly adopted. However, model compression, which directly targets the sparsification of model layers, has been mostly limited to traditional Deep Neural Networks (DNNs) used for tasks such as image classification and object detection. In this paper, we utilize two state-of-the-art model compression methods (1) train and prune and (2) sparse training for the sparsification of weight layers in GNNs. We evaluate and compare the efficiency of both methods in terms of accuracy, training sparsity, and training FLOPs on real-world graphs. Our experimental results show that on the ia-email, wiki-talk, and stackoverflow datasets for link prediction, sparse training with much lower training FLOPs achieves a comparable accuracy with the train and prune method. On the brain dataset for node classification, sparse training uses a lower number FLOPs (less than 1/7 FLOPs of train and prune method) and preserves a much better accuracy performance under extreme model sparsity. △ Less

Submitted 24 February, 2023; v1 submitted 10 September, 2022; originally announced September 2022.

Comments: ICCD 2022 Paper

ACM Class: I.2; C.4

arXiv:2201.01834 [pdf, other]

Secure Remote Attestation with Strong Key Insulation Guarantees

Authors: Deniz Gurevin, Chenglu **, Phuong Ha Nguyen, Omer Khan, Marten van Dijk

Abstract: Recent years have witnessed a trend of secure processor design in both academia and industry. Secure processors with hardware-enforced isolation can be a solid foundation of cloud computation in the future. However, due to recent side-channel attacks, the commercial secure processors failed to deliver the promises of a secure isolated execution environment. Sensitive information inside the secure… ▽ More Recent years have witnessed a trend of secure processor design in both academia and industry. Secure processors with hardware-enforced isolation can be a solid foundation of cloud computation in the future. However, due to recent side-channel attacks, the commercial secure processors failed to deliver the promises of a secure isolated execution environment. Sensitive information inside the secure execution environment always gets leaked via side channels. This work considers the most powerful software-based side-channel attackers, i.e., an All Digital State Observing (ADSO) adversary who can observe all digital states, including all digital states in secure enclaves. Traditional signature schemes are not secure in ADSO adversarial model. We introduce a new cryptographic primitive called One-Time Signature with Secret Key Exposure (OTS-SKE), which ensures no one can forge a valid signature of a new message or nonce even if all secret session keys are leaked. OTS-SKE enables us to sign attestation reports securely under the ADSO adversary. We also minimize the trusted computing base by introducing a secure co-processor into the system, and the interaction between the secure co-processor and the attestation processor is unidirectional. That is, the co-processor takes no inputs from the processor and only generates secret keys for the processor to fetch. Our experimental results show that the signing of OTS-SKE is faster than that of Elliptic Curve Digital Signature Algorithm (ECDSA) used in Intel SGX. △ Less

Submitted 5 January, 2022; originally announced January 2022.

arXiv:1906.09850 [pdf]

Multisensory cues facilitate coordination of step** movements with a virtual reality avatar

Authors: Omar Khan, Imran Ahmed, Joshua Cottingham, Musa Rahhal, Theodoros N Arvanitis, Mark Elliott

Abstract: The effectiveness of simple sensory cues for retraining gait have been demonstrated, yet the feasibility of humanoid avatars for entrainment have yet to be investigated. Here, we describe the development of a novel method of visually cued training, in the form of a virtual partner, and investigate its ability to provide movement guidance in the form of step**. Real step** movements were mapped… ▽ More The effectiveness of simple sensory cues for retraining gait have been demonstrated, yet the feasibility of humanoid avatars for entrainment have yet to be investigated. Here, we describe the development of a novel method of visually cued training, in the form of a virtual partner, and investigate its ability to provide movement guidance in the form of step**. Real step** movements were mapped onto an avatar using motion capture data. The trajectory of one of the avatar step cycles was then accelerated or decelerated by 15% to create a perturbation. Healthy participants were motion captured while instructed to step in time to the avatar's movements, as viewed through a virtual reality headset. Step onset times were used to measure the timing errors (asynchronies) between them. Participants completed either a visual-only condition, or auditory-visual with footstep sounds included. Participants' asynchronies exhibited slow drift in the Visual-Only condition, but became stable in the Auditory-Visual condition. Moreover, we observed a clear corrective response to the phase perturbation in both auditory-visual conditions. We conclude that an avatar's movements can be used to influence a person's own gait, but should include relevant auditory cues congruent with the movement to ensure a suitable accuracy is achieved. △ Less

Submitted 24 June, 2019; originally announced June 2019.

Comments: 28 pages, 8 figures, submitted to PLOS ONE

arXiv:1904.12729 [pdf]

IRONHIDE: A Secure Multicore that Efficiently Mitigates Microarchitecture State Attacks for Interactive Applications

Authors: Hamza Omar, Omer Khan

Abstract: Microprocessors enable aggressive hardware virtualization by means of which multiple processes temporally execute on the system. These security-critical and ordinary processes interact with each other to assure application progress. However, temporal sharing of hardware resources exposes the processor to various microarchitecture state attacks. State-of-the-art secure processors, such as MI6 adopt… ▽ More Microprocessors enable aggressive hardware virtualization by means of which multiple processes temporally execute on the system. These security-critical and ordinary processes interact with each other to assure application progress. However, temporal sharing of hardware resources exposes the processor to various microarchitecture state attacks. State-of-the-art secure processors, such as MI6 adopt Intel's SGX enclave execution model. MI6 architects strong isolation by statically isolating shared memory state, and purging the microarchitecture state of private core, cache, and TLB resources on every enclave entry and exit. The purging overhead significantly impacts performance as the interactivity across the secure and insecure processes increases. This paper proposes IRONHIDE that implements strong isolation in the context of multicores to form spatially isolated secure and insecure clusters of cores. For an interactive application comprising of secure and insecure processes, IRONHIDE pins the secure process(es) to the secure cluster, where they execute and interact with the insecure process(es) without incurring the microarchitecture state purging overheads on every interaction event. IRONHIDE improves performance by 2.1x over the MI6 baseline for a set of user and OS interactive applications. Moreover, IRONHIDE improves performance by 20% over an SGX-like baseline, while also ensuring strong isolation guarantees against microarchitecture state attacks. △ Less

Submitted 27 January, 2020; v1 submitted 29 April, 2019; originally announced April 2019.

arXiv:1904.08689 [pdf, other]

Exquisitor: Interactive Learning at Large

Authors: Björn Þór Jónsson, Omar Shahbaz Khan, Hanna Ragnarsdóttir, Þórhildur Þorleiksdóttir, Jan Zahálka, Stevan Rudinac, Gylfi Þór Guðmundsson, Laurent Amsaleg, Marcel Worring

Abstract: Increasing scale is a dominant trend in today's multimedia collections, which especially impacts interactive applications. To facilitate interactive exploration of large multimedia collections, new approaches are needed that are capable of learning on the fly new analytic categories based on the visual and textual content. To facilitate general use on standard desktops, laptops, and mobile devices… ▽ More Increasing scale is a dominant trend in today's multimedia collections, which especially impacts interactive applications. To facilitate interactive exploration of large multimedia collections, new approaches are needed that are capable of learning on the fly new analytic categories based on the visual and textual content. To facilitate general use on standard desktops, laptops, and mobile devices, they must furthermore work with limited computing resources. We present Exquisitor, a highly scalable interactive learning approach, capable of intelligent exploration of the large-scale YFCC100M image collection with extremely efficient responses from the interactive classifier. Based on relevance feedback from the user on previously suggested items, Exquisitor uses semantic features, extracted from both visual and text attributes, to suggest relevant media items to the user. Exquisitor builds upon the state of the art in large-scale data representation, compression and indexing, introducing a cluster-based retrieval mechanism that facilitates the efficient suggestions. With Exquisitor, each interaction round over the full YFCC100M collection is completed in less than 0.3 seconds using a single CPU core. That is 4x less time using 16x smaller computational resources than the most efficient state-of-the-art method, with a positive impact on result quality. These results open up many interesting research avenues, both for exploration of industry-scale media collections and for media exploration on mobile devices. △ Less

Submitted 17 July, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

arXiv:1904.08222 [pdf, other]

Experimental Clock Calibration\\on a Crystal-Free Mote-on-a-Chip

Authors: Ioana Suciu, Filip Maksimovic, David Burnett, Osama Khan, Brad Wheeler, Arvind Sundararajan, Thomas Watteyne, Xavier Vilajosana, Kris Pister

Abstract: The elimination of the off-chip frequency reference, typically a crystal oscillator, would bring important benefits in terms of size, price and energy efficiency to IEEE802.15.4 compliant radios and systems-on-chip. The stability of on-chip oscillators is orders of magnitude worse than that of a crystal. It is known that as the temperature changes, they can drift more than 50 ppm/°C. This paper pr… ▽ More The elimination of the off-chip frequency reference, typically a crystal oscillator, would bring important benefits in terms of size, price and energy efficiency to IEEE802.15.4 compliant radios and systems-on-chip. The stability of on-chip oscillators is orders of magnitude worse than that of a crystal. It is known that as the temperature changes, they can drift more than 50 ppm/°C. This paper presents the result of an extensive experimental study. First, we propose mechanisms for crystal-free radios to be able to track an IEEE802.15.4 join proxy, calibrate the on-chip oscillators and maintain calibration against temperature changes. Then, we implement the resulting algorithms on a crystal-free platform and present the results of an experimental validation. We show that our approach is able to track a crystal-based IEEE802.15.4-compliant join proxy and maintain the requested radio frequency stability of +/-40 ppm, even when subject to temperature variation of 2°C/min. △ Less

Submitted 17 April, 2019; originally announced April 2019.

Comments: CNERT: Computer and Networking Experimental Research using Testbeds, in conjunction with IEEE INFOCOM 2019, April 29 - May 2, 2019, Paris, France

arXiv:1807.09193 [pdf, other]

GRAINS: Generative Recursive Autoencoders for INdoor Scenes

Authors: Manyi Li, Akshay Gadi Patil, Kai Xu, Siddhartha Chaudhuri, Owais Khan, Ariel Shamir, Changhe Tu, Baoquan Chen, Daniel Cohen-Or, Hao Zhang

Abstract: We present a generative neural network which enables us to generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently. Our key observation is that indoor scene structures are inherently hierarchical. Hence, our network is not convolutional; it is a recursive neural network or RvNN. Using a dataset of annotated scene hierarchies, we train a variational recu… ▽ More We present a generative neural network which enables us to generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently. Our key observation is that indoor scene structures are inherently hierarchical. Hence, our network is not convolutional; it is a recursive neural network or RvNN. Using a dataset of annotated scene hierarchies, we train a variational recursive autoencoder, or RvNN-VAE, which performs scene object grou** during its encoding phase and scene generation during decoding. Specifically, a set of encoders are recursively applied to group 3D objects based on support, surround, and co-occurrence relations in a scene, encoding information about object spatial properties, semantics, and their relative positioning with respect to other objects in the hierarchy. By training a variational autoencoder (VAE), the resulting fixed-length codes roughly follow a Gaussian distribution. A novel 3D scene can be generated hierarchically by the decoder from a randomly sampled code from the learned distribution. We coin our method GRAINS, for Generative Recursive Autoencoders for INdoor Scenes. We demonstrate the capability of GRAINS to generate plausible and diverse 3D indoor scenes and compare with existing methods for 3D scene synthesis. We show applications of GRAINS including 3D scene modeling from 2D layouts, scene editing, and semantic scene segmentation via PointNet whose performance is boosted by the large quantity and variety of 3D scenes generated by our method. △ Less

Submitted 8 May, 2019; v1 submitted 24 July, 2018; originally announced July 2018.

Comments: 21 pages, 26 figures

arXiv:1707.02589 [pdf, other]

Exploiting the Tradeoff between Program Accuracy and Soft-error Resiliency Overhead for Machine Learning Workloads

Authors: Qingchuan Shi, Hamza Omar, Omer Khan

Abstract: To protect multicores from soft-error perturbations, resiliency schemes have been developed with high coverage but high power and performance overheads. Emerging safety-critical machine learning applications are increasingly being deployed on these platforms. Moreover, these systems are exposed to harsh environments, such as unmanned aerial vehicles (UAVs) and self-driving cars. Due to the unique… ▽ More To protect multicores from soft-error perturbations, resiliency schemes have been developed with high coverage but high power and performance overheads. Emerging safety-critical machine learning applications are increasingly being deployed on these platforms. Moreover, these systems are exposed to harsh environments, such as unmanned aerial vehicles (UAVs) and self-driving cars. Due to the unique structure and computational behavior of such applications, research has been done on relaxing their accuracy for performance benefits. We observe that not all transient errors affect program correctness, some errors only affect program accuracy, i.e., the program completes with certain acceptable deviations from error free outcome. This paper illustrates the idea of cross-layer soft-error resilience using machine learning workloads, where program accuracy is introduced as a tradeoff to deliver resilient yet efficient execution on futuristic large-scale multicores. △ Less

Submitted 9 July, 2017; originally announced July 2017.

Comments: Presented in 2017 IEEE Workshop on Silicon Errors in Logic - System Effects

arXiv:1706.03852 [pdf, other]

Revisiting Definitional Foundations of Oblivious RAM for Secure Processor Implementations

Authors: Syed Kamran Haider, Omer Khan, Marten van Dijk

Abstract: Oblivious RAM (ORAM) is a renowned technique to hide the access patterns of an application to an untrusted memory. According to the standard ORAM definition presented by Goldreich and Ostrovsky, two ORAM access sequences must be computationally indistinguishable if the lengths of these sequences are identically distributed. An artifact of this definition is that it does not apply to modern ORAM im… ▽ More Oblivious RAM (ORAM) is a renowned technique to hide the access patterns of an application to an untrusted memory. According to the standard ORAM definition presented by Goldreich and Ostrovsky, two ORAM access sequences must be computationally indistinguishable if the lengths of these sequences are identically distributed. An artifact of this definition is that it does not apply to modern ORAM implementations adapted in current secure processors technology because of their arbitrary lengths of memory access sequences depending on programs' behaviors (their termination times). As a result, the ORAM definition does not directly apply; the theoretical foundations of ORAM do not clearly argue about the timing and termination channels. This paper conducts a first rigorous study of the standard Goldreich-Ostrovsky ORAM definition in view of modern practical ORAMs (e.g., Path ORAM) and demonstrates the gap between theoretical foundations and real implementations. A new ORAM formulation which clearly separates out termination channel leakage is proposed. It is shown how this definition implies the standard ORAM definition (for finite length input access sequences) and better fits the modern practical ORAM implementations. The proposed definition relaxes the constraints around the stash size and overflow probability for Path ORAM, and essentially transforms its security argument into a performance consideration problem. Finally, a `strong' ORAM formulation which clearly includes obfuscation of termination leakage is shown to imply our new ORAM formulation and applies to ORAM for outsourced disk storage. In this strong formulation constraints are not relaxed and the security argument for Path ORAM remains complex as one needs to prove that the stash overflows with negligible probability. △ Less

Submitted 21 October, 2017; v1 submitted 12 June, 2017; originally announced June 2017.

Showing 1–21 of 21 results for author: Khan, O