Search | arXiv e-print repository

Fusing uncalibrated IMUs and handheld smartphone video to reconstruct knee kinematics

Authors: J. D. Peiffer, Kunal Shah, Shawana Anarwala, Kayan Abdou, R. James Cotton

Abstract: Video and wearable sensor data provide complementary information about human movement. Video provides a holistic understanding of the entire body in the world while wearable sensors provide high-resolution measurements of specific body segments. A robust method to fuse these modalities and obtain biomechanically accurate kinematics would have substantial utility for clinical assessment and monitor… ▽ More Video and wearable sensor data provide complementary information about human movement. Video provides a holistic understanding of the entire body in the world while wearable sensors provide high-resolution measurements of specific body segments. A robust method to fuse these modalities and obtain biomechanically accurate kinematics would have substantial utility for clinical assessment and monitoring. While multiple video-sensor fusion methods exist, most assume that a time-intensive, and often brittle, sensor-body calibration process has already been performed. In this work, we present a method to combine handheld smartphone video and uncalibrated wearable sensor data at their full temporal resolution. Our monocular, video-only, biomechanical reconstruction already performs well, with only several degrees of error at the knee during walking compared to markerless motion capture. Reconstructing from a fusion of video and wearable sensor data further reduces this error. We validate this in a mixture of people with no gait impairments, lower limb prosthesis users, and individuals with a history of stroke. We also show that sensor data allows tracking through periods of visual occlusion. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Accepted to International Conference on Biomedical Robotics and Biomechatronics 2024

arXiv:2405.07518 [pdf, other]

SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain , et al. (5 additional authors not shown)

Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in compute-to-memory ratio of modern AI accelerators have created a memory wall, necessitating new methods to deploy AI. Composition of Expert… ▽ More Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in compute-to-memory ratio of modern AI accelerators have created a memory wall, necessitating new methods to deploy AI. Composition of Experts (CoE) is an alternative modular approach that lowers the cost and complexity of training and serving. However, this approach presents two key challenges when using conventional hardware: (1) without fused operations, smaller models have lower operational intensity, which makes high utilization more challenging to achieve; and (2) hosting a large number of models can be either prohibitively expensive or slow when dynamically switching between them. In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall. We describe Samba-CoE, a CoE system with 150 experts and a trillion total parameters. We deploy Samba-CoE on the SambaNova SN40L Reconfigurable Dataflow Unit (RDU) - a commercial dataflow accelerator architecture that has been co-designed for enterprise inference and training applications. The chip introduces a new three-tier memory system with on-chip distributed SRAM, on-package HBM, and off-package DDR DRAM. A dedicated inter-RDU network enables scaling up and out over multiple sockets. We demonstrate speedups ranging from 2x to 13x on various benchmarks running on eight RDU sockets compared with an unfused baseline. We show that for CoE inference deployments, the 8-socket RDU Node reduces machine footprint by up to 19x, speeds up model switching time by 15x to 31x, and achieves an overall speedup of 3.7x over a DGX H100 and 6.6x over a DGX A100. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2404.18893 [pdf, other]

Learning general Gaussian mixtures with efficient score matching

Authors: Sitan Chen, Vasilis Kontonis, Kulin Shah

Abstract: We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions. We make no separation assumptions on the underlying mixture components: we only require that the covariance matrices have bounded condition number and that the means and covariances lie in a ball of bounded radius. We give an algorithm that draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture, runs in… ▽ More We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions. We make no separation assumptions on the underlying mixture components: we only require that the covariance matrices have bounded condition number and that the means and covariances lie in a ball of bounded radius. We give an algorithm that draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture, runs in sample-polynomial time, and constructs a sampler whose output distribution is $\varepsilon$-far from the unknown mixture in total variation. Prior works for this problem either (i) required exponential runtime in the dimension $d$, (ii) placed strong assumptions on the instance (e.g., spherical covariances or clusterability), or (iii) had doubly exponential dependence on the number of components $k$. Our approach departs from commonly used techniques for this problem like the method of moments. Instead, we leverage a recently developed reduction, based on diffusion models, from distribution learning to a supervised learning task called score matching. We give an algorithm for the latter by proving a structural result showing that the score function of a Gaussian mixture can be approximated by a piecewise-polynomial function, and there is an efficient algorithm for finding it. To our knowledge, this is the first example of diffusion models achieving a state-of-the-art theoretical guarantee for an unsupervised learning task. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: 57 pages

arXiv:2403.06107 [pdf, other]

Textureless Object Recognition: An Edge-based Approach

Authors: Frincy Clement, Kirtan Shah, Dhara Pancholi, Gabriel Lugo Bustillo, Dr. Irene Cheng

Abstract: Textureless object recognition has become a significant task in Computer Vision with the advent of Robotics and its applications in manufacturing sector. It has been challenging to obtain good accuracy in real time because of its lack of discriminative features and reflectance properties which makes the techniques for textured object recognition insufficient for textureless objects. A lot of work… ▽ More Textureless object recognition has become a significant task in Computer Vision with the advent of Robotics and its applications in manufacturing sector. It has been challenging to obtain good accuracy in real time because of its lack of discriminative features and reflectance properties which makes the techniques for textured object recognition insufficient for textureless objects. A lot of work has been done in the last 20 years, especially in the recent 5 years after the TLess and other textureless dataset were introduced. In this project, by applying image processing techniques we created a robust augmented dataset from initial imbalanced smaller dataset. We extracted edge features, feature combinations and RGB images enhanced with feature/feature combinations to create 15 datasets, each with a size of ~340,000. We then trained four classifiers on these 15 datasets to arrive at a conclusion as to which dataset performs the best overall and whether edge features are important for textureless objects. Based on our experiments and analysis, RGB images enhanced with combination of 3 edge features performed the best compared to all others. Model performance on dataset with HED edges performed comparatively better than other edge detectors like Canny or Prewitt. △ Less

Submitted 10 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:1910.14255

arXiv:2403.00961 [pdf, other]

Data Science Education in Undergraduate Physics: Lessons Learned from a Community of Practice

Authors: Karan Shah, Julie Butler, Alexis Knaub, Anıl Zenginoğlu, William Ratcliff, Mohammad Soltanieh-ha

Abstract: It is becoming increasingly important that physics educators equip their students with the skills to work with data effectively. However, many educators may lack the necessary training and expertise in data science to teach these skills. To address this gap, we created the Data Science Education Community of Practice (DSECOP), bringing together graduate students and physics educators from differen… ▽ More It is becoming increasingly important that physics educators equip their students with the skills to work with data effectively. However, many educators may lack the necessary training and expertise in data science to teach these skills. To address this gap, we created the Data Science Education Community of Practice (DSECOP), bringing together graduate students and physics educators from different institutions and backgrounds to share best practices and lessons learned from integrating data science into undergraduate physics education. In this article we present insights and experiences from this community of practice, highlighting key strategies and challenges in incorporating data science into the introductory physics curriculum. Our goal is to provide guidance and inspiration to educators who seek to integrate data science into their teaching, hel** to prepare the next generation of physicists for a data-driven world. △ Less

Submitted 16 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: 21 pages, 4 figures, 2 tables. The associated GItHub repository can be found at https://github.com/GDS-Education-Community-of-Practice/DSECOP

arXiv:2402.14454 [pdf, other]

CCPA: Long-term Person Re-Identification via Contrastive Clothing and Pose Augmentation

Authors: Vuong D. Nguyen, Shishir K. Shah

Abstract: Long-term Person Re-Identification (LRe-ID) aims at matching an individual across cameras after a long period of time, presenting variations in clothing, pose, and viewpoint. In this work, we propose CCPA: Contrastive Clothing and Pose Augmentation framework for LRe-ID. Beyond appearance, CCPA captures body shape information which is cloth-invariant using a Relation Graph Attention Network. Traini… ▽ More Long-term Person Re-Identification (LRe-ID) aims at matching an individual across cameras after a long period of time, presenting variations in clothing, pose, and viewpoint. In this work, we propose CCPA: Contrastive Clothing and Pose Augmentation framework for LRe-ID. Beyond appearance, CCPA captures body shape information which is cloth-invariant using a Relation Graph Attention Network. Training a robust LRe-ID model requires a wide range of clothing variations and expensive cloth labeling, which is lacked in current LRe-ID datasets. To address this, we perform clothing and pose transfer across identities to generate images of more clothing variations and of different persons wearing similar clothing. The augmented batch of images serve as inputs to our proposed Fine-grained Contrastive Losses, which not only supervise the Re-ID model to learn discriminative person embeddings under long-term scenarios but also ensure in-distribution data generation. Results on LRe-ID datasets demonstrate the effectiveness of our CCPA framework. △ Less

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.09710 [pdf]

Preserving Data Privacy for ML-driven Applications in Open Radio Access Networks

Authors: Pranshav Gajjar, Azuka Chie**a, Vijay K. Shah

Abstract: Deep learning offers a promising solution to improve spectrum access techniques by utilizing data-driven approaches to manage and share limited spectrum resources for emerging applications. For several of these applications, the sensitive wireless data (such as spectrograms) are stored in a shared database or multistakeholder cloud environment and are therefore prone to privacy leaks. This paper a… ▽ More Deep learning offers a promising solution to improve spectrum access techniques by utilizing data-driven approaches to manage and share limited spectrum resources for emerging applications. For several of these applications, the sensitive wireless data (such as spectrograms) are stored in a shared database or multistakeholder cloud environment and are therefore prone to privacy leaks. This paper aims to address such privacy concerns by examining the representative case study of shared database scenarios in 5G Open Radio Access Network (O-RAN) networks where we have a shared database within the near-real-time (near-RT) RAN intelligent controller. We focus on securing the data that can be used by machine learning (ML) models for spectrum sharing and interference mitigation applications without compromising the model and network performances. The underlying idea is to leverage a (i) Shuffling-based learnable encryption technique to encrypt the data, following which, (ii) employ a custom Vision transformer (ViT) as the trained ML model that is capable of performing accurate inferences on such encrypted data. The paper offers a thorough analysis and comparisons with analogous convolutional neural networks (CNN) as well as deeper architectures (such as ResNet-50) as baselines. Our experiments showcase that the proposed approach significantly outperforms the baseline CNN with an improvement of 24.5% and 23.9% for the percent accuracy and F1-Score respectively when operated on encrypted data. Though deeper ResNet-50 architecture is obtained as a slightly more accurate model, with an increase of 4.4%, the proposed approach boasts a reduction of parameters by 99.32%, and thus, offers a much-improved prediction time by nearly 60%. △ Less

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.07258 [pdf, other]

Data Quality Aware Approaches for Addressing Model Drift of Semantic Segmentation Models

Authors: Samiha Mirza, Vuong D. Nguyen, Pranav Mantini, Shishir K. Shah

Abstract: In the midst of the rapid integration of artificial intelligence (AI) into real world applications, one pressing challenge we confront is the phenomenon of model drift, wherein the performance of AI models gradually degrades over time, compromising their effectiveness in real-world, dynamic environments. Once identified, we need techniques for handling this drift to preserve the model performance… ▽ More In the midst of the rapid integration of artificial intelligence (AI) into real world applications, one pressing challenge we confront is the phenomenon of model drift, wherein the performance of AI models gradually degrades over time, compromising their effectiveness in real-world, dynamic environments. Once identified, we need techniques for handling this drift to preserve the model performance and prevent further degradation. This study investigates two prominent quality aware strategies to combat model drift: data quality assessment and data conditioning based on prior model knowledge. The former leverages image quality assessment metrics to meticulously select high-quality training data, improving the model robustness, while the latter makes use of learned feature vectors from existing models to guide the selection of future data, aligning it with the model's prior knowledge. Through comprehensive experimentation, this research aims to shed light on the efficacy of these approaches in enhancing the performance and reliability of semantic segmentation models, thereby contributing to the advancement of computer vision capabilities in real-world scenarios. △ Less

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.06846 [pdf, other]

System-level Analysis of Adversarial Attacks and Defenses on Intelligence in O-RAN based Cellular Networks

Authors: Azuka Chie**a, Brian Kim, Kaushik Chowhdury, Vijay K. Shah

Abstract: While the open architecture, open interfaces, and integration of intelligence within Open Radio Access Network technology hold the promise of transforming 5G and 6G networks, they also introduce cybersecurity vulnerabilities that hinder its widespread adoption. In this paper, we conduct a thorough system-level investigation of cyber threats, with a specific focus on machine learning (ML) intellige… ▽ More While the open architecture, open interfaces, and integration of intelligence within Open Radio Access Network technology hold the promise of transforming 5G and 6G networks, they also introduce cybersecurity vulnerabilities that hinder its widespread adoption. In this paper, we conduct a thorough system-level investigation of cyber threats, with a specific focus on machine learning (ML) intelligence components known as xApps within the O-RAN's near-real-time RAN Intelligent Controller (near-RT RIC) platform. Our study begins by develo** a malicious xApp designed to execute adversarial attacks on two types of test data - spectrograms and key performance metrics (KPMs), stored in the RIC database within the near-RT RIC. To mitigate these threats, we utilize a distillation technique that involves training a teacher model at a high softmax temperature and transferring its knowledge to a student model trained at a lower softmax temperature, which is deployed as the robust ML model within xApp. We prototype an over-the-air LTE/5G O-RAN testbed to assess the impact of these attacks and the effectiveness of the distillation defense technique by leveraging an ML-based Interference Classification (InterClass) xApp as an example. We examine two versions of InterClass xApp under distinct scenarios, one based on Convolutional Neural Networks (CNNs) and another based on Deep Neural Networks (DNNs) using spectrograms and KPMs as input data respectively. Our findings reveal up to 100% and 96.3% degradation in the accuracy of both the CNN and DNN models respectively resulting in a significant decline in network performance under considered adversarial attacks. Under the strict latency constraints of the near-RT RIC closed control loop, our analysis shows that the distillation technique outperforms classical adversarial training by achieving an accuracy of up to 98.3% for mitigating such attacks. △ Less

Submitted 13 February, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: his paper has been accepted for publication in ACM WiSec 2024

arXiv:2402.04447 [pdf, other]

Context-Aware Spectrum Coexistence of Terrestrial Beyond 5G Networks in Satellite Bands

Authors: Ta Seen Reaz Niloy, Zoheb Hasan, Rob Smith, Vikram R. Anapana, Vijay K. Shah

Abstract: Spectrum sharing between terrestrial 5G and incumbent networks in the satellite bands presents a promising avenue to satisfy the ever-increasing bandwidth demand of the next-generation wireless networks. However, protecting incumbent operations from harmful interference poses a fundamental challenge in accommodating terrestrial broadband cellular networks in the satellite bands. State-of-the-art s… ▽ More Spectrum sharing between terrestrial 5G and incumbent networks in the satellite bands presents a promising avenue to satisfy the ever-increasing bandwidth demand of the next-generation wireless networks. However, protecting incumbent operations from harmful interference poses a fundamental challenge in accommodating terrestrial broadband cellular networks in the satellite bands. State-of-the-art spectrum-sharing policies usually consider several worst-case assumptions and ignore site-specific contextual factors in making spectrum-sharing decisions, and thus, often results in under-utilization of the shared band for the secondary licensees. To address such limitations, this paper introduces CAT3S (Context-Aware Terrestrial-Satellite Spectrum Sharing) framework that empowers the coexisting terrestrial 5G network to maximize utilization of the shared satellite band without creating harmful interference to the incumbent links by exploiting the contextual factors. CAT3S consists of the following two components: (i) context-acquisition unit to collect and process essential contextual information for spectrum sharing and (ii) context-aware base station (BS) control unit to optimize the set of operational BSs and their operation parameters (i.e., transmit power and active beams per sector). To evaluate the performance of the CAT3S, a realistic spectrum coexistence case study over the 12 GHz band is considered. Experiment results demonstrate that the proposed CAT3S achieves notably higher spectrum utilization than state-of-the-art spectrum-sharing policies in different weather contexts. △ Less

Submitted 14 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03716 [pdf, other]

Attention-based Shape and Gait Representations Learning for Video-based Cloth-Changing Person Re-Identification

Authors: Vuong D. Nguyen, Samiha Mirza, Pranav Mantini, Shishir K. Shah

Abstract: Current state-of-the-art Video-based Person Re-Identification (Re-ID) primarily relies on appearance features extracted by deep learning models. These methods are not applicable for long-term analysis in real-world scenarios where persons have changed clothes, making appearance information unreliable. In this work, we deal with the practical problem of Video-based Cloth-Changing Person Re-ID (VCCR… ▽ More Current state-of-the-art Video-based Person Re-Identification (Re-ID) primarily relies on appearance features extracted by deep learning models. These methods are not applicable for long-term analysis in real-world scenarios where persons have changed clothes, making appearance information unreliable. In this work, we deal with the practical problem of Video-based Cloth-Changing Person Re-ID (VCCRe-ID) by proposing "Attention-based Shape and Gait Representations Learning" (ASGL) for VCCRe-ID. Our ASGL framework improves Re-ID performance under clothing variations by learning clothing-invariant gait cues using a Spatial-Temporal Graph Attention Network (ST-GAT). Given the 3D-skeleton-based spatial-temporal graph, our proposed ST-GAT comprises multi-head attention modules, which are able to enhance the robustness of gait embeddings under viewpoint changes and occlusions. The ST-GAT amplifies the important motion ranges and reduces the influence of noisy poses. Then, the multi-head learning module effectively reserves beneficial local temporal dynamics of movement. We also boost discriminative power of person representations by learning body shape cues using a GAT. Experiments on two large-scale VCCRe-ID datasets demonstrate that our proposed framework outperforms state-of-the-art methods by 12.2% in rank-1 accuracy and 7.0% in mAP. △ Less

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2401.15900 [pdf, other]

MV2MAE: Multi-View Video Masked Autoencoders

Authors: Ketul Shah, Robert Crandall, Jie Xu, Peng Zhou, Marian George, Mayank Bansal, Rama Chellappa

Abstract: Videos captured from multiple viewpoints can help in perceiving the 3D structure of the world and benefit computer vision tasks such as action recognition, tracking, etc. In this paper, we present a method for self-supervised learning from synchronized multi-view videos. We use a cross-view reconstruction task to inject geometry information in the model. Our approach is based on the masked autoenc… ▽ More Videos captured from multiple viewpoints can help in perceiving the 3D structure of the world and benefit computer vision tasks such as action recognition, tracking, etc. In this paper, we present a method for self-supervised learning from synchronized multi-view videos. We use a cross-view reconstruction task to inject geometry information in the model. Our approach is based on the masked autoencoder (MAE) framework. In addition to the same-view decoder, we introduce a separate cross-view decoder which leverages cross-attention mechanism to reconstruct a target viewpoint video using a video from source viewpoint, to help representations robust to viewpoint changes. For videos, static regions can be reconstructed trivially which hinders learning meaningful representations. To tackle this, we introduce a motion-weighted reconstruction loss which improves temporal modeling. We report state-of-the-art results on the NTU-60, NTU-120 and ETRI datasets, as well as in the transfer learning setting on NUCLA, PKU-MMD-II and ROCOG-v2 datasets, demonstrating the robustness of our approach. Code will be made available. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2312.02914 [pdf, other]

Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training

Authors: Arun Reddy, William Paul, Corban Rivera, Ketul Shah, Celso M. de Melo, Rama Chellappa

Abstract: In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pre-training to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then… ▽ More In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pre-training to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then perform self-training on masked target data, using the video student model and image teacher model together to generate improved pseudolabels for unlabeled target videos. Our self-training process successfully leverages the strengths of both models to achieve strong transfer performance across domains. We evaluate our approach on multiple video domain adaptation benchmarks and observe significant improvements upon previously reported results. △ Less

Submitted 20 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: Accepted at CVPR 2024. 13 pages, 4 figures. Approved for public release: distribution unlimited

arXiv:2311.12161 [pdf, other]

ChemScraper: Leveraging PDF Graphics Instructions for Molecular Diagram Parsing

Authors: Ayush Kumar Shah, Bryan Manrique Amador, Abhisek Dey, Ming Creekmore, Blake Ocampo, Scott Denmark, Richard Zanibbi

Abstract: Most molecular diagram parsers recover chemical structure from raster images (e.g., PNGs). However, many PDFs include commands giving explicit locations and shapes for characters, lines, and polygons. We present a new parser that uses these born-digital PDF primitives as input. The parsing model is fast and accurate, and does not require GPUs, Optical Character Recognition (OCR), or vectorization.… ▽ More Most molecular diagram parsers recover chemical structure from raster images (e.g., PNGs). However, many PDFs include commands giving explicit locations and shapes for characters, lines, and polygons. We present a new parser that uses these born-digital PDF primitives as input. The parsing model is fast and accurate, and does not require GPUs, Optical Character Recognition (OCR), or vectorization. We use the parser to annotate raster images and then train a new multi-task neural network for recognizing molecules in raster images. We evaluate our parsers using SMILES and standard benchmarks, along with a novel evaluation protocol comparing molecular graphs directly that supports automatic error compilation and reveals errors missed by SMILES-based evaluation. On the synthetic USPTO benchmark, our born-digital parser obtains a recognition rate of 98.4% (1% higher than previous models) and our relatively simple neural parser for raster images obtains a rate of 85% using less training data than existing neural approaches (thousands vs. millions of molecules). △ Less

Submitted 31 May, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: 20 pages without references, 12 figures, 4 Tables, submitted to International Conference on Document Analysis and Recognition (ICDAR) - Journal Track

arXiv:2311.09753 [pdf, other]

DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics

Authors: Aniket Roy, Maiterya Suin, Anshul Shah, Ketul Shah, Jiang Liu, Rama Chellappa

Abstract: Diffusion models have advanced generative AI significantly in terms of editing and creating naturalistic images. However, efficiently improving generated image quality is still of paramount interest. In this context, we propose a generic "naturalness" preserving loss function, viz., kurtosis concentration (KC) loss, which can be readily applied to any standard diffusion model pipeline to elevate t… ▽ More Diffusion models have advanced generative AI significantly in terms of editing and creating naturalistic images. However, efficiently improving generated image quality is still of paramount interest. In this context, we propose a generic "naturalness" preserving loss function, viz., kurtosis concentration (KC) loss, which can be readily applied to any standard diffusion model pipeline to elevate the image quality. Our motivation stems from the projected kurtosis concentration property of natural images, which states that natural images have nearly constant kurtosis values across different band-pass versions of the image. To retain the "naturalness" of the generated images, we enforce reducing the gap between the highest and lowest kurtosis values across the band-pass versions (e.g., Discrete Wavelet Transform (DWT)) of images. Note that our approach does not require any additional guidance like classifier or classifier-free guidance to improve the image quality. We validate the proposed approach for three diverse tasks, viz., (1) personalized few-shot finetuning using text guidance, (2) unconditional image generation, and (3) image super-resolution. Integrating the proposed KC loss has improved the perceptual quality across all these tasks in terms of both FID, MUSIQ score, and user evaluation. △ Less

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.06430 [pdf, other]

GOAT: GO to Any Thing

Authors: Matthew Chang, Theophile Gervet, Mukul Khanna, Sriram Yenamandra, Dhruv Shah, So Yeon Min, Kavit Shah, Chris Paxton, Saurabh Gupta, Dhruv Batra, Roozbeh Mottaghi, Jitendra Malik, Devendra Singh Chaplot

Abstract: In deployment scenarios such as homes and warehouses, mobile robots are expected to autonomously navigate for extended periods, seamlessly executing tasks articulated in terms that are intuitively understandable by human operators. We present GO To Any Thing (GOAT), a universal navigation system capable of tackling these requirements with three key features: a) Multimodal: it can tackle goals spec… ▽ More In deployment scenarios such as homes and warehouses, mobile robots are expected to autonomously navigate for extended periods, seamlessly executing tasks articulated in terms that are intuitively understandable by human operators. We present GO To Any Thing (GOAT), a universal navigation system capable of tackling these requirements with three key features: a) Multimodal: it can tackle goals specified via category labels, target images, and language descriptions, b) Lifelong: it benefits from its past experience in the same environment, and c) Platform Agnostic: it can be quickly deployed on robots with different embodiments. GOAT is made possible through a modular system design and a continually augmented instance-aware semantic memory that keeps track of the appearance of objects from different viewpoints in addition to category-level semantics. This enables GOAT to distinguish between different instances of the same category to enable navigation to targets specified by images and language descriptions. In experimental comparisons spanning over 90 hours in 9 different homes consisting of 675 goals selected across 200+ different object instances, we find GOAT achieves an overall success rate of 83%, surpassing previous methods and ablations by 32% (absolute improvement). GOAT improves with experience in the environment, from a 60% success rate at the first goal to a 90% success after exploration. In addition, we demonstrate that GOAT can readily be applied to downstream tasks such as pick and place and social navigation. △ Less

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2310.12143 [pdf, other]

Simple Mechanisms for Representing, Indexing and Manipulating Concepts

Authors: Yuanzhi Li, Raghu Meka, Rina Panigrahy, Kulin Shah

Abstract: Deep networks typically learn concepts via classifiers, which involves setting up a model and training it via gradient descent to fit the concept-labeled data. We will argue instead that learning a concept could be done by looking at its moment statistics matrix to generate a concrete representation or signature of that concept. These signatures can be used to discover structure across the set of… ▽ More Deep networks typically learn concepts via classifiers, which involves setting up a model and training it via gradient descent to fit the concept-labeled data. We will argue instead that learning a concept could be done by looking at its moment statistics matrix to generate a concrete representation or signature of that concept. These signatures can be used to discover structure across the set of concepts and could recursively produce higher-level concepts by learning this structure from those signatures. When the concepts are `intersected', signatures of the concepts can be used to find a common theme across a number of related `intersected' concepts. This process could be used to keep a dictionary of concepts so that inputs could correctly identify and be routed to the set of concepts involved in the (latent) generation of the input. △ Less

Submitted 18 October, 2023; originally announced October 2023.

Comments: 19 pages

arXiv:2310.07886 [pdf, other]

A Survey of Feature Types and Their Contributions for Camera Tampering Detection

Authors: Pranav Mantini, Shishir K. Shah

Abstract: Camera tamper detection is the ability to detect unauthorized and unintentional alterations in surveillance cameras by analyzing the video. Camera tampering can occur due to natural events or it can be caused intentionally to disrupt surveillance. We cast tampering detection as a change detection problem, and perform a review of the existing literature with emphasis on feature types. We formulate… ▽ More Camera tamper detection is the ability to detect unauthorized and unintentional alterations in surveillance cameras by analyzing the video. Camera tampering can occur due to natural events or it can be caused intentionally to disrupt surveillance. We cast tampering detection as a change detection problem, and perform a review of the existing literature with emphasis on feature types. We formulate tampering detection as a time series analysis problem, and design experiments to study the robustness and capability of various feature types. We compute ten features on real-world surveillance video and apply time series analysis to ascertain their predictability, and their capability to detect tampering. Finally, we quantify the performance of various time series models using each feature type to detect tampering. △ Less

Submitted 11 October, 2023; originally announced October 2023.

arXiv:2309.03844 [pdf, other]

Experimental Study of Adversarial Attacks on ML-based xApps in O-RAN

Authors: Naveen Naik Sapavath, Brian Kim, Kaushik Chowdhury, Vijay K Shah

Abstract: Open Radio Access Network (O-RAN) is considered as a major step in the evolution of next-generation cellular networks given its support for open interfaces and utilization of artificial intelligence (AI) into the deployment, operation, and maintenance of RAN. However, due to the openness of the O-RAN architecture, such AI models are inherently vulnerable to various adversarial machine learning (ML… ▽ More Open Radio Access Network (O-RAN) is considered as a major step in the evolution of next-generation cellular networks given its support for open interfaces and utilization of artificial intelligence (AI) into the deployment, operation, and maintenance of RAN. However, due to the openness of the O-RAN architecture, such AI models are inherently vulnerable to various adversarial machine learning (ML) attacks, i.e., adversarial attacks which correspond to slight manipulation of the input to the ML model. In this work, we showcase the vulnerability of an example ML model used in O-RAN, and experimentally deploy it in the near-real time (near-RT) RAN intelligent controller (RIC). Our ML-based interference classifier xApp (extensible application in near-RT RIC) tries to classify the type of interference to mitigate the interference effect on the O-RAN system. We demonstrate the first-ever scenario of how such an xApp can be impacted through an adversarial attack by manipulating the data stored in a shared database inside the near-RT RIC. Through a rigorous performance analysis deployed on a laboratory O-RAN testbed, we evaluate the performance in terms of capacity and the prediction accuracy of the interference classifier xApp using both clean and perturbed data. We show that even small adversarial attacks can significantly decrease the accuracy of ML application in near-RT RIC, which can directly impact the performance of the entire O-RAN deployment. △ Less

Submitted 7 September, 2023; originally announced September 2023.

Comments: Accepted for Globecom 2023

arXiv:2307.16321 [pdf, other]

Self-Supervised Learning of Gait-Based Biomarkers

Authors: R. James Cotton, J. D. Peiffer, Kunal Shah, Allison DeLillo, Anthony Cimorelli, Shawana Anarwala, Kayan Abdou, Tasos Karakostas

Abstract: Markerless motion capture (MMC) is revolutionizing gait analysis in clinical settings by making it more accessible, raising the question of how to extract the most clinically meaningful information from gait data. In multiple fields ranging from image processing to natural language processing, self-supervised learning (SSL) from large amounts of unannotated data produces very effective representat… ▽ More Markerless motion capture (MMC) is revolutionizing gait analysis in clinical settings by making it more accessible, raising the question of how to extract the most clinically meaningful information from gait data. In multiple fields ranging from image processing to natural language processing, self-supervised learning (SSL) from large amounts of unannotated data produces very effective representations for downstream tasks. However, there has only been limited use of SSL to learn effective representations of gait and movement, and it has not been applied to gait analysis with MMC. One SSL objective that has not been applied to gait is contrastive learning, which finds representations that place similar samples closer together in the learned space. If the learned similarity metric captures clinically meaningful differences, this could produce a useful representation for many downstream clinical tasks. Contrastive learning can also be combined with causal masking to predict future timesteps, which is an appealing SSL objective given the dynamical nature of gait. We applied these techniques to gait analyses performed with MMC in a rehabilitation hospital from a diverse clinical population. We find that contrastive learning on unannotated gait data learns a representation that captures clinically meaningful information. We probe this learned representation using the framework of biomarkers and show it holds promise as both a diagnostic and response biomarker, by showing it can accurately classify diagnosis from gait and is responsive to inpatient therapy, respectively. We ultimately hope these learned representations will enable predictive and prognostic gait-based biomarkers that can facilitate precision rehabilitation through greater use of MMC to quantify movement in rehabilitation. △ Less

Submitted 30 July, 2023; originally announced July 2023.

Comments: Accepted to Ambient Inteligence for Healthcare workshop at MICCAI 2023

Journal ref: Ambient Inteligence for Healthcare workshop at MICCAI 2023

arXiv:2307.12473 [pdf, other]

Adaptive RRI Selection Algorithms for Improved Cooperative Awareness in Decentralized NR-V2X

Authors: Avik Dayal, Vijay K. Shah, Harpreet S. Dhillon, Jeffrey H. Reed

Abstract: Decentralized vehicle-to-everything (V2X) networks (i.e., C-V2X Mode-4 and NR-V2X Mode-2) utilize sensing-based semi-persistent scheduling (SPS) where vehicles sense and reserve suitable radio resources for Basic Safety Message (BSM) transmissions at prespecified periodic intervals termed as Resource Reservation Interval (RRI). Vehicles rely on these received periodic BSMs to localize nearby (tran… ▽ More Decentralized vehicle-to-everything (V2X) networks (i.e., C-V2X Mode-4 and NR-V2X Mode-2) utilize sensing-based semi-persistent scheduling (SPS) where vehicles sense and reserve suitable radio resources for Basic Safety Message (BSM) transmissions at prespecified periodic intervals termed as Resource Reservation Interval (RRI). Vehicles rely on these received periodic BSMs to localize nearby (transmitting) vehicles and infrastructure, referred to as cooperative awareness. Cooperative awareness enables line of sight and non-line of sight localization, extending a vehicle's sensing and perception range. In this work, we first show that under high vehicle density scenarios, existing SPS (with prespecified RRIs) suffer from poor cooperative awareness, quantified as tracking error. Decentralized vehicle-to-everything (V2X) networks (i.e., C-V2X Mode-4 and NR-V2X Mode-2) utilize sensing-based semi-persistent scheduling (SPS) where vehicles sense and reserve suitable radio resources for Basic Safety Message (BSM) transmissions at prespecified periodic intervals termed as Resource Reservation Interval (RRI). Vehicles rely on these received periodic BSMs to localize nearby (transmitting) vehicles and infrastructure, referred to as cooperative awareness. Cooperative awareness enables line of sight and non-line of sight localization, extending a vehicle's sensing and perception range. In this work, we first show that under high vehicle density scenarios, existing SPS (with prespecified RRIs) suffer from poor cooperative awareness, quantified as tracking error. △ Less

Submitted 23 July, 2023; originally announced July 2023.

arXiv:2307.11317 [pdf, other]

XLDA: Linear Discriminant Analysis for Scaling Continual Learning to Extreme Classification at the Edge

Authors: Karan Shah, Vishruth Veerendranath, Anushka Hebbar, Raghavendra Bhat

Abstract: Streaming Linear Discriminant Analysis (LDA) while proven in Class-incremental Learning deployments at the edge with limited classes (upto 1000), has not been proven for deployment in extreme classification scenarios. In this paper, we present: (a) XLDA, a framework for Class-IL in edge deployment where LDA classifier is proven to be equivalent to FC layer including in extreme classification scena… ▽ More Streaming Linear Discriminant Analysis (LDA) while proven in Class-incremental Learning deployments at the edge with limited classes (upto 1000), has not been proven for deployment in extreme classification scenarios. In this paper, we present: (a) XLDA, a framework for Class-IL in edge deployment where LDA classifier is proven to be equivalent to FC layer including in extreme classification scenarios, and (b) optimizations to enable XLDA-based training and inference for edge deployment where there is a constraint on available compute resources. We show up to 42x speed up using a batched training approach and up to 5x inference speedup with nearest neighbor search on extreme datasets like AliProducts (50k classes) and Google Landmarks V2 (81k classes) △ Less

Submitted 20 July, 2023; originally announced July 2023.

Comments: Submitted at ICML 2023: PAC-Bayes Interactive Learning Workshop

arXiv:2307.01178 [pdf, ps, other]

Learning Mixtures of Gaussians Using the DDPM Objective

Authors: Kulin Shah, Sitan Chen, Adam Klivans

Abstract: Recent works have shown that diffusion models can learn essentially any distribution provided one can perform score estimation. Yet it remains poorly understood under what settings score estimation is possible, let alone when practical gradient-based algorithms for this task can provably succeed. In this work, we give the first provably efficient results along these lines for one of the most fun… ▽ More Recent works have shown that diffusion models can learn essentially any distribution provided one can perform score estimation. Yet it remains poorly understood under what settings score estimation is possible, let alone when practical gradient-based algorithms for this task can provably succeed. In this work, we give the first provably efficient results along these lines for one of the most fundamental distribution families, Gaussian mixture models. We prove that gradient descent on the denoising diffusion probabilistic model (DDPM) objective can efficiently recover the ground truth parameters of the mixture model in the following two settings: 1) We show gradient descent with random initialization learns mixtures of two spherical Gaussians in $d$ dimensions with $1/\text{poly}(d)$-separated centers. 2) We show gradient descent with a warm start learns mixtures of $K$ spherical Gaussians with $Ω(\sqrt{\log(\min(K,d))})$-separated centers. A key ingredient in our proofs is a new connection between score-based methods and two other approaches to distribution learning, the EM algorithm and spectral methods. △ Less

Submitted 3 July, 2023; originally announced July 2023.

Comments: 48 pages

arXiv:2306.15242 [pdf, other]

SPDER: Semiperiodic Dam**-Enabled Object Representation

Authors: Kathan Shah, Chawin Sitawarin

Abstract: We present a neural network architecture designed to naturally learn a positional embedding and overcome the spectral bias towards lower frequencies faced by conventional implicit neural representation networks. Our proposed architecture, SPDER, is a simple MLP that uses an activation function composed of a sinusoidal multiplied by a sublinear function, called the dam** function. The sinusoidal… ▽ More We present a neural network architecture designed to naturally learn a positional embedding and overcome the spectral bias towards lower frequencies faced by conventional implicit neural representation networks. Our proposed architecture, SPDER, is a simple MLP that uses an activation function composed of a sinusoidal multiplied by a sublinear function, called the dam** function. The sinusoidal enables the network to automatically learn the positional embedding of an input coordinate while the dam** passes on the actual coordinate value by preventing it from being projected down to within a finite range of values. Our results indicate that SPDERs speed up training by 10x and converge to losses 1,500-50,000x lower than that of the state-of-the-art for image representation. SPDER is also state-of-the-art in audio representation. The superior representation capability allows SPDER to also excel on multiple downstream tasks such as image super-resolution and video frame interpolation. We provide intuition as to why SPDER significantly improves fitting compared to that of other INR methods while requiring no hyperparameter tuning or preprocessing. △ Less

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.05376 [pdf, other]

Anomaly Detection in Satellite Videos using Diffusion Models

Authors: Akash Awasthi, Son Ly, Jaer Nizam, Samira Zare, Videet Mehta, Safwan Ahmed, Keshav Shah, Ramakrishna Nemani, Saurabh Prasad, Hien Van Nguyen

Abstract: The definition of anomaly detection is the identification of an unexpected event. Real-time detection of extreme events such as wildfires, cyclones, or floods using satellite data has become crucial for disaster management. Although several earth-observing satellites provide information about disasters, satellites in the geostationary orbit provide data at intervals as frequent as every minute, ef… ▽ More The definition of anomaly detection is the identification of an unexpected event. Real-time detection of extreme events such as wildfires, cyclones, or floods using satellite data has become crucial for disaster management. Although several earth-observing satellites provide information about disasters, satellites in the geostationary orbit provide data at intervals as frequent as every minute, effectively creating a video from space. There are many techniques that have been proposed to identify anomalies in surveillance videos; however, the available datasets do not have dynamic behavior, so we discuss an anomaly framework that can work on very high-frequency datasets to find very fast-moving anomalies. In this work, we present a diffusion model which does not need any motion component to capture the fast-moving anomalies and outperforms the other baseline methods. △ Less

Submitted 25 May, 2023; originally announced June 2023.

arXiv:2305.19256 [pdf, other]

Ambient Diffusion: Learning Clean Distributions from Corrupted Data

Authors: Giannis Daras, Kulin Shah, Yuval Dagan, Aravind Gollakota, Alexandros G. Dimakis, Adam Klivans

Abstract: We present the first diffusion-based framework that can learn an unknown distribution using only highly-corrupted samples. This problem arises in scientific applications where access to uncorrupted samples is impossible or expensive to acquire. Another benefit of our approach is the ability to train generative models that are less likely to memorize individual training samples since they never obs… ▽ More We present the first diffusion-based framework that can learn an unknown distribution using only highly-corrupted samples. This problem arises in scientific applications where access to uncorrupted samples is impossible or expensive to acquire. Another benefit of our approach is the ability to train generative models that are less likely to memorize individual training samples since they never observe clean training data. Our main idea is to introduce additional measurement distortion during the diffusion process and require the model to predict the original corrupted image from the further corrupted image. We prove that our method leads to models that learn the conditional expectation of the full uncorrupted image given this additional measurement corruption. This holds for any corruption process that satisfies some technical conditions (and in particular includes inpainting and compressed sensing). We train models on standard benchmarks (CelebA, CIFAR-10 and AFHQ) and show that we can learn the distribution even when all the training samples have $90\%$ of their pixels missing. We also show that we can finetune foundation models on small corrupted datasets (e.g. MRI scans with block corruptions) and learn the clean distribution without memorizing the training set. △ Less

Submitted 30 May, 2023; originally announced May 2023.

Comments: 24 pages, 11 figures

arXiv:2305.14410 [pdf, other]

Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach

Authors: Harman Singh, Poorva Garg, Mohit Gupta, Kevin Shah, Ashish Goswami, Satyam Modi, Arnab Kumar Mondal, Dinesh Khandelwal, Dinesh Garg, Parag Singla

Abstract: We are interested in image manipulation via natural language text -- a task that is useful for multiple AI applications but requires complex reasoning over multi-modal spaces. We extend recently proposed Neuro Symbolic Concept Learning (NSCL), which has been quite effective for the task of Visual Question Answering (VQA), for the task of image manipulation. Our system referred to as NeuroSIM can p… ▽ More We are interested in image manipulation via natural language text -- a task that is useful for multiple AI applications but requires complex reasoning over multi-modal spaces. We extend recently proposed Neuro Symbolic Concept Learning (NSCL), which has been quite effective for the task of Visual Question Answering (VQA), for the task of image manipulation. Our system referred to as NeuroSIM can perform complex multi-hop reasoning over multi-object scenes and only requires weak supervision in the form of annotated data for VQA. NeuroSIM parses an instruction into a symbolic program, based on a Domain Specific Language (DSL) comprising of object attributes and manipulation operations, that guides its execution. We create a new dataset for the task, and extensive experiments demonstrate that NeuroSIM is highly competitive with or beats SOTA baselines that make use of supervised data for manipulation. △ Less

Submitted 24 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: EMNLP 2023 (long paper, main conference)

arXiv:2304.00387 [pdf, other]

HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions

Authors: Anshul Shah, Aniket Roy, Ketul Shah, Shlok Kumar Mishra, David Jacobs, Anoop Cherian, Rama Chellappa

Abstract: Supervised learning of skeleton sequence encoders for action recognition has received significant attention in recent times. However, learning such encoders without labels continues to be a challenging problem. While prior works have shown promising results by applying contrastive learning to pose sequences, the quality of the learned representations is often observed to be closely tied to data au… ▽ More Supervised learning of skeleton sequence encoders for action recognition has received significant attention in recent times. However, learning such encoders without labels continues to be a challenging problem. While prior works have shown promising results by applying contrastive learning to pose sequences, the quality of the learned representations is often observed to be closely tied to data augmentations that are used to craft the positives. However, augmenting pose sequences is a difficult task as the geometric constraints among the skeleton joints need to be enforced to make the augmentations realistic for that action. In this work, we propose a new contrastive learning approach to train models for skeleton-based action recognition without labels. Our key contribution is a simple module, HaLP - to Hallucinate Latent Positives for contrastive learning. Specifically, HaLP explores the latent space of poses in suitable directions to generate new positives. To this end, we present a novel optimization formulation to solve for the synthetic positives with an explicit control on their hardness. We propose approximations to the objective, making them solvable in closed form with minimal overhead. We show via experiments that using these generated positives within a standard contrastive learning framework leads to consistent improvements across benchmarks such as NTU-60, NTU-120, and PKU-II on tasks like linear evaluation, transfer learning, and kNN evaluation. Our code will be made available at https://github.com/anshulbshah/HaLP. △ Less

Submitted 1 April, 2023; originally announced April 2023.

Comments: To be presented at CVPR 2023

arXiv:2303.10654 [pdf, other]

Markerless Motion Capture and Biomechanical Analysis Pipeline

Authors: R. James Cotton, Allison DeLillo, Anthony Cimorelli, Kunal Shah, J. D. Peiffer, Shawana Anarwala, Kayan Abdou, Tasos Karakostas

Abstract: Markerless motion capture using computer vision and human pose estimation (HPE) has the potential to expand access to precise movement analysis. This could greatly benefit rehabilitation by enabling more accurate tracking of outcomes and providing more sensitive tools for research. There are numerous steps between obtaining videos to extracting accurate biomechanical results and limited research t… ▽ More Markerless motion capture using computer vision and human pose estimation (HPE) has the potential to expand access to precise movement analysis. This could greatly benefit rehabilitation by enabling more accurate tracking of outcomes and providing more sensitive tools for research. There are numerous steps between obtaining videos to extracting accurate biomechanical results and limited research to guide many critical design decisions in these pipelines. In this work, we analyze several of these steps including the algorithm used to detect keypoints and the keypoint set, the approach to reconstructing trajectories for biomechanical inverse kinematics and optimizing the IK process. Several features we find important are: 1) using a recent algorithm trained on many datasets that produces a dense set of biomechanically-motivated keypoints, 2) using an implicit representation to reconstruct smooth, anatomically constrained marker trajectories for IK, 3) iteratively optimizing the biomechanical model to match the dense markers, 4) appropriate regularization of the IK process. Our pipeline makes it easy to obtain accurate biomechanical estimates of movement in a rehabilitation hospital. △ Less

Submitted 19 March, 2023; originally announced March 2023.

arXiv:2303.10280 [pdf, other]

Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances

Authors: Arun V. Reddy, Ketul Shah, William Paul, Rohita Mocharla, Judy Hoffman, Kapil D. Katyal, Dinesh Manocha, Celso M. de Melo, Rama Chellappa

Abstract: Human action recognition is a challenging problem, particularly when there is high variability in factors such as subject appearance, backgrounds and viewpoint. While deep neural networks (DNNs) have been shown to perform well on action recognition tasks, they typically require large amounts of high-quality labeled data to achieve robust performance across a variety of conditions. Synthetic data h… ▽ More Human action recognition is a challenging problem, particularly when there is high variability in factors such as subject appearance, backgrounds and viewpoint. While deep neural networks (DNNs) have been shown to perform well on action recognition tasks, they typically require large amounts of high-quality labeled data to achieve robust performance across a variety of conditions. Synthetic data has shown promise as a way to avoid the substantial costs and potential ethical concerns associated with collecting and labeling enormous amounts of data in the real-world. However, synthetic data may differ from real data in important ways. This phenomenon, known as \textit{domain shift}, can limit the utility of synthetic data in robotics applications. To mitigate the effects of domain shift, substantial effort is being dedicated to the development of domain adaptation (DA) techniques. Yet, much remains to be understood about how best to develop these techniques. In this paper, we introduce a new dataset called Robot Control Gestures (RoCoG-v2). The dataset is composed of both real and synthetic videos from seven gesture classes, and is intended to support the study of synthetic-to-real domain shift for video-based action recognition. Our work expands upon existing datasets by focusing the action classes on gestures for human-robot teaming, as well as by enabling investigation of domain shift in both ground and aerial views. We present baseline results using state-of-the-art action recognition and domain adaptation algorithms and offer initial insight on tackling the synthetic-to-real and ground-to-air domain shifts. △ Less

Submitted 17 March, 2023; originally announced March 2023.

Comments: ICRA 2023. The first two authors contributed equally. Dataset available at: https://github.com/reddyav1/RoCoG-v2

arXiv:2303.03326 [pdf, other]

doi 10.1109/INFOCOMWKSHPS57453.2023.10226045

Keep It Simple: CNN Model Complexity Studies for Interference Classification Tasks

Authors: Taiwo Oyedare, Vijay K. Shah, Daniel J. Jakubisin, Jeffrey H. Reed

Abstract: The growing number of devices using the wireless spectrum makes it important to find ways to minimize interference and optimize the use of the spectrum. Deep learning models, such as convolutional neural networks (CNNs), have been widely utilized to identify, classify, or mitigate interference due to their ability to learn from the data directly. However, there have been limited research on the co… ▽ More The growing number of devices using the wireless spectrum makes it important to find ways to minimize interference and optimize the use of the spectrum. Deep learning models, such as convolutional neural networks (CNNs), have been widely utilized to identify, classify, or mitigate interference due to their ability to learn from the data directly. However, there have been limited research on the complexity of such deep learning models. The major focus of deep learning-based wireless classification literature has been on improving classification accuracy, often at the expense of model complexity. This may not be practical for many wireless devices, such as, internet of things (IoT) devices, which usually have very limited computational resources and cannot handle very complex models. Thus, it becomes important to account for model complexity when designing deep learning-based models for interference classification. To address this, we conduct an analysis of CNN based wireless classification that explores the trade-off amongst dataset size, CNN model complexity, and classification accuracy under various levels of classification difficulty: namely, interference classification, heterogeneous transmitter classification, and homogeneous transmitter classification. Our study, based on three wireless datasets, shows that a simpler CNN model with fewer parameters can perform just as well as a more complex model, providing important insights into the use of CNNs in computationally constrained applications. △ Less

Submitted 6 March, 2023; originally announced March 2023.

Comments: 6 pages, 7 figures, 3 tables

arXiv:2303.02413 [pdf, other]

Improved Trajectory Reconstruction for Markerless Pose Estimation

Authors: R. James Cotton, Anthony Cimorelli, Kunal Shah, Shawana Anarwala, Scott Uhlrich, Tasos Karakostas

Abstract: Markerless pose estimation allows reconstructing human movement from multiple synchronized and calibrated views, and has the potential to make movement analysis easy and quick, including gait analysis. This could enable much more frequent and quantitative characterization of gait impairments, allowing better monitoring of outcomes and responses to interventions. However, the impact of different ke… ▽ More Markerless pose estimation allows reconstructing human movement from multiple synchronized and calibrated views, and has the potential to make movement analysis easy and quick, including gait analysis. This could enable much more frequent and quantitative characterization of gait impairments, allowing better monitoring of outcomes and responses to interventions. However, the impact of different keypoint detectors and reconstruction algorithms on markerless pose estimation accuracy has not been thoroughly evaluated. We tested these algorithmic choices on data acquired from a multicamera system from a heterogeneous sample of 25 individuals seen in a rehabilitation hospital. We found that using a top-down keypoint detector and reconstructing trajectories with an implicit function enabled accurate, smooth and anatomically plausible trajectories, with a noise in the step width estimates compared to a GaitRite walkway of only 8mm. △ Less

Submitted 8 March, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

arXiv:2212.05404 [pdf, other]

Cap2Aug: Caption guided Image to Image data Augmentation

Authors: Aniket Roy, Anshul Shah, Ketul Shah, Anirban Roy, Rama Chellappa

Abstract: Visual recognition in a low-data regime is challenging and often prone to overfitting. To mitigate this issue, several data augmentation strategies have been proposed. However, standard transformations, e.g., rotation, crop**, and flip** provide limited semantic variations. To this end, we propose Cap2Aug, an image-to-image diffusion model-based data augmentation strategy using image captions… ▽ More Visual recognition in a low-data regime is challenging and often prone to overfitting. To mitigate this issue, several data augmentation strategies have been proposed. However, standard transformations, e.g., rotation, crop**, and flip** provide limited semantic variations. To this end, we propose Cap2Aug, an image-to-image diffusion model-based data augmentation strategy using image captions as text prompts. We generate captions from the limited training images and using these captions edit the training images using an image-to-image stable diffusion model to generate semantically meaningful augmentations. This strategy generates augmented versions of images similar to the training images yet provides semantic diversity across the samples. We show that the variations within the class can be captured by the captions and then translated to generate diverse samples using the image-to-image diffusion model guided by the captions. However, naive learning on synthetic images is not adequate due to the domain gap between real and synthetic images. Thus, we employ a maximum mean discrepancy (MMD) loss to align the synthetic images to the real images for minimizing the domain gap. We evaluate our method on few-shot and long-tail classification tasks and obtain performance improvements over state-of-the-art, especially in the low-data regimes. △ Less

Submitted 6 November, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

arXiv:2212.04563 [pdf, other]

An SLR on Edge Computing Security and possible threat protection

Authors: Harsiddh Kalariya, Kavish Shah, Vini Patel

Abstract: Mobile and Internet of Things devices are generating enormous amounts of multi-modal data due to their exponential growth and accessibility. As a result, these data sources must be directly analyzed in real time at the network edge rather than relying on the cloud. Significant processing power at the network's edge has made it possible to gather data and make decisions prior to data being sent to… ▽ More Mobile and Internet of Things devices are generating enormous amounts of multi-modal data due to their exponential growth and accessibility. As a result, these data sources must be directly analyzed in real time at the network edge rather than relying on the cloud. Significant processing power at the network's edge has made it possible to gather data and make decisions prior to data being sent to the cloud. Moreover, security problems have significantly towered as a result of the rapid expansion of mobile devices, Internet of Things (IoT) devices, and various network points. It's much harder than ever to guarantee the privacy of sensitive data, including customer information. This systematic literature review depicts the fact that new technologies are a great weapon to fight with the attack and threats to the edge computing security. △ Less

Submitted 8 December, 2022; originally announced December 2022.

arXiv:2205.13178 [pdf, other]

Prototy** Next-Generation O-RAN Research Testbeds with SDRs

Authors: Pratheek S. Upadhyaya, Aly S. Abdalla, Vuk Marojevic, Jeffrey H. Reed, Vijay K. Shah

Abstract: Open RAN (O-RAN) defines an emerging cellular radio access network (RAN) architecture for future 6G wireless networks, emphasizing openness and intelligence which are considered the foundations of future 6G wireless networks. While the inherent complexity and flexibility of the RAN give rise to many new research problems, progress in develo** solutions is hampered due to the lack of end-to-end,… ▽ More Open RAN (O-RAN) defines an emerging cellular radio access network (RAN) architecture for future 6G wireless networks, emphasizing openness and intelligence which are considered the foundations of future 6G wireless networks. While the inherent complexity and flexibility of the RAN give rise to many new research problems, progress in develo** solutions is hampered due to the lack of end-to-end, fully developed platforms that can help in pursuing use cases in realistic environments. This has motivated the formation of open-source frameworks available to the wireless community. However, the rapid evolution of dedicated platforms and solutions utilizing various software-based technologies often leaves questions regarding the interoperability and interactions between the components in the framework. This article shows how to build a software-defined radio testbed featuring an open-source 5G system that can interact with the near-real-time (near-RT) RAN intelligent controller (RIC) of the O-RAN architecture through standard interfaces. We focus on the O-RAN E2 interface interactions and outline the procedure to enable a RAN system with E2 capabilities. We demonstrate the working of two xApps on the testbed with detailed E2 message exchange procedures and their role in controlling next-generation RANs. △ Less

Submitted 26 May, 2022; originally announced May 2022.

Comments: This manuscript has been submitted to IEEE Vehicular Technology Magazine for possible publication

arXiv:2203.11258 [pdf, other]

Efficient Classification of Long Documents Using Transformers

Authors: Hyunji Hayley Park, Yogarshi Vyas, Kashif Shah

Abstract: Several methods have been proposed for classifying long textual documents using Transformers. However, there is a lack of consensus on a benchmark to enable a fair comparison among different approaches. In this paper, we provide a comprehensive evaluation of the relative efficacy measured against various baselines and diverse datasets -- both in terms of accuracy as well as time and space overhead… ▽ More Several methods have been proposed for classifying long textual documents using Transformers. However, there is a lack of consensus on a benchmark to enable a fair comparison among different approaches. In this paper, we provide a comprehensive evaluation of the relative efficacy measured against various baselines and diverse datasets -- both in terms of accuracy as well as time and space overheads. Our datasets cover binary, multi-class, and multi-label classification tasks and represent various ways information is organized in a long text (e.g. information that is critical to making the classification decision is at the beginning or towards the end of the document). Our results show that more complex models often fail to outperform simple baselines and yield inconsistent performance across datasets. These findings emphasize the need for future studies to consider comprehensive baselines and datasets that better represent the task of long document classification to develop robust models. △ Less

Submitted 21 March, 2022; originally announced March 2022.

Comments: Accepted to ACL 2022; 8 pages

arXiv:2203.04227 [pdf, other]

A Practical AoI Scheduler in IoT Networks with Relays

Authors: Biplav Choudhury, Prasenjit Karmakar, Vijay K. Shah, Jeffrey H. Reed

Abstract: Internet of Things (IoT) networks have become ubiquitous as autonomous computing, communication and collaboration among devices become popular for accomplishing various tasks. The use of relays in IoT networks further makes it convenient to deploy IoT networks as relays provide a host of benefits, like increasing the communication range and minimizing power consumption. Existing literature on trad… ▽ More Internet of Things (IoT) networks have become ubiquitous as autonomous computing, communication and collaboration among devices become popular for accomplishing various tasks. The use of relays in IoT networks further makes it convenient to deploy IoT networks as relays provide a host of benefits, like increasing the communication range and minimizing power consumption. Existing literature on traditional AoI schedulers for such two-hop relayed IoT networks are limited because they are designed assuming constant/non-changing channel conditions and known (usually, generate-at-will) packet generation patterns. Deep reinforcement learning (DRL) algorithms have been investigated for AoI scheduling in two-hop IoT networks with relays, however, they are only applicable for small-scale IoT networks due to exponential rise in action space as the networks become large. These limitations discourage the practical utilization of AoI schedulers for IoT network deployments. This paper presents a practical AoI scheduler for two-hop IoT networks with relays that addresses the above limitations. The proposed scheduler utilizes a novel voting mechanism based proximal policy optimization (v-PPO) algorithm that maintains a linear action space, enabling it be scale well with larger IoT networks. The proposed v-PPO based AoI scheduler adapts well to changing network conditions and accounts for unknown traffic generation patterns, making it practical for real-world IoT deployments. Simulation results show that the proposed v-PPO based AoI scheduler outperforms both ML and traditional (non-ML) AoI schedulers, such as, Deep Q Network (DQN)-based AoI Scheduler, Maximal Age First-Maximal Age Difference (MAF-MAD), MAF (Maximal Age First) , and round-robin in all considered practical scenarios. △ Less

Submitted 25 April, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2107.05181

arXiv:2201.04180 [pdf, other]

doi 10.2514/6.2022-2379

Learning Robust Policies for Generalized Debris Capture with an Automated Tether-Net System

Authors: Chen Zeng, Grant Hecht, Prajit KrisshnaKumar, Raj K. Shah, Souma Chowdhury, Eleonora M. Botta

Abstract: Tether-net launched from a chaser spacecraft provides a promising method to capture and dispose of large space debris in orbit. This tether-net system is subject to several sources of uncertainty in sensing and actuation that affect the performance of its net launch and closing control. Earlier reliability-based optimization approaches to design control actions however remain challenging and compu… ▽ More Tether-net launched from a chaser spacecraft provides a promising method to capture and dispose of large space debris in orbit. This tether-net system is subject to several sources of uncertainty in sensing and actuation that affect the performance of its net launch and closing control. Earlier reliability-based optimization approaches to design control actions however remain challenging and computationally prohibitive to generalize over varying launch scenarios and target (debris) state relative to the chaser. To search for a general and reliable control policy, this paper presents a reinforcement learning framework that integrates a proximal policy optimization (PPO2) approach with net dynamics simulations. The latter allows evaluating the episodes of net-based target capture, and estimate the capture quality index that serves as the reward feedback to PPO2. Here, the learned policy is designed to model the timing of the net closing action based on the state of the moving net and the target, under any given launch scenario. A stochastic state transition model is considered in order to incorporate synthetic uncertainties in state estimation and launch actuation. Along with notable reward improvement during training, the trained policy demonstrates capture performance (over a wide range of launch/target scenarios) that is close to that obtained with reliability-based optimization run over an individual scenario. △ Less

Submitted 11 January, 2022; originally announced January 2022.

Comments: This paper has been presented at the 2022 AIAA SciTech Forum and Exposition, and accepted for publication in the corresponding AIAA proceedings

MSC Class: 68T05 (Primary) 70E60; 93E35 (Secondary) ACM Class: I.2.9; J.2

arXiv:2112.08988 [pdf, other]

doi 10.1109/ACCESS.2022.3185124

Interference Suppression Using Deep Learning: Current Approaches and Open Challenges

Authors: Taiwo Oyedare, Vijay K Shah, Daniel J Jakubisin, Jeff H Reed

Abstract: In light of the finite nature of the wireless spectrum and the increasing demand for spectrum use arising from recent technological breakthroughs in wireless communication, the problem of interference continues to persist. Despite recent advancements in resolving interference issues, interference still presents a difficult challenge to effective usage of the spectrum. This is partly due to the ris… ▽ More In light of the finite nature of the wireless spectrum and the increasing demand for spectrum use arising from recent technological breakthroughs in wireless communication, the problem of interference continues to persist. Despite recent advancements in resolving interference issues, interference still presents a difficult challenge to effective usage of the spectrum. This is partly due to the rise in the use of license-free and managed shared bands for Wi-Fi, long term evolution (LTE) unlicensed (LTE-U), LTE licensed assisted access (LAA), 5G NR, and other opportunistic spectrum access solutions. As a result of this, the need for efficient spectrum usage schemes that are robust against interference has never been more important. In the past, most solutions to interference have addressed the problem by using avoidance techniques as well as non-AI mitigation approaches (for example, adaptive filters). The key downside to non-AI techniques is the need for domain expertise in the extraction or exploitation of signal features such as cyclostationarity, bandwidth and modulation of the interfering signals. More recently, researchers have successfully explored AI/ML enabled physical (PHY) layer techniques, especially deep learning which reduces or compensates for the interfering signal instead of simply avoiding it. The underlying idea of ML based approaches is to learn the interference or the interference characteristics from the data, thereby sidelining the need for domain expertise in suppressing the interference. In this paper, we review a wide range of techniques that have used deep learning to suppress interference. We provide comparison and guidelines for many different types of deep learning techniques in interference suppression. In addition, we highlight challenges and potential future research directions for the successful adoption of deep learning in interference suppression. △ Less

Submitted 16 December, 2021; originally announced December 2021.

Comments: 26 pages, 10 figures, journal article

arXiv:2111.13754 [pdf, other]

Toward Next Generation Open Radio Access Network--What O-RAN Can and Cannot Do!

Authors: Aly S. Abdalla, Pratheek S. Upadhyaya, Vijay K. Shah, Vuk Marojevic

Abstract: The open radio access network (O-RAN) describes an industry-driven open architecture and interfaces for building next generation RANs with artificial intelligence (AI) controllers. We circulated a survey among researchers, developers, and practitioners to gather their perspectives on O-RAN as a framework for 6G wireless research and development (R&D). The majority responded in favor of O-RAN and i… ▽ More The open radio access network (O-RAN) describes an industry-driven open architecture and interfaces for building next generation RANs with artificial intelligence (AI) controllers. We circulated a survey among researchers, developers, and practitioners to gather their perspectives on O-RAN as a framework for 6G wireless research and development (R&D). The majority responded in favor of O-RAN and identified R&D of interest to them. Motivated by these responses, this paper identifies the limitations of the current O-RAN specifications and the technologies for overcoming them. We recognize end-to-end security, deterministic latency, physical layer real-time control, and testing of AI-based RAN control applications as the critical features to enable and discuss R&D opportunities for extending the architectural capabilities of O-RAN as a platform for 6G wireless. △ Less

Submitted 25 March, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

Comments: This article has been accepted for publication in the IEEE Network Magazine

arXiv:2111.07089 [pdf, other]

Evaluating Contrastive Learning on Wearable Timeseries for Downstream Clinical Outcomes

Authors: Kevalee Shah, Dimitris Spathis, Chi Ian Tang, Cecilia Mascolo

Abstract: Vast quantities of person-generated health data (wearables) are collected but the process of annotating to feed to machine learning models is impractical. This paper discusses ways in which self-supervised approaches that use contrastive losses, such as SimCLR and BYOL, previously applied to the vision domain, can be applied to high-dimensional health signals for downstream classification tasks of… ▽ More Vast quantities of person-generated health data (wearables) are collected but the process of annotating to feed to machine learning models is impractical. This paper discusses ways in which self-supervised approaches that use contrastive losses, such as SimCLR and BYOL, previously applied to the vision domain, can be applied to high-dimensional health signals for downstream classification tasks of various diseases spanning sleep, heart, and metabolic conditions. To this end, we adapt the data augmentation step and the overall architecture to suit the temporal nature of the data (wearable traces) and evaluate on 5 downstream tasks by comparing other state-of-the-art methods including supervised learning and an adversarial unsupervised representation learning method. We show that SimCLR outperforms the adversarial method and a fully-supervised method in the majority of the downstream evaluation tasks, and that all self-supervised methods outperform the fully-supervised methods. This work provides a comprehensive benchmark for contrastive methods applied to the wearable time-series domain, showing the promise of task-agnostic representations for downstream clinical outcomes. △ Less

Submitted 13 November, 2021; originally announced November 2021.

Comments: Machine Learning for Health (ML4H) - Extended Abstract

arXiv:2111.05457 [pdf, other]

Optimizing Number, Placement, and Backhaul Connectivity of Multi-UAV Networks

Authors: Javad Sabzehali, Vijay K. Shah, Qiang Fan, Biplav Choudhury, Lingjia Liu, Jeffrey H. Reed

Abstract: Multi-Unmanned Aerial Vehicle (UAV) Networks is a promising solution to providing wireless coverage to ground users in challenging rural areas (such as Internet of Things (IoT) devices in farmlands), where the traditional cellular networks are sparse or unavailable. A key challenge in such networks is the 3D placement of all UAV base stations such that the formed Multi-UAV Network (i) utilizes a m… ▽ More Multi-Unmanned Aerial Vehicle (UAV) Networks is a promising solution to providing wireless coverage to ground users in challenging rural areas (such as Internet of Things (IoT) devices in farmlands), where the traditional cellular networks are sparse or unavailable. A key challenge in such networks is the 3D placement of all UAV base stations such that the formed Multi-UAV Network (i) utilizes a minimum number of UAVs while ensuring -- (ii) backhaul connectivity directly (or via other UAVs) to the nearby terrestrial base station, and (iii) wireless coverage to all ground users in the area of operation. This joint Backhaul-and-coverage-aware Drone Deployment (BoaRD) problem is largely unaddressed in the literature, and, thus, is the focus of the paper. We first formulate the BoaRD problem as Integer Linear Programming (ILP). However, the problem is NP-hard, and therefore, we propose a low complexity algorithm with a provable performance guarantee to solve the problem efficiently. Our simulation study shows that the Proposed algorithm performs very close to that of the Optimal algorithm (solved using ILP solver) for smaller scenarios, where the area size and the number of users are relatively small. For larger scenarios, where the area size and the number of users are relatively large, the proposed algorithm greatly outperforms the baseline approaches -- backhaul-aware greedy and random algorithm, respectively by up to 17% and 95% in utilizing fewer UAVs while ensuring 100% ground user coverage and backhaul connectivity for all deployed UAVs across all considered simulation setting. △ Less

Submitted 16 June, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

Comments: To appear in IEEE Internet of Things Journal

arXiv:2110.11602 [pdf, other]

An O(1) algorithm for implementing the LFU cache eviction scheme

Authors: Dhruv Matani, Ketan Shah, Anirban Mitra

Abstract: Cache eviction algorithms are used widely in operating systems, databases and other systems that use caches to speed up execution by caching data that is used by the application. There are many policies such as MRU (Most Recently Used), MFU (Most Frequently Used), LRU (Least Recently Used) and LFU (Least Frequently Used) which each have their advantages and drawbacks and are hence used in specific… ▽ More Cache eviction algorithms are used widely in operating systems, databases and other systems that use caches to speed up execution by caching data that is used by the application. There are many policies such as MRU (Most Recently Used), MFU (Most Frequently Used), LRU (Least Recently Used) and LFU (Least Frequently Used) which each have their advantages and drawbacks and are hence used in specific scenarios. By far, the most widely used algorithm is LRU, both for its $O(1)$ speed of operation as well as its close resemblance to the kind of behaviour that is expected by most applications. The LFU algorithm also has behaviour desirable by many real world workloads. However, in many places, the LRU algorithm is is preferred over the LFU algorithm because of its lower run time complexity of $O(1)$ versus $O(\log n)$. We present here an LFU cache eviction algorithm that has a runtime complexity of $O(1)$ for all of its operations, which include insertion, access and deletion(eviction). △ Less

Submitted 22 October, 2021; originally announced October 2021.

arXiv:2110.09308 [pdf, other]

Power Systems Performance under 5G Radio Access Network in a Co-Simulation Environment

Authors: Rahul Iyer, Biplav Choudhury, Vijay K. Shah, Ali Mehrizi-Sani

Abstract: Communication can improve control of important system parameters by allowing different grid components to communicate their states with each other. This information exchange requires a reliable and fast communication infrastructure. 5G communication can be a viable means to achieve this objective. This paper investigates the performance of several smart grid applications under a 5G radio access ne… ▽ More Communication can improve control of important system parameters by allowing different grid components to communicate their states with each other. This information exchange requires a reliable and fast communication infrastructure. 5G communication can be a viable means to achieve this objective. This paper investigates the performance of several smart grid applications under a 5G radio access network. Different scenarios including set point changes and transients are evaluated, and the results indicate that the system maintains stability when a 5Gnetwork is used to communicate system states. △ Less

Submitted 16 August, 2021; originally announced October 2021.

arXiv:2107.10956 [pdf, other]

Reciprocal Multi-Robot Collision Avoidance with Asymmetric State Uncertainty

Authors: Kunal Shah, Guillermo Angeris, Mac Schwager

Abstract: We present a general decentralized formulation for a large class of collision avoidance methods and show that all collision avoidance methods of this form are guaranteed to be collision free. This class includes several existing algorithms in the literature as special cases. We then present a particular instance of this collision avoidance method, CARP (Collision Avoidance by Reciprocal Projection… ▽ More We present a general decentralized formulation for a large class of collision avoidance methods and show that all collision avoidance methods of this form are guaranteed to be collision free. This class includes several existing algorithms in the literature as special cases. We then present a particular instance of this collision avoidance method, CARP (Collision Avoidance by Reciprocal Projections), that is effective even when the estimates of other agents' positions and velocities are noisy. The method's main computational step involves the solution of a small convex optimization problem, which can be quickly solved in practice, even on embedded platforms, making it practical to use on computationally-constrained robots such as quadrotors. This method can be extended to find smooth polynomial trajectories for higher dynamic systems such at quadrotors. We demonstrate this algorithm's performance in simulations and on a team of physical quadrotors. Our method finds optimal projections in a median time of 17.12ms for 285 instances of 100 randomly generated obstacles, and produces safe polynomial trajectories at over 60hz on-board quadrotors. Our paper is accompanied by an open source Julia implementation and ROS package. △ Less

Submitted 22 July, 2021; originally announced July 2021.

Comments: arXiv admin note: text overlap with arXiv:1905.12875

arXiv:2107.05181 [pdf, other]

AoI-minimizing Scheduling in UAV-relayed IoT Networks

Authors: Biplav Choudhury, Vijay K. Shah, Aidin Ferdowsi, Jeffrey H. Reed, Y. Thomas Hou

Abstract: Due to flexibility, autonomy and low operational cost, unmanned aerial vehicles (UAVs), as fixed aerial base stations, are increasingly being used as \textit{relays} to collect time-sensitive information (i.e., status updates) from IoT devices and deliver it to the nearby terrestrial base station (TBS), where the information gets processed. In order to ensure timely delivery of information to the… ▽ More Due to flexibility, autonomy and low operational cost, unmanned aerial vehicles (UAVs), as fixed aerial base stations, are increasingly being used as \textit{relays} to collect time-sensitive information (i.e., status updates) from IoT devices and deliver it to the nearby terrestrial base station (TBS), where the information gets processed. In order to ensure timely delivery of information to the TBS (from all IoT devices), optimal scheduling of time-sensitive information over two hop UAV-relayed IoT networks (i.e., IoT device to the UAV [hop 1], and UAV to the TBS [hop 2]) becomes a critical challenge. To address this, we propose scheduling policies for Age of Information (AoI) minimization in such two-hop UAV-relayed IoT networks. To this end, we present a low-complexity MAF-MAD scheduler, that employs Maximum AoI First (MAF) policy for sampling of IoT devices at UAV (hop 1) and Maximum AoI Difference (MAD) policy for updating sampled packets from UAV to the TBS (hop 2). We show that MAF-MAD is the optimal scheduler under ideal conditions, i.e., error-free channels and generate-at-will traffic generation at IoT devices. On the contrary, for realistic conditions, we propose a Deep-Q-Networks (DQN) based scheduler. Our simulation results show that DQN-based scheduler outperforms MAF-MAD scheduler and three other baseline schedulers, i.e., Maximal AoI First (MAF), Round Robin (RR) and Random, employed at both hops under general conditions when the network is small (with 10's of IoT devices). However, it does not scale well with network size whereas MAF-MAD outperforms all other schedulers under all considered scenarios for larger networks. △ Less

Submitted 24 September, 2021; v1 submitted 11 July, 2021; originally announced July 2021.

arXiv:2107.04140 [pdf, other]

First-Generation Inference Accelerator Deployment at Facebook

Authors: Michael Anderson, Benny Chen, Stephen Chen, Summer Deng, Jordan Fix, Michael Gschwind, Aravind Kalaiah, Changkyu Kim, Jaewon Lee, Jason Liang, Haixin Liu, Yinghai Lu, Jack Montgomery, Arun Moorthy, Satish Nadathur, Sam Naghshineh, Avinash Nayak, Jongsoo Park, Chris Petersen, Martin Schatz, Narayanan Sundaram, Bangsheng Tang, Peter Tang, Amy Yang, Jiecao Yu , et al. (90 additional authors not shown)

Abstract: In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the in… ▽ More In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the inference accelerator platform ecosystem we developed and deployed at Facebook: both hardware, through Open Compute Platform (OCP), and software framework and tooling, through Pytorch/Caffe2/Glow. A characteristic of this ecosystem from the start is its openness to enable a variety of AI accelerators from different vendors. This platform, with six low-power accelerator cards alongside a single-socket host CPU, allows us to serve models of high complexity that cannot be easily or efficiently run on CPUs. We describe various performance optimizations, at both platform and accelerator level, which enables this platform to serve production traffic at Facebook. We also share deployment challenges, lessons learned during performance optimization, as well as provide guidance for future inference hardware co-design. △ Less

Submitted 4 August, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

arXiv:2107.03090 [pdf, other]

RISAN: Robust Instance Specific Abstention Network

Authors: Bhavya Kalra, Kulin Shah, Naresh Manwani

Abstract: In this paper, we propose deep architectures for learning instance specific abstain (reject option) binary classifiers. The proposed approach uses double sigmoid loss function as described by Kulin Shah and Naresh Manwani in ("Online Active Learning of Reject Option Classifiers", AAAI, 2020), as a performance measure. We show that the double sigmoid loss is classification calibrated. We also show… ▽ More In this paper, we propose deep architectures for learning instance specific abstain (reject option) binary classifiers. The proposed approach uses double sigmoid loss function as described by Kulin Shah and Naresh Manwani in ("Online Active Learning of Reject Option Classifiers", AAAI, 2020), as a performance measure. We show that the double sigmoid loss is classification calibrated. We also show that the excess risk of 0-d-1 loss is upper bounded by the excess risk of double sigmoid loss. We derive the generalization error bounds for the proposed architecture for reject option classifiers. To show the effectiveness of the proposed approach, we experiment with several real world datasets. We observe that the proposed approach not only performs comparable to the state-of-the-art approaches, it is also robust against label noise. We also provide visualizations to observe the important features learned by the network corresponding to the abstaining decision. △ Less

Submitted 7 July, 2021; originally announced July 2021.

arXiv:2106.14574 [pdf, other]

Quantifying Social Biases in NLP: A Generalization and Empirical Comparison of Extrinsic Fairness Metrics

Authors: Paula Czarnowska, Yogarshi Vyas, Kashif Shah

Abstract: Measuring bias is key for better understanding and addressing unfairness in NLP/ML models. This is often done via fairness metrics which quantify the differences in a model's behaviour across a range of demographic groups. In this work, we shed more light on the differences and similarities between the fairness metrics used in NLP. First, we unify a broad range of existing metrics under three gene… ▽ More Measuring bias is key for better understanding and addressing unfairness in NLP/ML models. This is often done via fairness metrics which quantify the differences in a model's behaviour across a range of demographic groups. In this work, we shed more light on the differences and similarities between the fairness metrics used in NLP. First, we unify a broad range of existing metrics under three generalized fairness metrics, revealing the connections between them. Next, we carry out an extensive empirical comparison of existing metrics and demonstrate that the observed differences in bias measurement can be systematically explained via differences in parameter choices for our generalized metrics. △ Less

Submitted 28 June, 2021; originally announced June 2021.

Comments: Accepted for publication in Transaction of the Association for Computational Linguistics (TACL), 2021. The arXiv version is a pre-MIT Press publication version

arXiv:2106.10535 [pdf, other]

Learning and Generalization in Overparameterized Normalizing Flows

Authors: Kulin Shah, Amit Deshpande, Navin Goyal

Abstract: In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with a sufficiently small learning rate and suitable initialization. In contrast, the benefit of overparameterization in unsupervised learning is not well understood. Normalizing flows (NFs) constitute an impo… ▽ More In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with a sufficiently small learning rate and suitable initialization. In contrast, the benefit of overparameterization in unsupervised learning is not well understood. Normalizing flows (NFs) constitute an important class of models in unsupervised learning for sampling and density estimation. In this paper, we theoretically and empirically analyze these models when the underlying neural network is a one-hidden-layer overparametrized network. Our main contributions are two-fold: (1) On the one hand, we provide theoretical and empirical evidence that for constrained NFs (this class of NFs underlies many NF constructions) with the one-hidden-layer network, overparametrization hurts training. (2) On the other hand, we prove that unconstrained NFs, a recently introduced model, can efficiently learn any reasonable data distribution under minimal assumptions when the underlying network is overparametrized and has one hidden-layer. △ Less

Submitted 23 March, 2022; v1 submitted 19 June, 2021; originally announced June 2021.

Comments: 75 pages, Accepted in AISTATS 2022

Showing 1–50 of 94 results for author: Shah, K