Search | arXiv e-print repository

arXiv:2406.19580 [pdf, other]

FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models

Authors: Saeed Rashidi, William Won, Sudarshan Srinivasan, Puneet Gupta, Tushar Krishna

Abstract: Distributed Deep Neural Network (DNN) training is a technique to reduce the training overhead by distributing the training tasks across multiple accelerators, according to a parallelization strategy. However, high-performance compute and interconnects are needed for maximum speed-up and linear scaling of the system. Wafer-scale systems are a promising technology that allows for tightly integrating high-end accelerators with high-speed wafer-scale interconnects, making them an attractive platform for distributed training. However, the wafer-scale interconnect should offer high performance and flexibility for various parallelization strategies to enable maximum optimizations for compute and memory usage. In this paper, we propose FRED, a wafer-scale interconnect that is tailored for the high-BW requirements of wafer-scale networks and can efficiently execute communication patterns of different parallelization strategies. Furthermore, FRED supports in-switch collective communication execution that reduces the network traffic by approximately 2X. Our results show that FRED can improve the average end-to-end training time of ResNet-152, Transformer-17B, GPT-3, and Transformer-1T by 1.76X, 1.87X, 1.34X, and 1.4X, respectively, when compared to a baseline wafer-scale 2D-Mesh fabric.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.03688 [pdf, other]

Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification

Authors: Benjamin Hou, Qingqing Zhu, Tejas Sudarshan Mathai, Qiao **, Zhiyong Lu, Ronald M. Summers

Abstract: In this paper, we introduce DRR-RATE, a large-scale synthetic chest X-ray dataset derived from the recently released CT-RATE dataset. DRR-RATE comprises 50,188 frontal Digitally Reconstructed Radiographs (DRRs) from 21,304 unique patients. Each image is paired with a corresponding radiology text report and binary labels for 18 pathology classes. Given the controllable nature of DRR generation, it facilitates the inclusion of lateral view images and images from any desired viewing position. This opens up avenues for research into novel multimodal applications involving paired CT, X-ray images from various views, text, and binary labels. We demonstrate the applicability of DRR-RATE alongside existing large-scale chest X-ray resources, notably the CheXpert dataset and CheXnet model. Experiments demonstrate that CheXnet, when trained and tested on the DRR-RATE dataset, achieves sufficient to high AUC scores for the six common pathologies cited in common literature: Atelectasis, Cardiomegaly, Consolidation, Lung Lesion, Lung Opacity, and Pleural Effusion. Additionally, CheXnet trained on the CheXpert dataset can accurately identify several pathologies, even when operating out of distribution. This confirms that the generated DRR images effectively capture the essential pathology features from CT images. The dataset and labels are publicly accessible at https://huggingface.co/datasets/farrell236/DRR-RATE.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.03088 [pdf, other]

HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator

Authors: Zhewen Yu, Sudarshan Sreeram, Krish Agrawal, Junyi Wu, Alexander Montgomerie-Corcoran, Cheng Zhang, Jianyi Cheng, Christos-Savvas Bouganis, Yiren Zhao

Abstract: Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, these models are usually deployed onto customized hardware accelerators. Among various accelerator designs, dataflow architecture has shown promising performance due to its layer-pipelined structure and its scalability in data parallelism. Exploiting weight and activation sparsity can further enhance memory storage and computation efficiency. However, existing approaches focus on exploiting sparsity in non-dataflow accelerators and cannot be applied to dataflow accelerators because of the large hardware design space introduced. As such, they could miss opportunities to find an optimal combination of sparsity features and hardware designs. In this paper, we propose a novel approach to exploit unstructured weight and activation sparsity for dataflow accelerators, using software and hardware co-optimization. We propose a Hardware-Aware Sparsity Search (HASS) to systematically determine an efficient sparsity solution for dataflow accelerators. Over a set of models, we achieve an efficiency improvement ranging from 1.3× to 4.2× compared to existing sparse designs, which are either non-dataflow or non-hardware-aware. In particular, the throughput of MobileNetV3 can be optimized to 4895 images per second. HASS is open-source: https://github.com/Yu-Zhewen/HASS

Submitted 5 June, 2024; originally announced June 2024.

Comments: accepted to FPL2024

arXiv:2406.01698 [pdf, other]

Demystifying Platform Requirements for Diverse LLM Inference Use Cases

Authors: Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna

Abstract: Large language models (LLMs) have shown remarkable performance across a wide range of applications, often outperforming human experts. However, deploying these parameter-heavy models efficiently for diverse inference use cases requires carefully designed hardware platforms with ample computing, memory, and network resources. With LLM deployment scenarios and models evolving at breakneck speed, the hardware requirements to meet SLOs remain an open research question. In this work, we present an analytical tool, GenZ, to study the relationship between LLM inference performance and various platform design parameters. Our analysis provides insights into configuring platforms for different LLM workloads and use cases. We quantify the platform requirements to support SOTA LLMs like LLaMA and GPT-4 under diverse serving settings. Furthermore, we project the hardware capabilities needed to enable future LLMs potentially exceeding hundreds of trillions of parameters. The trends and insights derived from GenZ can guide AI engineers deploying LLMs as well as computer architects designing next-generation hardware accelerators and platforms. Ultimately, this work sheds light on the platform design considerations for unlocking the full potential of large language models across a spectrum of applications. The source code is available at https://github.com/abhibambhaniya/GenZ-LLM-Analyzer.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 12 Pages, https://github.com/abhibambhaniya/GenZ-LLM-Analyzer

arXiv:2405.13048 [pdf]

Human-Generative AI Collaborative Problem Solving Who Leads and How Students Perceive the Interactions

Authors: Gaoxia Zhu, Vidya Sudarshan, Jason Fok Kow, Yew Soon Ong

Abstract: This research investigates distinct human-generative AI collaboration types and students' interaction experiences when collaborating with generative AI (i.e., ChatGPT) for problem-solving tasks and how these factors relate to students' sense of agency and perceived collaborative problem solving. By analyzing the surveys and reflections of 79 undergraduate students, we identified three human-generative AI collaboration types: even contribution, human leads, and AI leads. Notably, our study shows that 77.21% of students perceived that they led or contributed evenly to collaborative problem-solving when collaborating with ChatGPT. On the other hand, 15.19% of the human participants indicated that the collaborations were led by ChatGPT, indicating a potential tendency for students to rely on ChatGPT. Furthermore, 67.09% of students perceived their interaction experiences with ChatGPT to be positive or mixed. We also found a positive correlation between positive interaction experience and a sense of positive agency. The results of this study contribute to our understanding of the collaboration between students and generative AI and highlight the need to study further why some students let ChatGPT lead collaborative problem-solving and how to enhance their interaction experience through curriculum and technology design.

Submitted 18 May, 2024; originally announced May 2024.

Comments: This paper appears at the IEEE Conference on Artificial Intelligence (CAI) 2024

arXiv:2405.11344 [pdf]

Improved Content Understanding With Effective Use of Multi-task Contrastive Learning

Authors: Akanksha Bindal, Sudarshan Ramanujam, Dave Golland, TJ Hazen, Tina Jiang, Fengyu Zhang, Peng Yan

Abstract: In enhancing LinkedIn core content recommendation models, a significant challenge lies in improving their semantic understanding capabilities. This paper addresses the problem by leveraging multi-task learning, a method that has shown promise in various domains. We fine-tune a pre-trained, transformer-based LLM using multi-task contrastive learning with data from a diverse set of semantic labeling tasks. We observe positive transfer, leading to superior performance across all tasks when compared to training independently on each. Our model outperforms the baseline on zero-shot learning and offers improved multilingual support, highlighting its potential for broader application. The specialized content embeddings produced by our model outperform generalized embeddings offered by OpenAI on LinkedIn dataset and tasks. This work provides a robust foundation for vertical teams across LinkedIn to customize and fine-tune the LLM to their specific applications. Our work offers insights and best practices for the field to build on.

Submitted 21 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.11195 [pdf, other]

Trustworthy Actionable Perturbations

Authors: Jesse Friedbaum, Sudarshan Adiga, Ravi Tandon

Abstract: Counterfactuals, or modified inputs that lead to a different outcome, are an important tool for understanding the logic used by machine learning classifiers and how to change an undesirable classification. Even if a counterfactual changes a classifier's decision, however, it may not affect the true underlying class probabilities, i.e. the counterfactual may act like an adversarial attack and "fool" the classifier. We propose a new framework for creating modified inputs that change the true underlying probabilities in a beneficial way, which we call Trustworthy Actionable Perturbations (TAP). This includes a novel verification procedure to ensure that TAP change the true class probabilities instead of acting adversarially. Our framework also includes new cost, reward, and goal definitions that are better suited to effectuating change in the real world. We present PAC-learnability results for our verification procedure and theoretically analyze our new method for measuring reward. We also develop a methodology for creating TAP and compare our results to those achieved by previous counterfactual methods.

Submitted 18 May, 2024; originally announced May 2024.

Comments: Accepted at the 41st International Conference on Machine Learning (ICML) 2024

arXiv:2404.14586 [pdf, other]

Latency-Distortion Tradeoffs in Communicating Classification Results over Noisy Channels

Authors: Noel Teku, Sudarshan Adiga, Ravi Tandon

Abstract: In this work, the problem of communicating decisions of a classifier over a noisy channel is considered. With machine learning based models being used in a variety of time-sensitive applications, transmission of these decisions in a reliable and timely manner is of significant importance. To this end, we study the scenario where a probability vector (representing the decisions of a classifier) at the transmitter needs to be transmitted over a noisy channel. Assuming that the distortion between the original probability vector and the reconstructed one at the receiver is measured via f-divergence, we study the trade-off between transmission latency and the distortion. We completely analyze this trade-off using uniform, lattice, and sparse lattice-based quantization techniques to encode the probability vector by first characterizing bit budgets for each technique given a requirement on the allowed source distortion. These bounds are then combined with results from finite-blocklength literature to provide a framework for analyzing the effects of both quantization distortion and distortion due to decoding error probability (i.e., channel effects) on the incurred transmission latency. Our results show that there is an interesting interplay between source distortion (i.e., distortion for the probability vector measured via f-divergence) and the subsequent channel encoding/decoding parameters; and indicate that a joint design of these parameters is crucial to navigate the latency-distortion tradeoff. We study the impact of changing different parameters (e.g. number of classes, SNR, source distortion) on the latency-distortion tradeoff and perform experiments on AWGN and fading channels. Our results indicate that sparse lattice-based quantization is the most effective at minimizing latency across various regimes and for sparse, high-dimensional probability vectors (i.e., high number of classes).

Submitted 22 April, 2024; originally announced April 2024.

Comments: Submitted to IEEE Transactions on Communications

arXiv:2404.13125 [pdf, other]

Towards Robust Real-Time Hardware-based Mobile Malware Detection using Multiple Instance Learning Formulation

Authors: Harshit Kumar, Sudarshan Sharma, Biswadeep Chakraborty, Saibal Mukhopadhyay

Abstract: This study introduces RT-HMD, a Hardware-based Malware Detector (HMD) for mobile devices, that refines malware representation in segmented time-series through a Multiple Instance Learning (MIL) approach. We address the mislabeling issue in real-time HMDs, where benign segments in malware time-series incorrectly inherit malware labels, leading to increased false positives. Utilizing the proposed Malicious Discriminative Score within the MIL framework, RT-HMD effectively identifies localized malware behaviors, thereby improving the predictive accuracy. Empirical analysis, using a hardware telemetry dataset collected from a mobile platform across 723 benign and 1033 malware samples, shows a 5% precision boost while maintaining recall, outperforming baselines affected by mislabeled benign segments.

Submitted 19 April, 2024; originally announced April 2024.

Comments: Under peer review

arXiv:2404.03219 [pdf, other]

iSeg: Interactive 3D Segmentation via Interactive Attention

Authors: Itai Lang, Fei Xu, Dale Decatur, Sudarshan Babu, Rana Hanocka

Abstract: We present iSeg, a new interactive technique for segmenting 3D shapes. Previous works have focused mainly on leveraging pre-trained 2D foundation models for 3D segmentation based on text. However, text may be insufficient for accurately describing fine-grained spatial segmentations. Moreover, achieving a consistent 3D segmentation using a 2D model is challenging since occluded areas of the same semantic region may not be visible together from any 2D view. Thus, we design a segmentation method conditioned on fine user clicks, which operates entirely in 3D. Our system accepts user clicks directly on the shape's surface, indicating the inclusion or exclusion of regions from the desired shape partition. To accommodate various click settings, we propose a novel interactive attention module capable of processing different numbers and types of clicks, enabling the training of a single unified interactive segmentation model. We apply iSeg to a myriad of shapes from different domains, demonstrating its versatility and faithfulness to the user's specifications. Our project page is at https://threedle.github.io/iSeg/.

Submitted 4 April, 2024; originally announced April 2024.

Comments: Project page: https://threedle.github.io/iSeg/

arXiv:2403.12297 [pdf, other]

Leveraging Large Language Models to Extract Information on Substance Use Disorder Severity from Clinical Notes: A Zero-shot Learning Approach

Authors: Maria Mahbub, Gregory M. Dams, Sudarshan Srinivasan, Caitlin Rizy, Ioana Danciu, Jodie Trafton, Kathryn Knight

Abstract: Substance use disorder (SUD) poses a major concern due to its detrimental effects on health and society. SUD identification and treatment depend on a variety of factors such as severity, co-determinants (e.g., withdrawal symptoms), and social determinants of health. Existing diagnostic coding systems used by American insurance providers, like the International Classification of Diseases (ICD-10), lack granularity for certain diagnoses, but clinicians will add this granularity (such as that found within the Diagnostic and Statistical Manual of Mental Disorders classification or DSM-5) as supplemental unstructured text in clinical notes. Traditional natural language processing (NLP) methods face limitations in accurately parsing such diverse clinical language. Large Language Models (LLMs) offer promise in overcoming these challenges by adapting to diverse language patterns. This study investigates the application of LLMs for extracting severity-related information for various SUD diagnoses from clinical notes. We propose a workflow employing zero-shot learning of LLMs with carefully crafted prompts and post-processing techniques. Through experimentation with Flan-T5, an open-source LLM, we demonstrate its superior recall compared to the rule-based approach. Focusing on 11 categories of SUD diagnoses, we show the effectiveness of LLMs in extracting severity information, contributing to improved risk assessment and treatment planning for SUD patients.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 10 pages, 4 figures, 2 tables

arXiv:2402.04035 [pdf, ps, other]

Low-Distortion Clustering with Ordinal and Limited Cardinal Information

Authors: Jakob Burkhardt, Ioannis Caragiannis, Karl Fehrs, Matteo Russo, Chris Schwiegelshohn, Sudarshan Shyam

Abstract: Motivated by recent work in computational social choice, we extend the metric distortion framework to clustering problems. Given a set of $n$ agents located in an underlying metric space, our goal is to partition them into $k$ clusters, optimizing some social cost objective. The metric space is defined by a distance function $d$ between the agent locations. Information about $d$ is available only implicitly via $n$ rankings, through which each agent ranks all other agents in terms of their distance from her. Still, we would like to evaluate clustering algorithms in terms of social cost objectives that are defined using $d$. This is done using the notion of distortion, which measures how far from optimality a clustering can be, taking into account all underlying metrics that are consistent with the ordinal information available. Unfortunately, the most important clustering objectives do not admit algorithms with finite distortion. To sidestep this disappointing fact, we follow two alternative approaches: We first explore whether resource augmentation can be beneficial. We consider algorithms that use more than $k$ clusters but compare their social cost to that of the optimal $k$-clusterings. We show that using exponentially (in terms of $k$) many clusters, we can get low (constant or logarithmic) distortion for the $k$-center and $k$-median objectives. Interestingly, such an exponential blowup is shown to be necessary. More importantly, we explore whether limited cardinal information can be used to obtain better results. Somewhat surprisingly, for $k$-median and $k$-center, we show that a number of queries that is polynomial in $k$ and only logarithmic in $n$ (i.e., only sublinear in the number of agents for the most relevant scenarios in practice) is enough to get constant distortion.

Submitted 6 February, 2024; originally announced February 2024.

Comments: to appear in AAAI 2024

arXiv:2402.00637 [pdf, other]

Fisheye Camera and Ultrasonic Sensor Fusion For Near-Field Obstacle Perception in Bird's-Eye-View

Authors: Arindam Das, Sudarshan Paul, Niko Scholz, Akhilesh Kumar Malviya, Ganesh Sistu, Ujjwal Bhattacharya, Ciarán Eising

Abstract: Accurate obstacle identification represents a fundamental challenge within the scope of near-field perception for autonomous driving. Conventionally, fisheye cameras are frequently employed for comprehensive surround-view perception, including rear-view obstacle localization. However, the performance of such cameras can significantly deteriorate in low-light conditions, during nighttime, or when subjected to intense sun glare. Conversely, cost-effective sensors like ultrasonic sensors remain largely unaffected under these conditions. Therefore, we present, to our knowledge, the first end-to-end multimodal fusion model tailored for efficient obstacle perception in a bird's-eye-view (BEV) perspective, utilizing fisheye cameras and ultrasonic sensors. Initially, ResNeXt-50 is employed as a set of unimodal encoders to extract features specific to each modality. Subsequently, the feature space associated with the visible spectrum undergoes transformation into BEV. The fusion of these two modalities is facilitated via concatenation. At the same time, the ultrasonic spectrum-based unimodal feature maps pass through content-aware dilated convolution, applied to mitigate the sensor misalignment between two sensors in the fused feature space. Finally, the fused features are utilized by a two-stage semantic occupancy decoder to generate grid-wise predictions for precise obstacle perception. We conduct a systematic investigation to determine the optimal strategy for multimodal fusion of both sensors. We provide insights into our dataset creation procedures, annotation guidelines, and perform a thorough data analysis to ensure adequate coverage of all scenarios. When applied to our dataset, the experimental results underscore the robustness and effectiveness of our proposed multimodal fusion approach.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 16 pages, 12 Figures, 6 tables

arXiv:2401.10733 [pdf, other]

Dynamic Q&A of Clinical Documents with Large Language Models

Authors: Ran Elgedawy, Ioana Danciu, Maria Mahbub, Sudarshan Srinivasan

Abstract: Electronic health records (EHRs) house crucial patient data in clinical notes. As these notes grow in volume and complexity, manual extraction becomes challenging. This work introduces a natural language interface using large language models (LLMs) for dynamic question-answering on clinical notes. Our chatbot, powered by Langchain and transformer-based LLMs, allows users to query in natural language, receiving relevant answers from clinical notes. Experiments, utilizing various embedding models and advanced LLMs, show Wizard Vicuna's superior accuracy, albeit with high compute demands. Model optimization, including weight quantization, improves latency by approximately 48 times. Promising results indicate potential, yet challenges such as model hallucinations and limited diverse medical case evaluations remain. Addressing these gaps is crucial for unlocking the value in clinical notes and advancing AI-driven clinical decision-making.

Submitted 2 July, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

Comments: 15 pages, 4 figures

arXiv:2401.10580 [pdf, other]

PHOENIX: Open-Source Language Adaption for Direct Preference Optimization

Authors: Matthias Uhlig, Sigurd Schacht, Sudarshan Kamath Barkur

Abstract: Large language models have gained immense importance in recent years and have demonstrated outstanding results in solving various tasks. However, despite these achievements, many questions remain unanswered in the context of large language models. Besides the optimal use of the models for inference and the alignment of the results to the desired specifications, the transfer of models to other languages is still an underdeveloped area of research. The recent publication of models such as Llama-2 and Zephyr has provided new insights into architectural improvements and the use of human feedback. However, insights into adapting these techniques to other languages remain scarce. In this paper, we build on the latest improvements and apply the Direct Preference Optimization (DPO) approach to the German language. The model is available at https://huggingface.co/DRXD1000/Phoenix.

Submitted 19 January, 2024; originally announced January 2024.

arXiv:2401.08840 [pdf, other]

doi 10.1111/cgf.14955

Efficient Neural Representation of Volumetric Data using Coordinate-Based Networks

Authors: Sudarshan Devkota, Sumanta Pattanaik

Abstract: In this paper, we propose an efficient approach for the compression and representation of volumetric data utilizing coordinate-based networks and multi-resolution hash encoding. Efficient compression of volumetric data is crucial for various applications, such as medical imaging and scientific simulations. Our approach enables effective compression by learning a mapping between spatial coordinates and intensity values. We compare different encoding schemes and demonstrate the superiority of multi-resolution hash encoding in terms of compression quality and training efficiency. Furthermore, we leverage optimization-based meta-learning, specifically using the Reptile algorithm, to learn weight initialization for neural representations tailored to volumetric data, enabling faster convergence during optimization. Additionally, we compare our approach with state-of-the-art methods to showcase improved image quality and compression ratios. These findings highlight the potential of coordinate-based networks and multi-resolution hash encoding for an efficient and accurate representation of volumetric data, paving the way for advancements in large-scale data visualization and other applications.

Submitted 16 January, 2024; originally announced January 2024.

Journal ref: Computer Graphics Forum (2023), 42: e14955

arXiv:2312.04874 [pdf, other]

Interpretable Underwater Diver Gesture Recognition

Authors: Sudeep Mangalvedhekar, Shreyas Nahar, Sudarshan Maskare, Kaushal Mahajan, Dr. Anant Bagade

Abstract: In recent years, usage and applications of Autonomous Underwater Vehicles (AUVs) have grown rapidly. Interaction of divers with the AUVs remains an integral part of the usage of AUVs for various applications and makes building robust and efficient underwater gesture recognition systems extremely important. In this paper, we propose an Underwater Gesture Recognition system trained on the Cognitive Autonomous Diving Buddy Underwater gesture dataset using deep learning that achieves 98.01% accuracy on the dataset, which to the best of our knowledge is the best performance achieved on this dataset at the time of writing this paper. We also improve the Gesture Recognition System Interpretability by using XAI techniques to visualize the model's predictions.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.04071 [pdf, other]

Synergistic Signals: Exploiting Co-Engagement and Semantic Links via Graph Neural Networks

Authors: Zijie Huang, Baolin Li, Hafez Asgharzadeh, Anne Cocos, Lingyi Liu, Evan Cox, Colby Wise, Sudarshan Lamkhede

Abstract: Given a set of candidate entities (e.g. movie titles), the ability to identify similar entities is a core capability of many recommender systems. Most often this is achieved by collaborative filtering approaches, i.e. if users co-engage with a pair of entities frequently enough, the embeddings should be similar. However, relying on co-engagement data alone can result in lower-quality embeddings for new and unpopular entities. We study this problem in the context of recommender systems at Netflix. We observe that there is abundant semantic information such as genre, content maturity level, themes, etc. that complements co-engagement signals and provides interpretability in similarity models. To learn entity similarities from both data sources holistically, we propose a novel graph-based approach called SemanticGNN. SemanticGNN models entities, semantic concepts, collaborative edges, and semantic edges within a large-scale knowledge graph and conducts representation learning over it. Our key technical contributions are twofold: (1) we develop a novel relation-aware attention graph neural network (GNN) to handle the imbalanced distribution of relation types in our graph; (2) to handle web-scale graph data that has millions of nodes and billions of edges, we develop a novel distributed graph training paradigm. The proposed model is successfully deployed within Netflix and empirical experiments indicate it yields up to 35% improvement in performance on similarity judgment tasks.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.12396 [pdf, other]

GreenFPGA: Evaluating FPGAs as Environmentally Sustainable Computing Solutions

Authors: Chetan Choppali Sudarshan, Aman Arora, Vidya A. Chhabria

Abstract: Growing global concerns about climate change highlight the need for environmentally sustainable computing. The ecological impact of computing, including operational and embodied, is a key consideration. Field Programmable Gate Arrays (FPGAs) stand out as promising sustainable computing platforms due to their reconfigurability across various applications. This paper introduces GreenFPGA, a tool estimating the total carbon footprint (CFP) of FPGAs over their lifespan, considering design, manufacturing, reconfigurability (reuse), operation, disposal, and recycling. Using GreenFPGA, the paper evaluates scenarios where the ecological benefits of FPGA reconfigurability outweigh operational and embodied carbon costs, positioning FPGAs as an environmentally sustainable choice for hardware acceleration compared to Application-Specific Integrated Circuits (ASICs). Experimental results show that FPGAs have lower CFP than ASICs, particularly for multiple distinct, low-volume applications, or short application lifespans.

Submitted 21 November, 2023; originally announced November 2023.

Comments: Under review at DAC 2024

arXiv:2310.17075 [pdf, other]

HyperFields: Towards Zero-Shot Generation of NeRFs from Text

Authors: Sudarshan Babu, Richard Liu, Avery Zhou, Michael Maire, Greg Shakhnarovich, Rana Hanocka

Abstract: We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and (optionally) some fine-tuning. Key to our approach are: (i) a dynamic hypernetwork, which learns a smooth mapping from text token embeddings to the space of NeRFs; (ii) NeRF distillation training, which distills scenes encoded in individual NeRFs into one dynamic hypernetwork. These techniques enable a single network to fit over a hundred unique scenes. We further demonstrate that HyperFields learns a more general map between text and NeRFs, and consequently is capable of predicting novel in-distribution and out-of-distribution scenes -- either zero-shot or with a few finetuning steps. Finetuning HyperFields benefits from accelerated convergence thanks to the learned general map, and is capable of synthesizing novel scenes 5 to 10 times faster than existing neural optimization-based methods. Our ablation experiments show that both the dynamic architecture and NeRF distillation are critical to the expressivity of HyperFields.

Submitted 13 June, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted to ICML 2024, Project page: https://threedle.github.io/hyperfields/

arXiv:2310.09680 [pdf, other]

Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring

Authors: Ankitha Sudarshan, Vinay Samuel, Parth Patwa, Ibtihel Amara, Aman Chadha

Abstract: Automatic Speech Recognition (ASR) has witnessed a profound research interest. Recent breakthroughs have given ASR systems different prospects such as faithfully transcribing spoken language, which is a pivotal advancement in building conversational agents. However, there is still an imminent challenge of accurately discerning context-dependent words and phrases. In this work, we propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing leveraging the power of deep learning models in accurately delivering spot-on transcriptions across a wide variety of vocabularies and speaking styles. Our solution consists of using Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) along with Deep Neural Networks (DNN) models integrating both language and acoustic modeling for better accuracy. We infused our network with the use of a transformer-based model to properly rescore the word lattice achieving remarkable capabilities with a palpable reduction in Word Error Rate (WER). We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.

Submitted 3 March, 2024; v1 submitted 14 October, 2023; originally announced October 2023.

arXiv:2309.05090 [pdf, other]

Sculpting Efficiency: Pruning Medical Imaging Models for On-Device Inference

Authors: Sudarshan Sreeram, Bernhard Kainz

Abstract: Leveraging ML advancements to augment healthcare systems can improve patient outcomes. Yet, uninformed engineering decisions in early-stage research inadvertently hinder the feasibility of such solutions for high-throughput, on-device inference, particularly in settings involving legacy hardware and multi-modal gigapixel images. Through a preliminary case study concerning segmentation in cardiology, we highlight the excess operational complexity in a suboptimally configured ML model from prior work and demonstrate that it can be sculpted away using pruning to meet deployment criteria. Our results show a compression rate of 1148x with minimal loss in quality (~4%) and, at higher rates, achieve faster inference on a CPU than the GPU baseline, stressing the need to consider task complexity and architectural details when using off-the-shelf models. With this, we consider avenues for future research in streamlining workflows for clinical researchers to develop models quicker and better suited for real-world use.

Submitted 1 November, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

Comments: Accepted at MedNeurIPS 2023

arXiv:2307.02367 [pdf, other]

Distance Preserving Machine Learning for Uncertainty Aware Accelerator Capacitance Predictions

Authors: Steven Goldenberg, Malachi Schram, Kishansingh Rajput, Thomas Britton, Chris Pappas, Dan Lu, Jared Walden, Majdi I. Radaideh, Sarah Cousineau, Sudarshan Harave

Abstract: Providing accurate uncertainty estimations is essential for producing reliable machine learning models, especially in safety-critical applications such as accelerator systems. Gaussian process models are generally regarded as the gold standard method for this task, but they can struggle with large, high-dimensional datasets. Combining deep neural networks with Gaussian process approximation techniques has shown promising results, but dimensionality reduction through standard deep neural network layers is not guaranteed to maintain the distance information necessary for Gaussian process models. We build on previous work by comparing the use of the singular value decomposition against a spectral-normalized dense layer as a feature extractor for a deep neural Gaussian process approximation model and apply it to a capacitance prediction problem for the High Voltage Converter Modulators in the Oak Ridge Spallation Neutron Source. Our model shows improved distance preservation and predicts in-distribution capacitance values with less than 1% error.

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.13203 [pdf, other]

Neural Network Pruning for Real-time Polyp Segmentation

Authors: Suman Sapkota, Pranav Poudel, Sudarshan Regmi, Bibek Panthi, Binod Bhattarai

Abstract: Computer-assisted treatment has emerged as a viable application of medical imaging, owing to the efficacy of deep learning models. Real-time inference speed remains a key requirement for such applications to help medical personnel. Even though there generally exists a trade-off between performance and model size, impressive efforts have been made to retain near-original performance by compromising model size. Neural network pruning has emerged as an exciting area that aims to eliminate redundant parameters to make the inference faster. In this study, we show an application of neural network pruning in polyp segmentation. We compute the importance score of convolutional filters and remove the filters having the least scores, which to some value of pruning does not degrade the performance. For computing the importance score, we use the Taylor First Order (TaylorFO) approximation of the change in network output for the removal of certain filters. Specifically, we employ a gradient-normalized backpropagation for the computation of the importance score. Through experiments in the polyp datasets, we validate that our approach can significantly reduce the parameter count and FLOPs while retaining similar performance.

Submitted 22 June, 2023; originally announced June 2023.

arXiv:2306.09434 [pdf, other]

ECO-CHIP: Estimation of Carbon Footprint of Chiplet-based Architectures for Sustainable VLSI

Authors: Chetan Choppali Sudarshan, Nikhil Matkar, Sarma Vrudhula, Sachin S. Sapatnekar, Vidya A. Chhabria

Abstract: Decades of progress in energy-efficient and low-power design have successfully reduced the operational carbon footprint in the semiconductor industry. However, this has led to an increase in embodied emissions, encompassing carbon emissions arising from design, manufacturing, packaging, and other infrastructural activities. While existing research has developed tools to analyze embodied carbon at the computer architecture level for traditional monolithic systems, these tools do not apply to near-mainstream heterogeneous integration (HI) technologies. HI systems offer significant potential for sustainable computing by minimizing carbon emissions through two key strategies: "reducing" computation by reusing pre-designed chiplet IP blocks and adopting hierarchical approaches to system design. The reuse of chiplets across multiple designs, even spanning multiple generations of integrated circuits (ICs), can substantially reduce embodied carbon emissions throughout the operational lifespan. This paper introduces a carbon analysis tool specifically designed to assess the potential of HI systems in facilitating greener VLSI system design and manufacturing approaches. The tool takes into account scaling, chiplet and packaging yields, design complexity, and even carbon overheads associated with advanced packaging techniques employed in heterogeneous systems. Experimental results demonstrate that HI can achieve a reduction of embodied carbon emissions up to 70% compared to traditional large monolithic systems. These findings suggest that HI can pave the way for sustainable computing practices, contributing to a more environmentally conscious semiconductor industry.

Submitted 14 February, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Accepted at International Symposium on High-Performance Computer Architecture (HPCA)

arXiv:2306.06678 [pdf, other]

Scheduling of Intermittent Query Processing

Authors: Saranya C, Sudarshan S

Abstract: Stream processing is usually done either on a tuple-by-tuple basis or in micro-batches. There are many applications where tuples over a predefined duration/window must be processed within certain deadlines. Processing such queries using stream processing engines can be very inefficient since there is often a significant overhead per tuple or micro-batch. The cost of computation can be significantly reduced by using the wider window available for computation. In this work, we present scheduling schemes where the overhead cost is minimized while meeting the query deadline constraints. For such queries, since the result is needed only at the deadline, tuples can be processed in larger batches, instead of using micro-batches. We present scheduling schemes for single and multi query scenarios. The proposed scheduling algorithms have been implemented as a Custom Query Scheduler, on top of Apache Spark. Our performance study with TPC-H data, under single and multi query modes, shows orders of magnitude improvement as compared to naively using Spark streaming.

Submitted 21 April, 2024; v1 submitted 11 June, 2023; originally announced June 2023.

arXiv:2305.17797 [pdf, other]

T2FNorm: Extremely Simple Scaled Train-time Feature Normalization for OOD Detection

Authors: Sudarshan Regmi, Bibek Panthi, Sakar Dotel, Prashnna K. Gyawali, Danail Stoyanov, Binod Bhattarai

Abstract: Neural networks are notorious for being overconfident predictors, posing a significant challenge to their safe deployment in real-world applications. While feature normalization has garnered considerable attention within the deep learning literature, current train-time regularization methods for Out-of-Distribution (OOD) detection are yet to fully exploit this potential. Indeed, the naive incorporation of feature normalization within neural networks does not guarantee substantial improvement in OOD detection performance. In this work, we introduce T2FNorm, a novel approach to transforming features to hyperspherical space during training, while employing non-transformed space for OOD-scoring purposes. This method yields a surprising enhancement in OOD detection capabilities without compromising model accuracy in in-distribution (ID). Our investigation demonstrates that the proposed technique substantially diminishes the norm of the features of all samples, more so in the case of out-of-distribution samples, thereby addressing the prevalent concern of overconfidence in neural networks. The proposed method also significantly improves various post-hoc OOD detection methods.

Submitted 8 June, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

arXiv:2305.10540 [pdf, other]

doi 10.1109/TIT.2024.3361388

Generalization Bounds for Neural Belief Propagation Decoders

Authors: Sudarshan Adiga, Xin Xiao, Ravi Tandon, Bane Vasic, Tamal Bose

Abstract: Machine learning based approaches are being increasingly used for designing decoders for next generation communication systems. One widely used framework is neural belief propagation (NBP), which unfolds the belief propagation (BP) iterations into a deep neural network and the parameters are trained in a data-driven manner. NBP decoders have been shown to improve upon classical decoding algorithms. In this paper, we investigate the generalization capabilities of NBP decoders. Specifically, the generalization gap of a decoder is the difference between empirical and expected bit-error-rate(s). We present new theoretical results which bound this gap and show the dependence on the decoder complexity, in terms of code parameters (blocklength, message length, variable/check node degrees), decoding iterations, and the training dataset size. Results are presented for both regular and irregular parity-check matrices. To the best of our knowledge, this is the first set of theoretical results on generalization performance of neural network based decoders. We present experimental results to show the dependence of generalization gap on the training dataset size, and decoding iterations for different codes.

Submitted 20 April, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: Published in IEEE Transactions on Information Theory (2024)

arXiv:2305.08777 [pdf, other]

Question-Answering System Extracts Information on Injection Drug Use from Clinical Notes

Authors: Maria Mahbub, Ian Goethert, Ioana Danciu, Kathryn Knight, Sudarshan Srinivasan, Suzanne Tamang, Karine Rozenberg-Ben-Dror, Hugo Solares, Susana Martins, Jodie Trafton, Edmon Begoli, Gregory Peterson

Abstract: Background: Injection drug use (IDU) is a dangerous health behavior that increases mortality and morbidity. Identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because there is no International Classification of Disease (ICD) code and the only place IDU information can be indicated is unstructured free-text clinical notes. Although natural language processing can efficiently extract this information from unstructured data, there are no validated tools. Methods: To address this gap in clinical information, we design and demonstrate a question-answering (QA) framework to extract information on IDU from clinical notes. Our framework involves two main steps: (1) generating a gold-standard QA dataset and (2) developing and testing the QA model. We utilize 2323 clinical notes of 1145 patients sourced from the VA Corporate Data Warehouse to construct the gold-standard dataset for developing and evaluating the QA model. We also demonstrate the QA model's ability to extract IDU-related information on temporally out-of-distribution data. Results: Here we show that for a strict match between gold-standard and predicted answers, the QA model achieves 51.65% F1 score. For a relaxed match between the gold-standard and predicted answers, the QA model obtains 78.03% F1 score, along with 85.38% Precision and 79.02% Recall scores. Moreover, the QA model demonstrates consistent performance when subjected to temporally out-of-distribution data. Conclusions: Our study introduces a QA framework designed to extract IDU information from clinical notes, aiming to enhance the accurate and efficient detection of people who inject drugs, extract relevant information, and ultimately facilitate informed patient care.

Submitted 28 December, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 31 pages, 11 tables, 7 figures

arXiv:2305.01339 [pdf, ps, other]

Guaranteeing Envy-Freeness under Generalized Assignment Constraints

Authors: Siddharth Barman, Arindam Khan, Sudarshan Shyam, K. V. N. Sreenivas

Abstract: We study fair division of goods under the broad class of generalized assignment constraints. In this constraint framework, the sizes and values of the goods are agent-specific, and one needs to allocate the goods among the agents fairly while further ensuring that each agent receives a bundle of total size at most the corresponding budget of the agent. Since, in such a constraint setting, it may not always be feasible to partition all the goods among the agents, we conform -- as in recent works -- to the construct of charity to designate the set of unassigned goods. For this allocation framework, we obtain existential and computational guarantees for envy-free (appropriately defined) allocation of divisible and indivisible goods, respectively, among agents with individual, additive valuations for the goods. We deem allocations to be fair by evaluating envy only with respect to feasible subsets. In particular, an allocation is said to be feasibly envy-free (FEF) iff each agent prefers its bundle over every (budget) feasible subset within any other agent's bundle (and within the charity). The current work establishes that, for divisible goods, FEF allocations are guaranteed to exist and can be computed efficiently under generalized assignment constraints. In the context of indivisible goods, FEF allocations do not necessarily exist, and hence, we consider the fairness notion of feasible envy-freeness up to any good (FEFx). We show that, under generalized assignment constraints, an FEFx allocation of indivisible goods always exists. In fact, our FEFx result resolves open problems posed in prior works. Further, for indivisible goods and under generalized assignment constraints, we provide a pseudo-polynomial time algorithm for computing FEFx allocations, and a fully polynomial-time approximation scheme (FPTAS) for computing approximate FEFx allocations.

Submitted 2 May, 2023; originally announced May 2023.

Comments: 29 pages

arXiv:2304.05301 [pdf, other]

TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning

Authors: William Won, Midhilesh Elavazhagan, Sudarshan Srinivasan, Ajaya Durg, Samvit Kaul, Swati Gupta, Tushar Krishna

Abstract: The surge of artificial intelligence, specifically large language models, has led to a rapid advent towards the development of large-scale machine learning training clusters. Collective communications within these clusters tend to be heavily bandwidth-bound, necessitating techniques to optimally utilize the available network bandwidth. This puts the routing algorithm for the collective at the forefront of determining the performance. Unfortunately, communication libraries used in distributed machine learning today are limited by a fixed set of routing algorithms. This constrains collective performance within the domain of next-generation training clusters that employ intricate, heterogeneous, and asymmetric, large-scale topologies. Further, the emergence of irregular topologies attributed to runtime phenomena such as device failures serves to compound the complexity of the challenge. To this end, this paper introduces TACOS, an automated synthesizer that generates topology-aware collective algorithms for common distributed machine learning collectives across arbitrary input network topologies. TACOS was able to synthesize an All-Reduce algorithm for a heterogeneous 512-NPU system in just 6.09 minutes while achieving performance improvement up to 4.27x over state-of-the-art prior work. TACOS exhibits high scalability, with synthesis time scaling quadratically with the number of NPUs. In contrast to prior works' NP-hard approaches, TACOS with 40K NPUs completes in 2.52 hours.

Submitted 29 March, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

arXiv:2304.01074 [pdf, other]

FinderNet: A Data Augmentation Free Canonicalization aided Loop Detection and Closure technique for Point clouds in 6-DOF separation

Authors: Sudarshan S Harithas, Gurkirat Singh, Aneesh Chavan, Sarthak Sharma, Suraj Patni, Chetan Arora, K. Madhava Krishna

Abstract: We focus on the problem of LiDAR point cloud based loop detection (or Finding) and closure (LDC) in a multi-agent setting. State-of-the-art (SOTA) techniques directly generate learned embeddings of a given point cloud, require large data transfers, and are not robust to wide variations in 6 Degrees-of-Freedom (DOF) viewpoint. Moreover, absence of strong priors in an unstructured point cloud leads to highly inaccurate LDC. In this original approach, we propose independent roll and pitch canonicalization of the point clouds using a common dominant ground plane. Discretization of the canonicalized point cloud along the axis perpendicular to the ground plane leads to an image similar to Digital Elevation Maps (DEMs), which exposes strong spatial priors in the scene. Our experiments show that LDC based on learnt embeddings of such DEMs is not only data efficient but also significantly more robust, and generalizable than the current SOTA. We report significant performance gain in terms of Average Precision for loop detection and absolute translation/rotation error for relative pose estimation (or loop closure) on Kitti, GPR and Oxford Robot Car over multiple SOTA LDC methods. Our encoder technique compresses the original point cloud by over 830 times. To further test the robustness of our technique we create and open-source a custom dataset called Lidar-UrbanFly Dataset (LUF) which consists of point clouds obtained from a LiDAR mounted on a quadrotor.

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2303.14006 [pdf, other]

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

Authors: William Won, Taekyung Heo, Saeed Rashidi, Srinivas Sridharan, Sudarshan Srinivasan, Tushar Krishna

Abstract: As deep learning models and input data are scaling at an unprecedented rate, it is inevitable to move towards distributed training platforms to fit the model and increase training throughput. State-of-the-art approaches and techniques, such as wafer-scale nodes, multi-dimensional network topologies, disaggregated memory systems, and parallelization strategies, have been actively adopted by emerging distributed training systems. This results in a complex SW/HW co-design stack of distributed training, necessitating a modeling/simulation infrastructure for design-space exploration. In this paper, we extend the open-source ASTRA-sim infrastructure and endow it with the capabilities to model state-of-the-art and emerging distributed training models and platforms. More specifically, (i) we enable ASTRA-sim to support arbitrary model parallelization strategies via a graph-based training-loop implementation, (ii) we implement a parameterizable multi-dimensional heterogeneous topology generation infrastructure with analytical performance estimates enabling simulating target systems at scale, and (iii) we enhance the memory system modeling to support accurate modeling of in-network collective communication and disaggregated memory systems. With such capabilities, we run comprehensive case studies targeting emerging distributed models and platforms. This infrastructure lets system designers swiftly traverse the complex co-design stack and give meaningful insights when designing and deploying distributed training platforms at scale.

Submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.12888 [pdf, other]

A dynamic risk score for early prediction of cardiogenic shock using machine learning

Authors: Yuxuan Hu, Albert Lui, Mark Goldstein, Mukund Sudarshan, Andrea Tinsay, Cindy Tsui, Samuel Maidman, John Medamana, Neil Jethani, Aahlad Puli, Vuthy Nguy, Yindalon Aphinyanaphongs, Nicholas Kiefer, Nathaniel Smilowitz, James Horowitz, Tania Ahuja, Glenn I Fishman, Judith Hochman, Stuart Katz, Samuel Bernard, Rajesh Ranganath

Abstract: Myocardial infarction and heart failure are major cardiovascular diseases that affect millions of people in the US. The morbidity and mortality are highest among patients who develop cardiogenic shock. Early recognition of cardiogenic shock is critical. Prompt implementation of treatment measures can prevent the deleterious spiral of ischemia, low blood pressure, and reduced cardiac output due to cardiogenic shock. However, early identification of cardiogenic shock has been challenging due to human providers' inability to process the enormous amount of data in the cardiac intensive care unit (ICU) and lack of an effective risk stratification tool. We developed a deep learning-based risk stratification tool, called CShock, for patients admitted into the cardiac ICU with acute decompensated heart failure and/or myocardial infarction to predict onset of cardiogenic shock. To develop and validate CShock, we annotated cardiac ICU datasets with physician adjudicated outcomes. CShock achieved an area under the receiver operator characteristic curve (AUROC) of 0.820, which substantially outperformed CardShock (AUROC 0.519), a well-established risk score for cardiogenic shock prognosis. CShock was externally validated in an independent patient cohort and achieved an AUROC of 0.800, demonstrating its generalizability in other cardiac ICUs.

Submitted 28 March, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.10195 [pdf, other]

Remote Task-oriented Grasp Area Teaching By Non-Experts through Interactive Segmentation and Few-Shot Learning

Authors: Furkan Kaynar, Sudarshan Rajagopalan, Shaobo Zhou, Eckehard Steinbach

Abstract: A robot operating in unstructured environments must be able to discriminate between different grasping styles depending on the prospective manipulation task. Having a system that allows learning from remote non-expert demonstrations can very feasibly extend the cognitive skills of a robot for task-oriented grasping. We propose a novel two-step framework towards this aim. The first step involves grasp area estimation by segmentation. We receive grasp area demonstrations for a new task via interactive segmentation, and learn from these few demonstrations to estimate the required grasp area on an unseen scene for the given task. The second step is autonomous grasp estimation in the segmented region. To train the segmentation network for few-shot learning, we built a grasp area segmentation (GAS) dataset with 10089 images grouped into 1121 segmentation tasks. We benefit from an efficient meta learning algorithm for training for few-shot adaptation. Experimental evaluation showed that our method successfully detects the correct grasp area on the respective objects in unseen test scenes and effectively allows remote teaching of new grasp strategies by non-experts.

Submitted 17 March, 2023; originally announced March 2023.

Comments: Presented at the AAAI Workshop on Artificial Intelligence for User-Centric Assistance in at-Home Tasks (2023)

arXiv:2303.10145 [pdf, other]

doi 10.1145/3571600.3571634

Spectrum-inspired Low-light Image Translation for Saliency Detection

Authors: Kitty Varghese, Sudarshan Rajagopalan, Mohit Lamba, Kaushik Mitra

Abstract: Saliency detection methods are central to several real-world applications such as robot navigation and satellite imagery. However, the performance of existing methods deteriorate under low-light conditions because training datasets mostly comprise of well-lit images. One possible solution is to collect a new dataset for low-light conditions. This involves pixel-level annotations, which is not only tedious and time-consuming but also infeasible if a huge training corpus is required. We propose a technique that performs classical band-pass filtering in the Fourier space to transform well-lit images to low-light images and use them as a proxy for real low-light images. Unlike popular deep learning approaches which require learning thousands of parameters and enormous amounts of training data, the proposed transformation is fast and simple and easy to extend to other tasks such as low-light depth estimation. Our experiments show that the state-of-the-art saliency detection and depth estimation networks trained on our proxy low-light images perform significantly better on real low-light images than networks trained using existing strategies.

Submitted 17 March, 2023; originally announced March 2023.

Comments: Presented at The Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP) 2022

arXiv:2303.09293 [pdf]

A transformer-based approach to video frame-level prediction in Affective Behaviour Analysis In-the-wild

Authors: Dang-Khanh Nguyen, Ngoc-Huynh Ho, Sudarshan Pant, Hyung-Jeong Yang

Abstract: In recent years, transformer architecture has been a dominating paradigm in many applications, including affective computing. In this report, we propose our transformer-based model to handle Emotion Classification Task in the 5th Affective Behavior Analysis In-the-wild Competition. By leveraging the attentive model and the synthetic dataset, we attain a score of 0.4775 on the validation set of Aff-Wild2, the dataset provided by the organizer.

Submitted 19 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

Comments: 3 pages, 1 figure

arXiv:2302.06294 [pdf, other]

doi 10.1016/j.media.2023.102888

CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Authors: Chinedu Innocent Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, Armine Vardazaryan, Kun Yuan, Jonas Hajek, Wolfgang Reiter, Amine Yamlahi, Finn-Henri Smidt, Xiaoyang Zou, Guoyan Zheng, Bruno Oliveira, Helena R. Torres, Satoshi Kondo, Satoshi Kasai, Felix Holm, Ege Özsoy, Shuangchun Gui, Han Li, Sista Raviteja, Rachana Sathish, Pranav Poudel, Binod Bhattarai , et al. (24 additional authors not shown)

Abstract: Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Estimating also the spatial locations of the triplets would offer a more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of <instrument, verb, target> triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics, visual and procedural challenges; their significance, and useful insights for future research directions and applications in surgery.

Submitted 14 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: MICCAI EndoVis CholecTriplet2022 challenge report. Published at Elsevier journal of Medical Image Analysis. 25 pages, 15 figures, 8 tables

Journal ref: Medical Image Analysis, Volume 89, 2023, 102888, ISSN 1361-8415

arXiv:2301.11962 [pdf, other]

On the Feasibility of Machine Learning Augmented Magnetic Resonance for Point-of-Care Identification of Disease

Authors: Raghav Singhal, Mukund Sudarshan, Anish Mahishi, Sri Kaushik, Luke Ginocchio, Angela Tong, Hersh Chandarana, Daniel K. Sodickson, Rajesh Ranganath, Sumit Chopra

Abstract: Early detection of many life-threatening diseases (e.g., prostate and breast cancer) within at-risk population can improve clinical outcomes and reduce cost of care. While numerous disease-specific "screening" tests that are closer to Point-of-Care (POC) are in use for this task, their low specificity results in unnecessary biopsies, leading to avoidable patient trauma and wasteful healthcare spending. On the other hand, despite the high accuracy of Magnetic Resonance (MR) imaging in disease diagnosis, it is not used as a POC disease identification tool because of poor accessibility. The root cause of poor accessibility of MR stems from the requirement to reconstruct high-fidelity images, as it necessitates a lengthy and complex process of acquiring large quantities of high-quality k-space measurements. In this study we explore the feasibility of an ML-augmented MR pipeline that directly infers the disease sidestepping the image reconstruction process. We hypothesise that the disease classification task can be solved using a very small tailored subset of k-space data, compared to image reconstruction. Towards that end, we propose a method that performs two tasks: 1) identifies a subset of the k-space that maximizes disease identification accuracy, and 2) infers the disease directly using the identified k-space subset, bypassing the image reconstruction step. We validate our hypothesis by measuring the performance of the proposed system across multiple diseases and anatomies.
We show that comparable performance to image-based classifiers, trained on images reconstructed with full k-space data, can be achieved using small quantities of data: 8% of the data for detecting multiple abnormalities in prostate and brain scans, and 5% of the data for knee abnormalities. To better understand the proposed approach and instigate future research, we provide an extensive analysis and release code.

Submitted 2 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

arXiv:2301.00737 [pdf, other]

doi 10.1049/qtc2.12055

Rotational Abstractions for Verification of Quantum Fourier Transform Circuits

Authors: Arun Govindankutty, Sudarshan K. Srinivasan, Nimish Mathure

Abstract: With the race to build large-scale quantum computers and efforts to exploit quantum algorithms for efficient problem solving in science and engineering disciplines, the requirement to have efficient and scalable verification methods are of vital importance. We propose a novel formal verification method that is targeted at Quantum Fourier Transform (QFT) circuits. QFT is a fundamental quantum algorithm that forms the basis of many quantum computing applications. The verification method employs abstractions of quantum gates used in QFT that leads to a reduction of the verification problem from Hilbert space to the quantifier free logic of bit-vectors. Very efficient decision procedures are available to reason about bit-vectors. Therefore, our method is able to scale up to the verification of QFT circuits with 10,000 qubits and 50 million quantum gates, providing a meteoric advance in the size of QFT circuits thus far verified using formal verification methods.

Submitted 2 January, 2023; originally announced January 2023.

Journal ref: IET Quantum Communication, March 2023

arXiv:2210.08083 [pdf, other]

Reference Based Color Transfer for Medical Volume Rendering

Authors: Sudarshan Devkota, Summanta Pattanaik

Abstract: The benefits of medical imaging are enormous. Medical images provide considerable amounts of anatomical information and this facilitates medical practitioners in performing effective disease diagnosis and deciding upon the best course of medical treatment. A transition from traditional monochromatic medical images like CT scans, X-Rays or MRI images to a colored 3D representation of the anatomical structure further enhances the capabilities of medical professionals in extracting valuable medical information. The proposed framework in our research starts with performing color transfer by finding deep semantic correspondence between two medical images: a colored reference image, and a monochromatic CT scan or an MRI image. We extend this idea of reference-based colorization technique to perform colored volume rendering from a stack of grayscale medical images. Furthermore, we also propose to use an effective reference image recommendation system to aid in the selection of good reference images. With our approach, we successfully perform colored medical volume visualization and essentially eliminate the painstaking process of user interaction with a transfer function to obtain color and opacity parameters for volume rendering.

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2210.08080 [pdf, other]

Deep Learning based Super-Resolution for Medical Volume Visualization with Direct Volume Rendering

Authors: Sudarshan Devkota, Sumanta Pattanaik

Abstract: Modern-day display systems demand high-quality rendering. However, rendering at higher resolution requires a large number of data samples and is computationally expensive. Recent advances in deep learning-based image and video super-resolution techniques motivate us to investigate such networks for high-fidelity upscaling of frames rendered at a lower resolution to a higher resolution. While our work focuses on super-resolution of medical volume visualization performed with direct volume rendering, it is also applicable for volume visualization with other rendering techniques. We propose a learning-based technique where our proposed system uses color information along with other supplementary features gathered from our volume renderer to learn efficient upscaling of a low-resolution rendering to a higher-resolution space. Furthermore, to improve temporal stability, we also implement the temporal reprojection technique for accumulating history samples in volumetric rendering.

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2210.00263 [pdf, other]

Fine-tuning Wav2vec for Vocal-burst Emotion Recognition

Authors: Dang-Khanh Nguyen, Sudarshan Pant, Ngoc-Huynh Ho, Guee-Sang Lee, Soo-Huyng Kim, Hyung-Jeong Yang

Abstract: The ACII Affective Vocal Bursts (A-VB) competition introduces a new topic in affective computing, which is understanding emotional expression using the non-verbal sound of humans. We are familiar with emotion recognition via verbal vocal or facial expression. However, the vocal bursts such as laughs, cries, and signs, are not exploited even though they are very informative for behavior analysis. The A-VB competition comprises four tasks that explore non-verbal information in different spaces. This technical report describes the method and the result of SclabCNU Team for the tasks of the challenge. We achieved promising results compared to the baseline model provided by the organizers.

Submitted 1 October, 2022; originally announced October 2022.

arXiv:2209.14021 [pdf, other]

doi 10.1145/3565053.3565059

A Framework for Formal Verification of DRAM Controllers

Authors: Lukas Steiner, Chirag Sudarshan, Matthias Jung, Dominik Stoffel, Norbert Wehn

Abstract: The large number of recent JEDEC DRAM standard releases and their increasing feature set makes it difficult for designers to rapidly upgrade the memory controller IPs to each new standard. Especially the hardware verification is challenging due to the higher protocol complexity of standards like DDR5, LPDDR5 or HBM3 in comparison with their predecessors. With traditional simulation-based verification it is laborious to guarantee the coverage of all possible states, especially for control flow rich memory controllers. This has a direct impact on the time-to-market. A promising alternative is formal verification because it allows to ensure protocol compliance based on mathematical proofs. However, with regard to memory controllers no fully-automated verification process has been presented in the state-of-the-art yet, which means there is still a potential risk of human error. In this paper we present a framework that automatically generates SystemVerilog Assertions for a DRAM protocol. In addition, we show how the framework can be used efficiently for different tasks of memory controller development.

Submitted 28 September, 2022; originally announced September 2022.

Comments: ACM/IEEE International Symposium on Memory Systems (MEMSYS 2022)

arXiv:2209.10001 [pdf, other]

Building Flexible, Low-Cost Wireless Access Networks With Magma

Authors: Shaddi Hasan, Amar Padmanabhan, Bruce Davie, Jennifer Rexford, Ulas Kozat, Hunter Gatewood, Shruti Sanadhya, Nick Yurchenko, Tariq Al-Khasib, Oriol Batalla, Marie Bremner, Andrei Lee, Evgeniy Makeev, Scott Moeller, Alex Rodriguez, Pravin Shelar, Karthik Subraveti, Sudarshan Kandi, Alejandro Xoconostle, Praveen Kumar Ramakrishnan, Xiaochen Tian, Anoop Tomar

Abstract: Billions of people remain without Internet access due to availability or affordability of service. In this paper, we present Magma, an open and flexible system for building low-cost wireless access networks. Magma aims to connect users where operator economics are difficult due to issues such as low population density or income levels, while preserving features expected in cellular networks such as authentication and billing policies. To achieve this, and in contrast to traditional cellular networks, Magma adopts an approach that extensively leverages Internet design patterns, terminating access network-specific protocols at the edge and abstracting the access network from the core architecture. This decision allows Magma to refactor the wireless core using SDN (software-defined networking) principles and leverage other techniques from modern distributed systems. In doing so, Magma lowers cost and operational complexity for network operators while achieving resilience, scalability, and rich policy support.

Submitted 20 September, 2022; originally announced September 2022.

Comments: 15 pages, 10 figures, to be published in the 20th USENIX Symposium on Networked Systems Design and Implementation (2023), source code available at https://github.com/magma/magma

arXiv:2209.04430 [pdf, other]

doi 10.1007/s12036-023-09920-4

Investigation of a Machine learning methodology for the SKA pulsar search pipeline

Authors: Shashank Sanjay Bhat, Thiagaraj Prabu, Ben Stappers, Atul Ghalame, Snehanshu Saha, T. S. B Sudarshan, Zafiirah Hosenie

Abstract: The SKA pulsar search pipeline will be used for real time detection of pulsars. Modern radio telescopes such as SKA will be generating petabytes of data in their full scale of operation. Hence experience-based and data-driven algorithms become indispensable for applications such as candidate detection. Here we describe our findings from testing a state of the art object detection algorithm called Mask R-CNN to detect candidate signatures in the SKA pulsar search pipeline. We have trained the Mask R-CNN model to detect candidate images. A custom annotation tool was developed to mark the regions of interest in large datasets efficiently. We have successfully demonstrated this algorithm by detecting candidate signatures on a simulation dataset. The paper presents details of this work with a highlight on the future prospects.

Submitted 17 January, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

Journal ref: Journal of Astronomy and Astrophysics SKA special issue 2022-23 (Under review)

arXiv:2208.10784 [pdf, other]

doi 10.1088/2632-2153/acac01

Building Robust Machine Learning Models for Small Chemical Science Data: The Case of Shear Viscosity

Authors: Nikhil V. S. Avula, Shivanand K. Veesam, Sudarshan Behera, Sundaram Balasubramanian

Abstract: Shear viscosity, though being a fundamental property of all liquids, is computationally expensive to estimate from equilibrium molecular dynamics simulations. Recently, Machine Learning (ML) methods have been used to augment molecular simulations in many contexts, thus showing promise to estimate viscosity too in a relatively inexpensive manner. However, ML methods face significant challenges like overfitting when the size of the data set is small, as is the case with viscosity. In this work, we train several ML models to predict the shear viscosity of a Lennard-Jones (LJ) fluid, with particular emphasis on addressing issues arising from a small data set. Specifically, the issues related to model selection, performance estimation and uncertainty quantification were investigated. First, we show that the widely used performance estimation procedure of using a single unseen data set shows a wide variability on small data sets. In this context, the common practice of using Cross validation (CV) to select the hyperparameters (model selection) can be adapted to estimate the generalization error (performance estimation) as well. We compare two simple CV procedures for their ability to do both model selection and performance estimation, and find that k-fold CV based procedure shows a lower variance of error estimates. We discuss the role of performance metrics in training and evaluation. Finally, Gaussian Process Regression (GPR) and ensemble methods were used to estimate the uncertainty on individual predictions.
The uncertainty estimates from GPR were also used to construct an applicability domain using which the ML models provided more reliable predictions on another small data set generated in this work. Overall, the procedures prescribed in this work, together, lead to robust ML models for small data sets.

Submitted 23 August, 2022; originally announced August 2022.

Comments: main: 17 pages, 11 figures ; SI: 55 pages, 29 figures ; to be submitted to Journal of Chemical Physics

Journal ref: Mach. Learn.: Sci. Technol. 3 (2022) 045032

arXiv:2208.08579 [pdf, other]

DIET: Conditional independence testing with marginal dependence measures of residual information

Authors: Mukund Sudarshan, Aahlad Manas Puli, Wesley Tansey, Rajesh Ranganath

Abstract: Conditional randomization tests (CRTs) assess whether a variable $x$ is predictive of another variable $y$, having observed covariates $z$. CRTs require fitting a large number of predictive models, which is often computationally intractable. Existing solutions to reduce the cost of CRTs typically split the dataset into a train and test portion, or rely on heuristics for interactions, both of which lead to a loss in power. We propose the decoupled independence test (DIET), an algorithm that avoids both of these issues by leveraging marginal independence statistics to test conditional independence relationships. DIET tests the marginal independence of two random variables: $F(x \mid z)$ and $F(y \mid z)$ where $F(\cdot \mid z)$ is a conditional cumulative distribution function (CDF). These variables are termed "information residuals." We give sufficient conditions for DIET to achieve finite sample type-1 error control and power greater than the type-1 error rate. We then prove that when using the mutual information between the information residuals as a test statistic, DIET yields the most powerful conditionally valid test. Finally, we show DIET achieves higher power than other tractable CRTs on several synthetic and real benchmarks.

Submitted 11 April, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

arXiv:2208.08168 [pdf, other]

Finding Fair Allocations under Budget Constraints

Authors: Siddharth Barman, Arindam Khan, Sudarshan Shyam, K. V. N. Sreenivas

Abstract: We study the fair allocation of indivisible goods among agents with identical, additive valuations but individual budget constraints. Here, the indivisible goods--each with a specific size and value--need to be allocated such that the bundle assigned to each agent is of total size at most the agent's budget. Since envy-free allocations do not necessarily exist in the indivisible goods context, compelling relaxations--in particular, the notion of envy-freeness up to $k$ goods (EFk)--have received significant attention in recent years. In an EFk allocation, each agent prefers its own bundle over that of any other agent, up to the removal of $k$ goods, and the agents have similarly bounded envy against the charity (which corresponds to the set of all unallocated goods). Recently, Wu et al. (2021) showed that an allocation that satisfies the budget constraints and maximizes the Nash social welfare is $1/4$-approximately EF1. However, the computation (or even existence) of exact EFk allocations remained an intriguing open problem. We make notable progress towards this by proposing a simple, greedy, polynomial-time algorithm that computes EF2 allocations under budget constraints. Our algorithmic result implies the universal existence of EF2 allocations in this fair division context. The analysis of the algorithm exploits intricate structural properties of envy-freeness. Interestingly, the same algorithm also provides EF1 guarantees for important special cases. Specifically, we settle the existence of EF1 allocations for instances in which: (i) the value of each good is proportional to its size, (ii) all goods have the same size, or (iii) all the goods have the same value. Our EF2 result extends to the setting wherein the goods' sizes are agent specific.

Submitted 17 March, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

Comments: 22 pages

arXiv:2207.10293 [pdf, other]

Affective Behavior Analysis using Action Unit Relation Graph and Multi-task Cross Attention

Authors: Dang-Khanh Nguyen, Sudarshan Pant, Ngoc-Huynh Ho, Guee-Sang Lee, Soo-Huyng Kim, Hyung-Jeong Yang

Abstract: Facial behavior analysis is a broad topic with various categories such as facial emotion recognition, age, and gender recognition. Many studies focus on individual tasks, while the multi-task learning approach remains an open research issue. In this paper, we present our solution and experimental results for the Multi-Task Learning challenge of the Affective Behavior Analysis in-the-wild competition. The challenge is a combination of three tasks: action unit detection, facial expression recognition, and valence-arousal estimation. To address this challenge, we introduce a cross-attentive module to improve multi-task learning performance. Additionally, a facial graph is applied to capture the association among action units. As a result, we achieve an evaluation measure of 128.8 on the validation data provided by the organizers, which outperforms the baseline result of 30.

Submitted 3 October, 2022; v1 submitted 21 July, 2022; originally announced July 2022.

Showing 1–50 of 156 results for author: Sudarshan