-
Optically Pumped Terahertz Amplitude Modulation in Type-II Ge QD/Si heterostructures grown via Molecular Beam Epitaxy
Authors:
Suprovat Ghosh,
Abir Mukherjee,
Sudarshan Singh,
Samit K Ray,
Ananjan Basu,
Santanu Manna,
Samaresh Das
Abstract:
This article exploits group-IV germanium (Ge) quantum dots (QDs) grown by molecular beam epitaxy (MBE) on silicon-on-insulator (SOI) to explore their optical behaviour in the terahertz (THz) regime. In this work, Ge QDs pumped at an above-bandgap near-infrared wavelength exhibit THz amplitude modulation in the frequency range 0.1-1.0 THz. The epitaxial Ge QDs outperform a reference SOI substrate in THz amplitude modulation owing to higher carrier generation in the weakly confined dots than in their bulk counterpart. This is further corroborated by a theoretical model based on the non-equilibrium Green's function (NEGF) method, which calculates the photo carriers generated (PCG) and their confinement in the Ge QD region. Our model also maps the PCG to the corresponding plasma frequency, and hence to the refractive index and THz photo-conductivity. Moreover, the accumulation of photo-generated confined holes at the Ge QD-Si interface is elevated after optical illumination, leading to an increased THz photo-conductivity. This increase in THz photo-conductivity yields a significant enhancement of the THz modulation depth (~77%) at Ge QD-Si interfaces compared to bare SOI at 0.1 THz.
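The abstract describes a chain from photo-generated carrier density to plasma frequency, refractive index and THz photo-conductivity. That chain can be illustrated with the textbook Drude model. The sketch below is an illustration under stated assumptions, not the authors' NEGF model. The effective mass, scattering time and background permittivity are placeholder inputs, and every function name is hypothetical.

```python
import cmath
import math

# SI constants
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
M_E = 9.1093837015e-31       # free-electron mass, kg


def plasma_frequency(n_carriers: float, m_eff: float) -> float:
    """Angular plasma frequency omega_p = sqrt(n e^2 / (eps0 m*)) in rad/s.

    n_carriers: carrier density in m^-3 (e.g. a PCG estimate).
    m_eff: effective mass in units of the free-electron mass (assumed value).
    """
    return math.sqrt(n_carriers * E_CHARGE**2 / (EPS0 * m_eff * M_E))


def drude_conductivity(freq_hz: float, n_carriers: float, m_eff: float, tau_s: float) -> complex:
    """Drude conductivity sigma = sigma_dc / (1 - i omega tau), exp(-i omega t) convention."""
    omega = 2 * math.pi * freq_hz
    sigma_dc = n_carriers * E_CHARGE**2 * tau_s / (m_eff * M_E)
    return sigma_dc / (1 - 1j * omega * tau_s)


def refractive_index(freq_hz: float, n_carriers: float, m_eff: float,
                     tau_s: float, eps_inf: float) -> complex:
    """Complex index n = sqrt(eps_inf + i sigma / (eps0 omega)), valid for freq_hz > 0."""
    omega = 2 * math.pi * freq_hz
    sigma = drude_conductivity(freq_hz, n_carriers, m_eff, tau_s)
    return cmath.sqrt(eps_inf + 1j * sigma / (EPS0 * omega))


def modulation_depth(t_pump_off: float, t_pump_on: float) -> float:
    """Fractional drop in transmitted THz amplitude when the optical pump is switched on."""
    return (t_pump_off - t_pump_on) / t_pump_off
```

More photo-carriers raise the real part of the conductivity, so more THz power is absorbed and the pumped transmission drops. Under this modulation-depth definition, a depth of 77% would mean the pumped amplitude falls to 23% of its unpumped value.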
Submitted 3 July, 2024;
originally announced July 2024.
-
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
Authors:
Saeed Rashidi,
William Won,
Sudarshan Srinivasan,
Puneet Gupta,
Tushar Krishna
Abstract:
Distributed Deep Neural Network (DNN) training reduces training overhead by distributing the training tasks across multiple accelerators according to a parallelization strategy. However, maximum speed-up and linear scaling of the system require high-performance compute and interconnects. Wafer-scale systems are a promising technology that tightly integrates high-end accelerators with high-speed wafer-scale interconnects, making them an attractive platform for distributed training. Yet the wafer-scale interconnect must offer both high performance and the flexibility to serve various parallelization strategies, so that compute and memory usage can be fully optimized. In this paper, we propose FRED, a wafer-scale interconnect that is tailored to the high-bandwidth requirements of wafer-scale networks and can efficiently execute the communication patterns of different parallelization strategies. Furthermore, FRED supports in-switch execution of collective communication, which reduces network traffic by approximately 2X. Our results show that FRED improves the average end-to-end training time of ResNet-152, Transformer-17B, GPT-3, and Transformer-1T by 1.76X, 1.87X, 1.34X, and 1.4X, respectively, compared to a baseline wafer-scale 2D-Mesh fabric.
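The "approximately 2X" traffic reduction from in-switch collectives can be motivated with a back-of-envelope count of bytes each accelerator injects during an all-reduce. The sketch below assumes an idealized ring all-reduce as the endpoint-based baseline. The paper's actual baseline is a 2D mesh, and FRED's switch design is not modeled. The function names are hypothetical.

```python
def ring_allreduce_bytes_per_npu(message_bytes: float, n_npus: int) -> float:
    """Bytes each NPU injects in an endpoint-based ring all-reduce:
    (N-1)/N * S in the reduce-scatter phase plus (N-1)/N * S in the all-gather phase."""
    return 2 * (n_npus - 1) / n_npus * message_bytes


def in_switch_allreduce_bytes_per_npu(message_bytes: float) -> float:
    """Bytes each NPU injects when a switch performs the reduction:
    every NPU sends its S bytes once, and the switch reduces them and multicasts the result."""
    return message_bytes


def traffic_reduction(n_npus: int) -> float:
    """Ratio of endpoint-based to in-switch injected traffic (independent of message size)."""
    return ring_allreduce_bytes_per_npu(1.0, n_npus) / in_switch_allreduce_bytes_per_npu(1.0)
```

The ratio 2(N-1)/N approaches 2 as the number of accelerators N grows. This is consistent in spirit with the roughly halved network traffic reported above.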
Submitted 27 June, 2024;
originally announced June 2024.
-
Real-time Digital RF Emulation -- II: A Near Memory Custom Accelerator
Authors:
Mandovi Mukherjee,
Xiangyu Mao,
Nael Rahman,
Coleman DeLude,
Joe Driscoll,
Sudarshan Sharma,
Payman Behnam,
Uday Kamal,
Jongseok Woo,
Daehyun Kim,
Sharjeel Khan,
Jianming Tong,
Jamin Seo,
Prachi Sinha,
Madhavan Swaminathan,
Tushar Krishna,
Santosh Pande,
Justin Romberg,
Saibal Mukhopadhyay
Abstract:
A near-memory hardware accelerator for real-time emulation of radio frequency (RF) systems, based on a novel direct path computational model, is demonstrated. We evaluate hardware performance using both application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) methodologies: 1) The ASIC testchip, implemented in TSMC 28nm CMOS, uses distributed autonomous control to exploit concurrency in compute and achieve low latency. It achieves a 518 MHz per-channel bandwidth in a prototype 4-node system. The maximum emulation range supported in this paradigm is 9.5 km, with 0.24 μs of per-sample emulation latency. 2) The FPGA-based implementation, evaluated on a Xilinx ZCU104 board, demonstrates a 9-node test case (two transmitters, one receiver, and 6 passive reflectors) with an emulation range of 1.13 km to 27.3 km at 215 MHz bandwidth.
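Emulation range translates into propagation delay: a 9.5 km path corresponds to roughly 31.7 μs of one-way delay. The sketch below shows a generic tapped-delay-line model of multipath propagation as a behavioral reference. It is an assumption that this is close in spirit to the paper's direct path model, which the abstract does not specify. The path format (gain, one-way length in metres) and the function names are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def path_delay_samples(distance_m: float, sample_rate_hz: float) -> int:
    """Propagation delay of a path of length distance_m, rounded to whole samples."""
    return round(distance_m / SPEED_OF_LIGHT * sample_rate_hz)


def emulate_paths(tx: list, paths: list, sample_rate_hz: float) -> list:
    """Tapped-delay-line channel: rx[n] = sum_k gain_k * tx[n - delay_k].

    paths: list of (complex gain, one-way path length in metres), a hypothetical format.
    """
    delays = [(gain, path_delay_samples(dist, sample_rate_hz)) for gain, dist in paths]
    # Output is long enough to hold the most-delayed copy of the transmit burst.
    rx = [0j] * (len(tx) + max(d for _, d in delays))
    for gain, delay in delays:
        for n, sample in enumerate(tx):
            rx[n + delay] += gain * sample
    return rx
```

The inner loop is the per-sample work a hardware emulator must finish within its latency budget. The delay buffer depth grows with both emulation range and sample rate.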
Submitted 12 June, 2024;
originally announced June 2024.
-
Support for fragile porous dust in a gravitationally self-regulated disk around IM Lup
Authors:
Takahiro Ueda,
Ryo Tazaki,
Satoshi Okuzumi,
Mario Flock,
Prakruti Sudarshan
Abstract:
Protoplanetary disks, the birthplaces of planets, are expected to be gravitationally unstable in the early phase of their evolution. IM Lup, a well-known T Tauri star, is surrounded by a protoplanetary disk with spiral arms likely caused by gravitational instability. The IM Lup disk has been observed using various methods, but developing a unified explanatory model is challenging. Here we present a physical model of the IM Lup disk that offers a comprehensive explanation for diverse observations spanning near-infrared to millimeter wavelengths. Our findings underscore the importance of dust fragility in retaining the observed millimeter emission and reveal a preference for moderately porous dust to explain the observed millimeter polarization. We also find that the inner disk region is likely heated by gas accretion, providing a natural explanation for the bright millimeter emission within 20 au. The actively heated inner region in the model casts a 100-au-scale shadow, consistent with the near-infrared scattered light observations. Accretion heating also supports the fragile-dust scenario, in which accretion efficiently heats the disk midplane. Owing to the fragility of the dust, a potential embedded planet at 100 au is unlikely to have formed via pebble accretion in a smooth disk, pointing instead to local dust enhancement boosting pebble accretion or to alternative pathways such as outward migration or gravitational fragmentation.
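Gravitational instability in a gas disk is commonly quantified with the Toomre parameter Q = c_s κ / (π G Σ). Instability sets in near Q ≈ 1, and a self-regulated disk is one held close to that threshold. The sketch below evaluates Q for a Keplerian disk (κ = Ω) with an isothermal sound speed. The mean molecular weight of 2.34 and all inputs are assumptions, not parameters of the authors' IM Lup model.

```python
import math

# CGS constants
G_CGS = 6.674e-8          # gravitational constant, cm^3 g^-1 s^-2
K_B = 1.380649e-16        # Boltzmann constant, erg/K
M_H = 1.6735575e-24       # hydrogen atom mass, g
M_SUN = 1.98847e33        # solar mass, g
AU = 1.495978707e13       # astronomical unit, cm


def toomre_q(sigma_gas: float, temperature_k: float, radius_au: float,
             stellar_mass_msun: float, mu: float = 2.34) -> float:
    """Toomre Q for a Keplerian gas disk.

    sigma_gas: gas surface density in g/cm^2; temperature_k: midplane temperature.
    """
    cs = math.sqrt(K_B * temperature_k / (mu * M_H))                      # isothermal sound speed
    omega = math.sqrt(G_CGS * stellar_mass_msun * M_SUN / (radius_au * AU) ** 3)  # Keplerian frequency
    return cs * omega / (math.pi * G_CGS * sigma_gas)
```

Higher surface density lowers Q and a warmer midplane raises it. This is why accretion heating of the inner disk, as discussed above, can feed back on the disk's stability.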
Submitted 11 June, 2024;
originally announced June 2024.
-
Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification
Authors:
Benjamin Hou,
Qingqing Zhu,
Tejas Sudarshan Mathai,
Qiao **,
Zhiyong Lu,
Ronald M. Summers
Abstract:
In this paper, we introduce DRR-RATE, a large-scale synthetic chest X-ray dataset derived from the recently released CT-RATE dataset. DRR-RATE comprises 50,188 frontal Digitally Reconstructed Radiographs (DRRs) from 21,304 unique patients. Each image is paired with a corresponding radiology text report and binary labels for 18 pathology classes. Because DRR generation is controllable, lateral-view images and images from any desired viewing position can also be included. This opens avenues for research into novel multimodal applications involving paired CT, X-ray images from various views, text, and binary labels. We demonstrate the applicability of DRR-RATE alongside existing large-scale chest X-ray resources, notably the CheXpert dataset and the CheXnet model. Experiments show that CheXnet, when trained and tested on the DRR-RATE dataset, achieves adequate to high AUC scores for the six pathologies commonly cited in the literature: Atelectasis, Cardiomegaly, Consolidation, Lung Lesion, Lung Opacity, and Pleural Effusion. Additionally, CheXnet trained on the CheXpert dataset can accurately identify several pathologies, even when operating out of distribution. This confirms that the generated DRR images effectively capture the essential pathology features from CT images. The dataset and labels are publicly accessible at https://huggingface.co/datasets/farrell236/DRR-RATE.
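A DRR is produced by casting rays through a CT volume and applying the Beer-Lambert law to the accumulated attenuation. The sketch below uses an idealized parallel-beam projection along one axis as an illustration. The abstract does not describe DRR-RATE's actual projection geometry or X-ray spectrum. The water attenuation coefficient is an approximate, energy-dependent assumption, and the function names are hypothetical.

```python
import math

MU_WATER = 0.02  # assumed linear attenuation of water, 1/mm (roughly diagnostic energies)


def hu_to_mu(hu: float) -> float:
    """Convert Hounsfield units to linear attenuation: HU = 1000 * (mu - mu_water) / mu_water."""
    return max(0.0, MU_WATER * (1.0 + hu / 1000.0))


def parallel_drr(volume_hu: list, voxel_mm: float) -> list:
    """Project a CT volume indexed volume_hu[z][y][x] along y (a frontal view).

    Returns 1 - I/I0 per pixel, so denser paths (bone) appear brighter.
    """
    depth, rows, cols = len(volume_hu), len(volume_hu[0]), len(volume_hu[0][0])
    image = []
    for z in range(depth):
        row_out = []
        for x in range(cols):
            # Line integral of attenuation along the ray through column x of slice z.
            path = sum(hu_to_mu(volume_hu[z][y][x]) for y in range(rows)) * voxel_mm
            row_out.append(1.0 - math.exp(-path))
        image.append(row_out)
    return image
```

Projecting along x instead of y would give a lateral view. This reflects the controllability of DRR generation that the abstract highlights.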
Submitted 5 June, 2024;
originally announced June 2024.
-
HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator
Authors:
Zhewen Yu,
Sudarshan Sreeram,
Krish Agrawal,
Junyi Wu,
Alexander Montgomerie-Corcoran,
Cheng Zhang,
Jianyi Cheng,
Christos-Savvas Bouganis,
Yiren Zhao
Abstract:
Deep Neural Networks (DNNs) excel at learning hierarchical representations from raw data such as images, audio, and text. To run these DNN models with high performance and energy efficiency, they are usually deployed on customized hardware accelerators. Among various accelerator designs, dataflow architectures have shown promising performance owing to their layer-pipelined structure and their scalability in data parallelism.
Exploiting weight and activation sparsity can further improve memory storage and computation efficiency. However, existing approaches focus on exploiting sparsity in non-dataflow accelerators and cannot be applied to dataflow accelerators because of the large hardware design space that dataflow designs introduce. As a result, they may miss opportunities to find an optimal combination of sparsity features and hardware designs.
In this paper, we propose a novel approach to exploit unstructured weight and activation sparsity for dataflow accelerators using software-hardware co-optimization. We propose a Hardware-Aware Sparsity Search (HASS) to systematically determine an efficient sparsity solution for dataflow accelerators. Over a set of models, we achieve an efficiency improvement ranging from 1.3$\times$ to 4.2$\times$ compared to existing sparse designs, which are either non-dataflow or non-hardware-aware. In particular, the throughput of MobileNetV3 can be optimized to 4895 images per second. HASS is open-source: https://github.com/Yu-Zhewen/HASS
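In a layer-pipelined dataflow accelerator, the slowest stage sets the throughput. That is why per-layer sparsity and per-layer hardware allocation interact. The sketch below computes idealized pipeline throughput from per-layer MAC counts, sparsity and MAC-unit parallelism. It assumes zero operands are skipped at no cost, which real sparse hardware only approximates. It is not the HASS search itself, and the function names are hypothetical.

```python
import math


def stage_cycles(macs: int, sparsity: float, parallelism: int) -> int:
    """Cycles a pipeline stage needs per input, assuming zero products are skipped for free.

    sparsity: fraction of MAC operations eliminated (e.g. roughly 1 - (1-s_w)(1-s_a)
    if weight and activation sparsity were independent, an assumption).
    """
    effective_macs = macs * (1.0 - sparsity)
    return math.ceil(effective_macs / parallelism)


def pipeline_throughput(layers: list, clock_hz: float) -> float:
    """Inputs per second of a layer-pipelined design.

    layers: list of (MACs, sparsity in [0, 1), MAC units) per layer;
    the bottleneck stage determines the initiation interval of the whole pipeline.
    """
    bottleneck = max(stage_cycles(m, s, p) for m, s, p in layers)
    return clock_hz / bottleneck
```

Sparsifying a non-bottleneck layer yields no speed-up here. This illustrates why a search that is aware of the hardware allocation can outperform applying sparsity uniformly.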
Submitted 5 June, 2024;
originally announced June 2024.
-
Demystifying Platform Requirements for Diverse LLM Inference Use Cases
Authors:
Abhimanyu Bambhaniya,
Ritik Raj,
Geonhwa Jeong,
Souvik Kundu,
Sudarshan Srinivasan,
Midhilesh Elavazhagan,
Madhu Kumar,
Tushar Krishna
Abstract:
Large language models (LLMs) have shown remarkable performance across a wide range of applications, often outperforming human experts. However, deploying these parameter-heavy models efficiently for diverse inference use cases requires carefully designed hardware platforms with ample compute, memory, and network resources. With LLM deployment scenarios and models evolving at breakneck speed, the hardware requirements for meeting service-level objectives (SLOs) remain an open research question. In this work, we present an analytical tool, GenZ, to study the relationship between LLM inference performance and various platform design parameters. Our analysis provides insights into configuring platforms for different LLM workloads and use cases. We quantify the platform requirements to support SOTA LLMs such as LLaMA and GPT-4 under diverse serving settings. Furthermore, we project the hardware capabilities needed to enable future LLMs potentially exceeding hundreds of trillions of parameters. The trends and insights derived from GenZ can guide AI engineers deploying LLMs as well as computer architects designing next-generation hardware accelerators and platforms. Ultimately, this work sheds light on the platform design considerations for unlocking the full potential of large language models across a spectrum of applications. The source code is available at https://github.com/abhibambhaniya/GenZ-LLM-Analyzer.
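The relationship between platform parameters and inference performance can be illustrated with a simple roofline estimate of one decode step. Each step is bounded either by compute (about 2 FLOPs per parameter per generated token) or by memory traffic (weights plus KV cache). The sketch below is not GenZ's API or model. It ignores attention FLOPs, communication and overlap, and all function names and parameters are hypothetical.

```python
def decode_step_time_s(params: float, batch: int, context_len: int, kv_bytes_per_token: float,
                       peak_flops: float, mem_bw: float, bytes_per_param: float = 2.0) -> float:
    """Roofline estimate of one decode step (one new token per sequence in the batch).

    mem_bw: memory bandwidth in bytes/s; bytes_per_param=2 assumes FP16/BF16 weights.
    """
    flops = 2.0 * params * batch  # ~2 FLOPs per parameter per generated token
    bytes_moved = params * bytes_per_param + batch * context_len * kv_bytes_per_token
    return max(flops / peak_flops, bytes_moved / mem_bw)


def decode_tokens_per_s(params: float, batch: int, context_len: int, kv_bytes_per_token: float,
                        peak_flops: float, mem_bw: float) -> float:
    """Aggregate generation throughput across the batch."""
    step = decode_step_time_s(params, batch, context_len, kv_bytes_per_token, peak_flops, mem_bw)
    return batch / step
```

At small batch sizes, decode is dominated by reading the weights and so is memory-bandwidth bound. Large batches shift it toward the compute bound. This is one reason platform requirements differ across serving settings.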
Submitted 3 June, 2024;
originally announced June 2024.
-
Human-Generative AI Collaborative Problem Solving: Who Leads and How Students Perceive the Interactions
Authors:
Gaoxia Zhu,
Vidya Sudarshan,
Jason Fok Kow,
Yew Soon Ong
Abstract:
This research investigates distinct types of human-generative AI collaboration and students' interaction experiences when collaborating with generative AI (i.e., ChatGPT) on problem-solving tasks, and how these factors relate to students' sense of agency and perceived collaborative problem solving. By analyzing the surveys and reflections of 79 undergraduate students, we identified three types of human-generative AI collaboration: even contribution, human leads, and AI leads. Notably, our study shows that 77.21% of students perceived that they led or contributed evenly to collaborative problem solving with ChatGPT. In contrast, 15.19% of participants indicated that the collaboration was led by ChatGPT, suggesting a potential tendency for students to rely on ChatGPT. Furthermore, 67.09% of students perceived their interaction experiences with ChatGPT as positive or mixed. We also found a positive correlation between positive interaction experience and a positive sense of agency. The results of this study contribute to our understanding of collaboration between students and generative AI, and highlight the need to study further why some students let ChatGPT lead collaborative problem solving and how to enhance their interaction experience through curriculum and technology design.
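The reported percentages can be converted back into student counts. This assumes each percentage is a share of all 79 respondents, which the abstract implies but does not state for every figure. The snippet is a small arithmetic check, and its names are hypothetical.

```python
N_STUDENTS = 79  # survey sample size from the abstract

# Percentages as reported; assumed to be shares of all 79 students.
REPORTED_PERCENT = {
    "led or contributed evenly": 77.21,
    "ChatGPT led": 15.19,
    "positive or mixed experience": 67.09,
}


def implied_count(percent: float, n: int = N_STUDENTS) -> int:
    """Nearest whole number of students consistent with a reported percentage."""
    return round(percent / 100 * n)


counts = {label: implied_count(p) for label, p in REPORTED_PERCENT.items()}
for label, count in counts.items():
    print(f"{label}: {count}/{N_STUDENTS} = {100 * count / N_STUDENTS:.3f}%")
```

Under that assumption, the figures correspond to 61, 12 and 53 students respectively, each matching the reported percentage to within 0.01 percentage points.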
Submitted 18 May, 2024;
originally announced May 2024.
-
Improved Content Understanding With Effective Use of Multi-task Contrastive Learning
Authors:
Akanksha Bindal,
Sudarshan Ramanujam,
Dave Golland,
TJ Hazen,
Tina Jiang,
Fengyu Zhang,
Peng Yan
Abstract:
A significant challenge in enhancing LinkedIn's core content recommendation models lies in improving their semantic understanding capabilities. This paper addresses the problem by leveraging multi-task learning, a method that has shown promise in various domains. We fine-tune a pre-trained, transformer-based LLM using multi-task contrastive learning with data from a diverse set of semantic labeling tasks. We observe positive transfer, leading to superior performance across all tasks compared to training independently on each. Our model outperforms the baseline on zero-shot learning and offers improved multilingual support, highlighting its potential for broader application. The specialized content embeddings produced by our model outperform the generalized embeddings offered by OpenAI on LinkedIn datasets and tasks. This work provides a robust foundation for vertical teams across LinkedIn to customize and fine-tune the LLM for their specific applications, and offers insights and best practices for the field to build on.
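Multi-task contrastive fine-tuning is often implemented as an InfoNCE loss with in-batch negatives, summed over tasks. The abstract does not give the paper's loss, temperature or task weighting. The pure-Python sketch below illustrates the general technique under those assumptions, and its function names are hypothetical.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(queries: list, keys: list, temperature: float = 0.05) -> float:
    """Mean InfoNCE loss: queries[i] should match keys[i]; other keys act as negatives."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # cross-entropy with the positive pair as target
    return total / len(queries)


def multitask_loss(task_batches: dict, task_weights: dict, temperature: float = 0.05) -> float:
    """Weighted sum of per-task InfoNCE losses; task_batches maps task -> (queries, keys)."""
    return sum(task_weights[t] * info_nce(q, k, temperature)
               for t, (q, k) in task_batches.items())
```

Sharing one encoder across the per-task losses is what allows the positive transfer described above. The weights control how much each semantic labeling task shapes the embedding space.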
Submitted 21 May, 2024; v1 submitted 18 May, 2024;
originally announced May 2024.
-
Trustworthy Actionable Perturbations
Authors:
Jesse Friedbaum,
Sudarshan Adiga,
Ravi Tandon
Abstract:
Counterfactuals, i.e., modified inputs that lead to a different outcome, are an important tool for understanding the logic used by machine learning classifiers and for learning how to change an undesirable classification. However, even if a counterfactual changes a classifier's decision, it may not affect the true underlying class probabilities; that is, the counterfactual may act like an adversarial attack and "fool" the classifier. We propose a new framework, which we call Trustworthy Actionable Perturbations (TAP), for creating modified inputs that change the true underlying probabilities in a beneficial way. It includes a novel verification procedure to ensure that TAP change the true class probabilities instead of acting adversarially. Our framework also includes new cost, reward, and goal definitions that are better suited to effecting change in the real world. We present PAC-learnability results for our verification procedure and theoretically analyze our new method for measuring reward. We also develop a methodology for creating TAP and compare our results to those achieved by previous counterfactual methods.
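For intuition, the classic counterfactual for a linear classifier is the smallest L2 change that moves an input just across the decision boundary. It is exactly the kind of perturbation that can flip a model's decision without changing the true class probabilities, which TAP's verification procedure is designed to guard against. The sketch shows this standard baseline, not TAP, and the function name and margin parameter are hypothetical.

```python
import math


def linear_counterfactual(x: list, w: list, b: float, margin: float = 1e-3) -> list:
    """Minimal-L2 perturbation flipping sign(w.x + b), landing `margin` past the boundary."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    target = -math.copysign(margin, score)  # desired score on the other side of the boundary
    step = (target - score) / norm_sq       # move along w, the steepest direction for the score
    return [xi + step * wi for xi, wi in zip(x, w)]
```

The result is guaranteed to change the classifier's output. Whether it changes the real-world outcome depends on features the linear model may not capture, which is the gap the TAP framework addresses.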
Submitted 18 May, 2024;
originally announced May 2024.
-
Latency-Distortion Tradeoffs in Communicating Classification Results over Noisy Channels
Authors:
Noel Teku,
Sudarshan Adiga,
Ravi Tandon
Abstract:
In this work, we consider the problem of communicating the decisions of a classifier over a noisy channel. With machine-learning-based models being used in a variety of time-sensitive applications, transmitting these decisions reliably and in a timely manner is of significant importance. To this end, we study the scenario where a probability vector (representing the decisions of a classifier) at the transmitter needs to be transmitted over a noisy channel. Assuming that the distortion between the original probability vector and the one reconstructed at the receiver is measured via an f-divergence, we study the trade-off between transmission latency and distortion. We provide a complete analysis of this trade-off for uniform, lattice, and sparse lattice-based quantization techniques used to encode the probability vector, first characterizing the bit budget of each technique given a requirement on the allowed source distortion. These bounds are then combined with results from the finite-blocklength literature to provide a framework for analyzing how both quantization distortion and distortion due to decoding error probability (i.e., channel effects) affect the incurred transmission latency. Our results show an interesting interplay between source distortion (i.e., distortion of the probability vector measured via f-divergence) and the subsequent channel encoding/decoding parameters, and indicate that a joint design of these parameters is crucial to navigating the latency-distortion trade-off. We study the impact of changing different parameters (e.g., number of classes, SNR, source distortion) on the latency-distortion trade-off and perform experiments on AWGN and fading channels. Our results indicate that sparse lattice-based quantization is the most effective at minimizing latency across various regimes and for sparse, high-dimensional probability vectors (i.e., a high number of classes).
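The source side of this trade-off can be illustrated with per-coordinate uniform quantization of a probability vector, with distortion measured by the KL divergence (the f-divergence with f(t) = t log t). The sketch below does not reproduce the paper's lattice or sparse-lattice schemes or its bit-budget bounds. The small floor on quantized entries, which keeps the divergence finite, is an assumption, and the function names are hypothetical.

```python
import math


def quantize_uniform(p: list, bits: int, floor: float = 1e-6) -> list:
    """Round each probability to one of 2**bits levels in [0, 1], then renormalize.

    `floor` keeps every entry positive so KL(p || q) stays finite (an assumption).
    """
    levels = 2 ** bits - 1
    q = [max(round(pi * levels) / levels, floor) for pi in p]
    total = sum(q)
    return [qi / total for qi in q]


def kl_divergence(p: list, q: list) -> float:
    """KL(p || q), an f-divergence with f(t) = t log t."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bit_budget(num_classes: int, bits: int) -> int:
    """Payload size for per-coordinate uniform quantization: grows linearly with classes."""
    return num_classes * bits
```

More bits per coordinate lower the source distortion but lengthen the payload, and hence the channel-coding latency. The linear growth with the number of classes is where sparse lattice schemes are reported to help.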
Submitted 22 April, 2024;
originally announced April 2024.
-
Towards Robust Real-Time Hardware-based Mobile Malware Detection using Multiple Instance Learning Formulation
Authors:
Harshit Kumar,
Sudarshan Sharma,
Biswadeep Chakraborty,
Saibal Mukhopadhyay
Abstract:
This study introduces RT-HMD, a Hardware-based Malware Detector (HMD) for mobile devices that refines malware representation in segmented time series through a Multiple Instance Learning (MIL) approach. We address the mislabeling issue in real-time HMDs, where benign segments in malware time series incorrectly inherit malware labels, leading to increased false positives. Using the proposed Malicious Discriminative Score within the MIL framework, RT-HMD effectively identifies localized malware behaviors, thereby improving predictive accuracy. Empirical analysis on a hardware telemetry dataset collected from a mobile platform across 723 benign and 1033 malware samples shows a 5% precision boost while maintaining recall, outperforming baselines affected by mislabeled benign segments.
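The mislabeling problem, and how a MIL formulation sidesteps it, can be shown with the standard max-pooling MIL rule. A whole trace (the bag) is labeled malicious if at least one segment (an instance) is, instead of every segment inheriting the trace label. The sketch does not reproduce RT-HMD's Malicious Discriminative Score. The threshold and function names are hypothetical.

```python
def naive_segment_labels(trace_label: int, n_segments: int) -> list:
    """Label inheritance that MIL avoids: every segment copies the trace label,
    so benign-looking segments of a malware trace become false 'malware' examples."""
    return [trace_label] * n_segments


def bag_score(segment_scores: list) -> float:
    """Standard MIL max-pooling: a trace is as malicious as its most malicious segment."""
    return max(segment_scores)


def classify_trace(segment_scores: list, threshold: float = 0.5) -> bool:
    """Flag a telemetry trace as malware if any segment's score crosses the threshold."""
    return bag_score(segment_scores) >= threshold
```

Supervision is applied only at the trace level. Benign segments inside a malware trace are therefore not forced toward the malware class during training.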
Submitted 19 April, 2024;
originally announced April 2024.
-
Hydrodynamical simulations favor a pure deflagration origin of the near-Chandrasekhar mass supernova remnant 3C 397
Authors:
Vrutant Mehta,
Jack Sullivan,
Robert Fisher,
Yuken Ohshiro,
Hiroya Yamaguchi,
Khanak Bhargava,
Sudarshan Neopane
Abstract:
Suzaku X-ray observations of the Type Ia supernova remnant (SNR) 3C 397 discovered exceptionally high mass ratios of Mn/Fe, Ni/Fe, and Cr/Fe, consistent with a near-$M_{\rm Ch}$ progenitor white dwarf (WD). The Suzaku observations established 3C 397 as our best candidate for a near-$M_{\rm Ch}$ SNR Ia and opened the way to addressing additional outstanding questions about the origin and explosion mechanism of these transients. In particular, subsequent XMM-Newton observations revealed an unusually clumpy distribution of iron-group element (IGE) abundances within the ejecta of 3C 397. In this paper, we undertake a suite of two-dimensional hydrodynamical models, varying the explosion mechanism (either a deflagration-to-detonation transition, DDT, or a pure deflagration), the WD progenitor, and the WD progenitor metallicity, and analyze their detailed nucleosynthetic abundances and associated clumping. We find that pure deflagrations naturally give rise to clumpy distributions of neutronized species concentrated towards the outer limb of the remnant, and confirm that DDTs have smoothly structured ejecta with a central concentration of neutronization. Our findings indicate that 3C 397 was most likely a pure deflagration of a high-central-density WD. We discuss a range of implications of these findings for the broader SN Ia progenitor problem.
Submitted 5 April, 2024;
originally announced April 2024.
-
iSeg: Interactive 3D Segmentation via Interactive Attention
Authors:
Itai Lang,
Fei Xu,
Dale Decatur,
Sudarshan Babu,
Rana Hanocka
Abstract:
We present iSeg, a new interactive technique for segmenting 3D shapes. Previous works have focused mainly on leveraging pre-trained 2D foundation models for 3D segmentation based on text. However, text may be insufficient for accurately describing fine-grained spatial segmentations. Moreover, achieving a consistent 3D segmentation using a 2D model is challenging since occluded areas of the same semantic region may not be visible together from any 2D view. Thus, we design a segmentation method conditioned on fine user clicks, which operates entirely in 3D. Our system accepts user clicks directly on the shape's surface, indicating the inclusion or exclusion of regions from the desired shape partition. To accommodate various click settings, we propose a novel interactive attention module capable of processing different numbers and types of clicks, enabling the training of a single unified interactive segmentation model. We apply iSeg to a myriad of shapes from different domains, demonstrating its versatility and faithfulness to the user's specifications. Our project page is at https://threedle.github.io/iSeg/.
Submitted 4 April, 2024;
originally announced April 2024.
-
Evidence for conventional superconductivity in Bi$_2$PdPt and prediction of topological superconductivity in disorder-free $γ$-BiPd
Authors:
S. Sharma,
A. D. S. Richards,
Sajilesh K. P.,
A. Kataria,
B. S. Agboola,
M. Pula,
J. Gautreau,
A. Ghara,
D. Singh,
S. Marik,
S. R. Dunsiger,
M. J. Lagos,
A. Kanigel,
E. S. Sørensen,
R. P. Singh,
G. M. Luke
Abstract:
We present comprehensive investigations of the structural, superconducting, and topological properties of Bi$_2$PdPt. Magnetization and heat capacity measurements on polycrystalline Bi$_2$PdPt demonstrate a superconducting transition at $\approx$ 0.8 K. Moreover, muon spin relaxation/rotation ($μ$SR) measurements provide evidence for a time-reversal-symmetry-preserving, isotropically gapped superconducting state in Bi$_2$PdPt. We have also performed density-functional theory (DFT) calculations on Bi$_2$PdPt and on the more general isostructural systems BiPd$_{x}$Pt$_{1-x}$, of which Bi$_2$PdPt and $γ$-BiPd are the special cases $x=0.5$ and $x=1$, respectively. From our DFT calculations, we have computed the $Z_2$ topological index, which characterizes the topology of the band structure, for a range of substitution fractions $x$ between $x=0$ and $x=1$. We find a topologically non-trivial state for $x>0.75$ and a trivial state for $x<0.75$. Our results therefore indicate that BiPd$_{x}$Pt$_{1-x}$ could be a topological superconductor for $x>0.75$.
Submitted 27 March, 2024;
originally announced March 2024.
-
Leveraging Large Language Models to Extract Information on Substance Use Disorder Severity from Clinical Notes: A Zero-shot Learning Approach
Authors:
Maria Mahbub,
Gregory M. Dams,
Sudarshan Srinivasan,
Caitlin Rizy,
Ioana Danciu,
Jodie Trafton,
Kathryn Knight
Abstract:
Substance use disorder (SUD) is a major concern due to its detrimental effects on health and society. SUD identification and treatment depend on a variety of factors such as severity, co-determinants (e.g., withdrawal symptoms), and social determinants of health. Existing diagnostic coding systems used by American insurance providers, like the International Classification of Diseases (ICD-10), lack granularity for certain diagnoses, but clinicians add this granularity (such as that found in the Diagnostic and Statistical Manual of Mental Disorders, DSM-5) as supplemental unstructured text in clinical notes. Traditional natural language processing (NLP) methods face limitations in accurately parsing such diverse clinical language. Large Language Models (LLMs) offer promise in overcoming these challenges by adapting to diverse language patterns. This study investigates the application of LLMs for extracting severity-related information for various SUD diagnoses from clinical notes. We propose a workflow employing zero-shot learning of LLMs with carefully crafted prompts and post-processing techniques. Through experimentation with Flan-T5, an open-source LLM, we demonstrate its superior recall compared to a rule-based approach. Focusing on 11 categories of SUD diagnoses, we show the effectiveness of LLMs in extracting severity information, contributing to improved risk assessment and treatment planning for SUD patients.
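The workflow of a prompt plus post-processing can be sketched around DSM-5's severity scheme. DSM-5 grades SUD severity by the number of its 11 criteria met: 2-3 mild, 4-5 moderate, 6 or more severe. The prompt wording below is hypothetical, since the study's actual prompts are not given in the abstract. The model call itself is omitted; Flan-T5 can be run through the Hugging Face `transformers` library. All function names are illustrative.

```python
import re
from typing import Optional

SEVERITY_LEVELS = ("mild", "moderate", "severe")  # DSM-5 SUD severity specifiers


def severity_from_criteria(n_criteria: int) -> Optional[str]:
    """DSM-5 grading by number of criteria met (of 11); fewer than 2 means no SUD diagnosis."""
    if n_criteria >= 6:
        return "severe"
    if n_criteria >= 4:
        return "moderate"
    if n_criteria >= 2:
        return "mild"
    return None


def build_prompt(note_text: str, substance: str) -> str:
    """Hypothetical zero-shot prompt asking for a single severity word."""
    return (
        "Answer with one word: mild, moderate, severe, or unknown.\n"
        f"What is the severity of the patient's {substance} use disorder "
        "according to the clinical note below?\n\n"
        f"Note: {note_text}"
    )


def parse_severity(model_output: str) -> Optional[str]:
    """Post-process free-text model output into a severity label.

    Checks levels in the order mild, moderate, severe and returns the first one found
    as a whole word; returns None if the output names no severity.
    """
    text = model_output.lower()
    for level in SEVERITY_LEVELS:
        if re.search(rf"\b{level}\b", text):
            return level
    return None
```

Matching whole words keeps outputs such as "mildly elevated" from being read as a mild diagnosis. A production workflow would need further rules for negations and hedged phrasing.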
Submitted 18 March, 2024;
originally announced March 2024.
-
Dichotomous Dynamics of Magnetic Monopole Fluids
Authors:
Chun-Chih Hsu,
Hiroto Takahashi,
Fabian Jerzembeck,
Jahnatta Dasini,
Chaia Carroll,
Ritika Dusad,
Jonathan Ward,
Catherine Dawson,
Sudarshan Sharma,
Graeme Luke,
Stephen J. Blundell,
Claudio Castelnovo,
Jonathan N. Hallén,
Roderich Moessner,
J. C. Séamus Davis
Abstract:
A recent advance in the study of emergent magnetic monopoles was the discovery that monopole motion is restricted to dynamical fractal trajectories (J. Hallén et al., Science 378, 1218 (2022)), thus explaining the characteristics of magnetic monopole noise spectra (Dusad, R. et al. Nature 571, 234 (2019); Samarakoon, A. M. et al. Proc. Natl. Acad. Sci. 119, e2117453119 (2022)). Here we apply this new theory to explore the dynamics of field-driven monopole currents, finding them to comprise two quite distinct transport processes: initially swift fractal rearrangements of local monopole configurations, followed by conventional monopole diffusion. This theory also predicts a characteristic frequency dependence of the dissipative loss angle for AC-field-driven currents. To explore these novel perspectives on monopole transport, we introduce simultaneous monopole current control and measurement techniques using SQUID-based monopole current sensors. For the canonical material Dy2Ti2O7, we measure $\Phi(t)$, the time dependence of the magnetic flux threading the sample when a net monopole current $J(t) = \dot{\Phi}(t)/\mu_0$ is generated by applying an external magnetic field $B_0(t)$. These experiments find a sharp dichotomy of monopole currents, separated by their distinct relaxation time constants before and after $t \approx 600\,\mu\mathrm{s}$ from monopole current initiation. Application of sinusoidal magnetic fields $B_0(t) = B\cos(\omega t)$ generates oscillating monopole currents whose loss angle $\theta(f)$ exhibits a characteristic transition at frequency $f \approx 1.8$ kHz over the same temperature range. Finally, the magnetic noise power is also dichotomic, diminishing sharply after $t \approx 600\,\mu\mathrm{s}$. This complex phenomenology represents a new form of heterogeneous dynamics generated by the interplay of fractionalization and local spin configurational symmetry.
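Two quantities in the abstract can be computed directly from sampled data: the monopole current $J(t) = \dot{\Phi}(t)/\mu_0$ from the measured flux, and a loss angle from the phase lag of a response behind a $\cos(\omega t)$ drive. The sketch below is an illustration under stated assumptions: uniform sampling, a finite-difference derivative, and a lock-in style projection over an integer number of drive periods. It takes $\theta$ as the phase lag of the response behind $B_0(t)$ (so $\tan\theta = \chi''/\chi'$ for a linear response). The paper's exact convention and processing may differ.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability (T*m/A), classical SI value


def current_from_flux(phi, dt):
    """J = (dPhi/dt) / mu_0 for uniformly sampled flux: central differences inside,
    one-sided differences at the two ends."""
    n = len(phi)
    out = []
    for i in range(n):
        if i == 0:
            dphi = (phi[1] - phi[0]) / dt
        elif i == n - 1:
            dphi = (phi[-1] - phi[-2]) / dt
        else:
            dphi = (phi[i + 1] - phi[i - 1]) / (2 * dt)
        out.append(dphi / MU_0)
    return out


def loss_angle(response, freq, dt):
    """Phase lag of `response` behind a cos(2*pi*freq*t) drive via in-phase and
    quadrature projections. Assumes the record spans whole drive periods."""
    w = 2 * math.pi * freq
    n = len(response)
    in_phase = 2 / n * sum(r * math.cos(w * k * dt) for k, r in enumerate(response))
    quadrature = 2 / n * sum(r * math.sin(w * k * dt) for k, r in enumerate(response))
    return math.atan2(quadrature, in_phase)
```

For a response $A\cos(\omega t - \theta)$, the in-phase projection gives $A\cos\theta$ and the quadrature projection gives $A\sin\theta$, so `atan2` recovers $\theta$.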
Submitted 9 April, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Low-Distortion Clustering with Ordinal and Limited Cardinal Information
Authors:
Jakob Burkhardt,
Ioannis Caragiannis,
Karl Fehrs,
Matteo Russo,
Chris Schwiegelshohn,
Sudarshan Shyam
Abstract:
Motivated by recent work in computational social choice, we extend the metric distortion framework to clustering problems. Given a set of $n$ agents located in an underlying metric space, our goal is to partition them into $k$ clusters, optimizing some social cost objective. The metric space is defined by a distance function $d$ between the agent locations. Information about $d$ is available only implicitly via $n$ rankings, through which each agent ranks all other agents in terms of their distance from her. Still, we would like to evaluate clustering algorithms in terms of social cost objectives that are defined using $d$. This is done using the notion of distortion, which measures how far from optimality a clustering can be, taking into account all underlying metrics that are consistent with the ordinal information available. Unfortunately, the most important clustering objectives do not admit algorithms with finite distortion. To sidestep this disappointing fact, we follow two alternative approaches: We first explore whether resource augmentation can be beneficial. We consider algorithms that use more than $k$ clusters but compare their social cost to that of the optimal $k$-clusterings. We show that using exponentially (in terms of $k$) many clusters, we can get low (constant or logarithmic) distortion for the $k$-center and $k$-median objectives. Interestingly, such an exponential blowup is shown to be necessary. More importantly, we explore whether limited cardinal information can be used to obtain better results. Somewhat surprisingly, for $k$-median and $k$-center, we show that a number of queries that is polynomial in $k$ and only logarithmic in $n$ (i.e., only sublinear in the number of agents for the most relevant scenarios in practice) is enough to get constant distortion.
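The setting above can be made concrete for a single known metric. In the sketch below, each agent's ranking is derived from $d$, and the $k$-median cost ratio of a given clustering against the brute-force optimum is computed. Distortion is the worst case of this ratio over all metrics consistent with the rankings, which this toy does not enumerate. Assumptions: centers are restricted to agent locations, the optimum is found by brute force (small $n$ only), and the optimal cost is nonzero.

```python
from itertools import combinations


def rankings(d):
    """Each agent i ranks all other agents by increasing d[i][j] (ties broken by index)."""
    n = len(d)
    return [sorted((j for j in range(n) if j != i), key=lambda j: (d[i][j], j))
            for i in range(n)]


def k_median_cost(d, centers):
    """Sum over agents of the distance to the nearest chosen center."""
    return sum(min(d[i][c] for c in centers) for i in range(len(d)))


def ratio_on_metric(d, centers, k):
    """cost(centers) / optimal k-median cost on this one metric (brute force)."""
    opt = min(k_median_cost(d, c) for c in combinations(range(len(d)), k))
    return k_median_cost(d, centers) / opt


# Five agents on a line; d is the absolute difference of positions.
pos = [0, 1, 2, 10, 11]
d = [[abs(a - b) for b in pos] for a in pos]
```

For these positions, agent 0 ranks the others as `[1, 2, 3, 4]`, and choosing agents 0 and 3 as centers costs 4 against an optimum of 3, a ratio of 4/3 on this particular metric.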
Submitted 6 February, 2024;
originally announced February 2024.
-
Fisheye Camera and Ultrasonic Sensor Fusion For Near-Field Obstacle Perception in Bird's-Eye-View
Authors:
Arindam Das,
Sudarshan Paul,
Niko Scholz,
Akhilesh Kumar Malviya,
Ganesh Sistu,
Ujjwal Bhattacharya,
Ciarán Eising
Abstract:
Accurate obstacle identification represents a fundamental challenge within the scope of near-field perception for autonomous driving. Conventionally, fisheye cameras are frequently employed for comprehensive surround-view perception, including rear-view obstacle localization. However, the performance of such cameras can deteriorate significantly in low-light conditions, at night, or under intense sun glare. Conversely, cost-effective sensors such as ultrasonic sensors remain largely unaffected under these conditions. Therefore, we present, to our knowledge, the first end-to-end multimodal fusion model tailored for efficient obstacle perception in a bird's-eye-view (BEV) perspective, utilizing fisheye cameras and ultrasonic sensors. Initially, ResNeXt-50 is employed as a set of unimodal encoders to extract features specific to each modality. Subsequently, the feature space associated with the visible spectrum is transformed into BEV. The two modalities are fused via concatenation. At the same time, the ultrasonic spectrum-based unimodal feature maps pass through content-aware dilated convolution, applied to mitigate misalignment between the two sensors in the fused feature space. Finally, the fused features are used by a two-stage semantic occupancy decoder to generate grid-wise predictions for precise obstacle perception. We conduct a systematic investigation to determine the optimal strategy for multimodal fusion of both sensors. We describe our dataset creation procedures and annotation guidelines, and perform a thorough data analysis to ensure adequate coverage of all scenarios. When applied to our dataset, the experimental results underscore the robustness and effectiveness of our proposed multimodal fusion approach.
Submitted 1 February, 2024;
originally announced February 2024.
-
Dynamic Q&A of Clinical Documents with Large Language Models
Authors:
Ran Elgedawy,
Ioana Danciu,
Maria Mahbub,
Sudarshan Srinivasan
Abstract:
Electronic health records (EHRs) house crucial patient data in clinical notes. As these notes grow in volume and complexity, manual extraction becomes challenging. This work introduces a natural language interface using large language models (LLMs) for dynamic question answering on clinical notes. Our chatbot, powered by LangChain and transformer-based LLMs, allows users to query in natural language and receive relevant answers from clinical notes. Experiments using various embedding models and advanced LLMs show Wizard Vicuna's superior accuracy, albeit with high compute demands. Model optimization, including weight quantization, improves latency by approximately 48 times. The results are promising, yet challenges such as model hallucinations and limited evaluation on diverse medical cases remain. Addressing these gaps is crucial for unlocking the value in clinical notes and advancing AI-driven clinical decision-making.
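A chatbot of this kind typically retrieves the note chunks closest to the question in embedding space and passes them to the LLM as context. The sketch below shows only that retrieval step. Assumptions: a toy bag-of-words vector stands in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical helpers, not LangChain APIs.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing tokens
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the top_k note chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


def build_prompt(question, chunks, top_k=2):
    """Hypothetical prompt that grounds the LLM answer in the retrieved chunks."""
    context = "\n".join(retrieve(question, chunks, top_k))
    return f"Answer using only these notes:\n{context}\nQuestion: {question}\nAnswer:"


notes = ["patient takes metformin daily", "blood pressure was normal", "follow up in two weeks"]
```

Grounding the prompt in retrieved text is one common way to limit the hallucinations mentioned above, although it does not eliminate them.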
Submitted 2 July, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
PHOENIX: Open-Source Language Adaption for Direct Preference Optimization
Authors:
Matthias Uhlig,
Sigurd Schacht,
Sudarshan Kamath Barkur
Abstract:
Large language models have gained immense importance in recent years and have demonstrated outstanding results in solving various tasks. However, despite these achievements, many questions remain unanswered in the context of large language models. Besides the optimal use of the models for inference and the alignment of the results to the desired specifications, the transfer of models to other languages is still an underdeveloped area of research. The recent publication of models such as Llama-2 and Zephyr has provided new insights into architectural improvements and the use of human feedback. However, insights into adapting these techniques to other languages remain scarce. In this paper, we build on the latest improvements and apply the Direct Preference Optimization (DPO) approach to the German language. The model is available at https://huggingface.co/DRXD1000/Phoenix.
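For reference, the standard DPO objective for one preference pair is $-\log\sigma\big(\beta[(\log\pi_\theta(y_w|x) - \log\pi_{\mathrm{ref}}(y_w|x)) - (\log\pi_\theta(y_l|x) - \log\pi_{\mathrm{ref}}(y_l|x))]\big)$, where $y_w$ is the preferred response and $y_l$ the rejected one. The sketch below evaluates this loss from summed sequence log-probabilities. The value `beta=0.1` is an assumed default, not the PHOENIX training setting.

```python
import math


def softplus(x):
    """log(1 + e^x), evaluated without overflow."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss from summed token log-probabilities of each response."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return softplus(-margin)  # -log sigmoid(m) == softplus(-m)
```

When the policy equals the reference model, the margin is zero and the loss is $\log 2$. The loss falls below that only when the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference.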
Submitted 19 January, 2024;
originally announced January 2024.
-
Efficient Neural Representation of Volumetric Data using Coordinate-Based Networks
Authors:
Sudarshan Devkota,
Sumanta Pattanaik
Abstract:
In this paper, we propose an efficient approach for the compression and representation of volumetric data utilizing coordinate-based networks and multi-resolution hash encoding. Efficient compression of volumetric data is crucial for various applications, such as medical imaging and scientific simulations. Our approach enables effective compression by learning a mapping between spatial coordinates and intensity values. We compare different encoding schemes and demonstrate the superiority of multi-resolution hash encoding in terms of compression quality and training efficiency. Furthermore, we leverage optimization-based meta-learning, specifically using the Reptile algorithm, to learn weight initialization for neural representations tailored to volumetric data, enabling faster convergence during optimization. Additionally, we compare our approach with state-of-the-art methods to showcase improved image quality and compression ratios. These findings highlight the potential of coordinate-based networks and multi-resolution hash encoding for an efficient and accurate representation of volumetric data, paving the way for advancements in large-scale data visualization and other applications.
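The multi-resolution hash encoding mentioned above maps each 3D coordinate, at every grid level, to hash-table slots whose trainable feature vectors are then interpolated. The sketch below computes only the slot indices of the 8 enclosing voxel corners. Assumptions: it follows the Instant-NGP formulation (XOR of coordinate-times-prime, geometric growth of resolutions), and the level count, resolutions, and table size are illustrative defaults rather than this paper's configuration. It also omits the dense indexing that Instant-NGP uses at coarse levels and the trilinear interpolation of features.

```python
import math

PRIMES = (1, 2654435761, 805459861)  # per-dimension primes from the Instant-NGP paper


def spatial_hash(cell, table_size):
    """XOR of coordinate * prime, reduced modulo the table size.
    For power-of-two table sizes this matches uint32 arithmetic."""
    h = 0
    for c, p in zip(cell, PRIMES):
        h ^= c * p
    return h % table_size


def level_resolutions(n_min=16, n_max=512, levels=8):
    """Grid resolutions growing geometrically from n_min to about n_max."""
    b = math.exp((math.log(n_max) - math.log(n_min)) / (levels - 1))
    return [math.floor(n_min * b ** l) for l in range(levels)]


def corner_indices(xyz, table_size=2 ** 14, **kw):
    """Hash-table slots of the 8 voxel corners around xyz in [0, 1]^3, per level."""
    out = []
    for res in level_resolutions(**kw):
        base = [math.floor(v * res) for v in xyz]
        corners = [(base[0] + dx, base[1] + dy, base[2] + dz)
                   for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]
        out.append([spatial_hash(c, table_size) for c in corners])
    return out
```

Because fine levels share a fixed-size table, hash collisions are tolerated. The small MLP that consumes the interpolated features learns to resolve them.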
Submitted 16 January, 2024;
originally announced January 2024.
-
Interpretable Underwater Diver Gesture Recognition
Authors:
Sudeep Mangalvedhekar,
Shreyas Nahar,
Sudarshan Maskare,
Kaushal Mahajan,
Dr. Anant Bagade
Abstract:
In recent years, the usage and applications of Autonomous Underwater Vehicles (AUVs) have grown rapidly. Interaction of divers with AUVs remains an integral part of their use in various applications, which makes building robust and efficient underwater gesture recognition systems extremely important. In this paper, we propose a deep-learning-based underwater gesture recognition system, trained on the Cognitive Autonomous Diving Buddy Underwater gesture dataset, that achieves 98.01% accuracy on the dataset; to the best of our knowledge, this is the best performance achieved on this dataset at the time of writing. We also improve the interpretability of the gesture recognition system by using XAI techniques to visualize the model's predictions.
Submitted 8 December, 2023;
originally announced December 2023.
-
Synergistic Signals: Exploiting Co-Engagement and Semantic Links via Graph Neural Networks
Authors:
Zijie Huang,
Baolin Li,
Hafez Asgharzadeh,
Anne Cocos,
Lingyi Liu,
Evan Cox,
Colby Wise,
Sudarshan Lamkhede
Abstract:
Given a set of candidate entities (e.g., movie titles), the ability to identify similar entities is a core capability of many recommender systems. Most often this is achieved by collaborative filtering approaches, i.e., if users co-engage with a pair of entities frequently enough, their embeddings should be similar. However, relying on co-engagement data alone can result in lower-quality embeddings for new and unpopular entities. We study this problem in the context of recommender systems at Netflix. We observe that there is abundant semantic information, such as genre, content maturity level, and themes, that complements co-engagement signals and provides interpretability in similarity models. To learn entity similarities from both data sources holistically, we propose a novel graph-based approach called SemanticGNN. SemanticGNN models entities, semantic concepts, collaborative edges, and semantic edges within a large-scale knowledge graph and conducts representation learning over it. Our key technical contributions are twofold: (1) we develop a novel relation-aware attention graph neural network (GNN) to handle the imbalanced distribution of relation types in our graph; (2) to handle web-scale graph data that has millions of nodes and billions of edges, we develop a novel distributed graph training paradigm. The proposed model is successfully deployed within Netflix, and empirical experiments indicate that it yields up to 35% improvement in performance on similarity judgment tasks.
Submitted 7 December, 2023;
originally announced December 2023.
-
GreenFPGA: Evaluating FPGAs as Environmentally Sustainable Computing Solutions
Authors:
Chetan Choppali Sudarshan,
Aman Arora,
Vidya A. Chhabria
Abstract:
Growing global concerns about climate change highlight the need for environmentally sustainable computing. The ecological impact of computing, including both operational and embodied carbon, is a key consideration. Field Programmable Gate Arrays (FPGAs) stand out as promising sustainable computing platforms due to their reconfigurability across various applications. This paper introduces GreenFPGA, a tool that estimates the total carbon footprint (CFP) of FPGAs over their lifespan, considering design, manufacturing, reconfigurability (reuse), operation, disposal, and recycling. Using GreenFPGA, the paper evaluates scenarios where the ecological benefits of FPGA reconfigurability outweigh operational and embodied carbon costs, positioning FPGAs as an environmentally sustainable choice for hardware acceleration compared to Application-Specific Integrated Circuits (ASICs). Experimental results show that FPGAs have lower CFP than ASICs, particularly for multiple distinct, low-volume applications, or short application lifespans.
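The trade-off described above can be illustrated with a deliberately simplified lifecycle model. Here an ASIC is built per application, while one FPGA is reused across all applications, paying its embodied carbon once but with a higher operational cost per application. This toy model and all numbers are hypothetical assumptions (arbitrary kg CO2e units, a fixed 2x operational overhead). It is not GreenFPGA's model, which also accounts for design, disposal, and recycling.

```python
def asic_total(n_apps, embodied_per_asic, op_per_app):
    """Every application gets its own ASIC: embodied + operational carbon per app."""
    return n_apps * (embodied_per_asic + op_per_app)


def fpga_total(n_apps, embodied_fpga, op_per_app, op_overhead=2.0):
    """One FPGA is reconfigured for all apps: embodied carbon is paid once, but each
    app burns `op_overhead` times the ASIC's operational carbon."""
    return embodied_fpga + n_apps * op_per_app * op_overhead


def first_n_where_fpga_wins(embodied_asic, embodied_fpga, op_per_app,
                            op_overhead=2.0, max_apps=100):
    """Smallest number of distinct applications for which the FPGA total is lower."""
    for n in range(1, max_apps + 1):
        if fpga_total(n, embodied_fpga, op_per_app, op_overhead) < asic_total(n, embodied_asic, op_per_app):
            return n
    return None
```

With embodied carbon of 10 per ASIC and 30 for the FPGA, the FPGA wins from 4 applications when each application's operational carbon is 2, but only from 7 when it is 5. A shorter lifespan shrinks the operational term and favors reuse, which matches the qualitative trend in the abstract.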
Submitted 21 November, 2023;
originally announced November 2023.
-
Characterizing the efficacy of methods to subtract terrestrial transient noise near gravitational wave events and the effects on parameter estimation
Authors:
Sudarshan Ghonge,
Joshua Brandt,
J. M. Sullivan,
Margaret Millhouse,
Katerina Chatziioannou,
James A. Clark,
Tyson Littenberg,
Neil Cornish,
Sophie Hourihane,
Laura Cadonati
Abstract:
We investigate the impact of transient noise artifacts, or glitches, on gravitational wave inference, and the efficacy of data cleaning procedures in recovering unbiased source properties. Due to their time-frequency morphology, broadband glitches demonstrate moderate to significant biasing of posterior distributions away from true values. In contrast, narrowband glitches have negligible biasing effects owing to distinct signal and glitch morphologies. We inject simulated binary black hole signals into data containing three common glitch types from past LIGO-Virgo observing runs, and reconstruct both signal and glitch waveforms using BayesWave, a wavelet-based Bayesian analysis. We apply the standard LIGO-Virgo-KAGRA deglitching procedure to the detector data: we subtract the glitch waveform estimated by the joint BayesWave inference before performing parameter estimation with detailed compact binary waveform models. We find that this deglitching effectively mitigates bias from broadband glitches, with posterior peaks aligning with true values after deglitching. This provides a baseline validation of existing techniques, while demonstrating waveform reconstruction improvements to the Bayesian algorithm for robust astrophysical characterization in glitch-prone detector data.
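The deglitching step described above subtracts a modeled glitch waveform from the strain data before parameter estimation. The sketch below assumes a glitch model built from sine-Gaussian (Morlet-Gabor) wavelets with $\tau = Q/(2\pi f_0)$, in the style of BayesWave. The wavelet parameters are hypothetical and are supplied directly rather than inferred, so the sketch shows only the subtraction, not the Bayesian reconstruction.

```python
import math


def sine_gaussian(t, amp, t0, f0, q, phase=0.0):
    """Morlet-Gabor wavelet: amp * exp(-((t-t0)/tau)^2) * cos(2 pi f0 (t-t0) + phase)."""
    tau = q / (2 * math.pi * f0)
    return amp * math.exp(-((t - t0) / tau) ** 2) * math.cos(2 * math.pi * f0 * (t - t0) + phase)


def glitch_model(times, wavelets):
    """Sum of wavelets; each entry is (amp, t0, f0, q) or (amp, t0, f0, q, phase)."""
    return [sum(sine_gaussian(t, *w) for w in wavelets) for t in times]


def deglitch(data, times, wavelets):
    """Subtract the estimated glitch waveform from the data, sample by sample."""
    return [x - g for x, g in zip(data, glitch_model(times, wavelets))]


times = [k / 4096 for k in range(4096)]  # 1 s at an assumed 4096 Hz sampling rate
signal = [1e-3 * math.sin(2 * math.pi * 50 * t) for t in times]
glitch = [(5.0, 0.5, 200.0, 8.0)]  # hypothetical wavelet: amp, t0 [s], f0 [Hz], Q
data = [s + g for s, g in zip(signal, glitch_model(times, glitch))]
```

If the glitch estimate is accurate, the residual is the underlying signal. A mis-estimated amplitude leaves glitch power behind, which is the source of the posterior bias the abstract discusses.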
Submitted 15 November, 2023;
originally announced November 2023.
-
HyperFields: Towards Zero-Shot Generation of NeRFs from Text
Authors:
Sudarshan Babu,
Richard Liu,
Avery Zhou,
Michael Maire,
Greg Shakhnarovich,
Rana Hanocka
Abstract:
We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and (optionally) some fine-tuning. Key to our approach are: (i) a dynamic hypernetwork, which learns a smooth mapping from text token embeddings to the space of NeRFs; (ii) NeRF distillation training, which distills scenes encoded in individual NeRFs into one dynamic hypernetwork. These techniques enable a single network to fit over a hundred unique scenes. We further demonstrate that HyperFields learns a more general map between text and NeRFs, and consequently is capable of predicting novel in-distribution and out-of-distribution scenes, either zero-shot or with a few finetuning steps. Finetuning HyperFields benefits from accelerated convergence thanks to the learned general map, and is capable of synthesizing novel scenes 5 to 10 times faster than existing neural optimization-based methods. Our ablation experiments show that both the dynamic architecture and NeRF distillation are critical to the expressivity of HyperFields.
Submitted 13 June, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
A well-balanced second-order finite volume approximation for a coupled system of granular flow
Authors:
Aekta Aggarwal,
Veerappa Gowda G. D.,
Sudarshan Kumar K
Abstract:
A well-balanced second-order finite volume scheme is proposed and analyzed for a 2 × 2 system of nonlinear partial differential equations that describes the dynamics of growing sandpiles created by a vertical source on a flat, bounded rectangular table in multiple dimensions. To derive a second-order scheme, we combine a MUSCL-type spatial reconstruction with a strong stability preserving Runge-Kutta time-stepping method. The resulting scheme is ensured to be well-balanced through a modified limiting approach that allows the scheme to reduce to a well-balanced first-order scheme near the steady state while maintaining second-order accuracy away from it. The well-balanced property of the scheme is proven analytically in one dimension and demonstrated numerically in two dimensions. Additionally, numerical experiments reveal that the second-order scheme reduces finite time oscillations, takes fewer time iterations to reach the steady state, and gives sharper resolution of the physical structure of the sandpile, compared to the existing first-order schemes in the literature.
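The two ingredients named above, a MUSCL-type reconstruction and a strong stability preserving Runge-Kutta step, can be shown on the simplest possible model problem. The sketch below is a stand-in, not the sandpile scheme. It uses scalar linear advection $u_t + a u_x = 0$ with $a > 0$, periodic boundaries, a minmod slope limiter, an upwind flux, and the two-stage SSP-RK2 (Heun) step. The well-balanced modified limiting is not included.

```python
def minmod(a, b):
    """Minmod limiter: zero at extrema, otherwise the smaller one-sided slope."""
    if a * b <= 0:
        return 0.0
    return a if abs(a) < abs(b) else b


def rhs(u, a, dx):
    """Semi-discrete MUSCL operator for u_t + a u_x = 0 (a > 0), periodic boundaries."""
    n = len(u)
    slopes = [minmod(u[i] - u[i - 1], u[(i + 1) % n] - u[i]) for i in range(n)]
    # Upwind flux at interface i+1/2 uses the reconstructed left state u_i + slope_i / 2.
    flux = [a * (u[i] + 0.5 * slopes[i]) for i in range(n)]
    return [-(flux[i] - flux[i - 1]) / dx for i in range(n)]


def ssprk2_step(u, a, dx, dt):
    """SSP-RK2: a convex combination of two forward-Euler stages."""
    u1 = [ui + dt * ki for ui, ki in zip(u, rhs(u, a, dx))]
    k2 = rhs(u1, a, dx)
    return [0.5 * ui + 0.5 * (u1i + dt * k2i) for ui, u1i, k2i in zip(u, u1, k2)]


n = 100
dx = 1.0 / n
u0 = [1.0 if 0.2 <= i * dx < 0.4 else 0.0 for i in range(n)]  # square pulse
```

With a CFL number of 0.4, each forward-Euler stage is a convex combination of neighboring cell values. SSP-RK2 keeps that property, so the square pulse is advected without new over- or undershoots, and the flux-difference form conserves total mass.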
Submitted 2 January, 2024; v1 submitted 21 October, 2023;
originally announced October 2023.
-
Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring
Authors:
Ankitha Sudarshan,
Vinay Samuel,
Parth Patwa,
Ibtihel Amara,
Aman Chadha
Abstract:
Automatic Speech Recognition (ASR) has attracted profound research interest. Recent breakthroughs have enabled ASR systems to faithfully transcribe spoken language, a pivotal advancement in building conversational agents. However, accurately discerning context-dependent words and phrases remains a pressing challenge. In this work, we propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing, leveraging deep learning models to deliver accurate transcriptions across a wide variety of vocabularies and speaking styles. Our solution uses Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) along with Deep Neural Network (DNN) models, integrating both language and acoustic modeling for better accuracy. We augment our network with a transformer-based model that rescores the word lattice, achieving a noticeable reduction in Word Error Rate (WER). We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
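Rescoring combines each hypothesis's acoustic log-likelihood with a weighted language-model log-probability and keeps the best total. The sketch below simplifies the word lattice to an n-best list. A toy unigram table stands in for the transformer language model, and its probabilities, the scores, and `lm_weight` are hypothetical.

```python
import math

# Hypothetical unigram probabilities standing in for a transformer LM.
TOY_LOGPROBS = {w: math.log(p) for w, p in {
    "recognize": 0.01, "speech": 0.02, "wreck": 0.001,
    "a": 0.05, "nice": 0.005, "beach": 0.002}.items()}


def toy_lm_score(text, unk=math.log(1e-6)):
    """Sum of word log-probabilities; unknown words get a fixed penalty."""
    return sum(TOY_LOGPROBS.get(w, unk) for w in text.split())


def rescore(nbest, lm_score, lm_weight=0.5):
    """Pick the (text, acoustic_logprob) pair maximizing acoustic + weight * LM."""
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_score(h[0]))


nbest = [("recognize speech", -10.0), ("wreck a nice beach", -9.5)]
```

With `lm_weight=0`, the acoustically preferred "wreck a nice beach" wins. With the language model weighted in, "recognize speech" is selected. In a full lattice rescorer the same interpolation is applied along lattice paths rather than over a fixed list.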
Submitted 3 March, 2024; v1 submitted 14 October, 2023;
originally announced October 2023.
-
Sculpting Efficiency: Pruning Medical Imaging Models for On-Device Inference
Authors:
Sudarshan Sreeram,
Bernhard Kainz
Abstract:
Leveraging ML advancements to augment healthcare systems can improve patient outcomes. Yet, uninformed engineering decisions in early-stage research inadvertently hinder the feasibility of such solutions for high-throughput, on-device inference, particularly in settings involving legacy hardware and multi-modal gigapixel images. Through a preliminary case study concerning segmentation in cardiology, we highlight the excess operational complexity in a suboptimally configured ML model from prior work and demonstrate that it can be sculpted away using pruning to meet deployment criteria. Our results show a compression rate of 1148x with minimal loss in quality (~4%); at higher compression rates, the pruned model achieves faster inference on a CPU than the GPU baseline, stressing the need to consider task complexity and architectural details when using off-the-shelf models. With this, we consider avenues for future research in streamlining workflows for clinical researchers to develop models more quickly and better suited for real-world use.
Submitted 1 November, 2023; v1 submitted 10 September, 2023;
originally announced September 2023.
-
Distance Preserving Machine Learning for Uncertainty Aware Accelerator Capacitance Predictions
Authors:
Steven Goldenberg,
Malachi Schram,
Kishansingh Rajput,
Thomas Britton,
Chris Pappas,
Dan Lu,
Jared Walden,
Majdi I. Radaideh,
Sarah Cousineau,
Sudarshan Harave
Abstract:
Providing accurate uncertainty estimations is essential for producing reliable machine learning models, especially in safety-critical applications such as accelerator systems. Gaussian process models are generally regarded as the gold standard method for this task, but they can struggle with large, high-dimensional datasets. Combining deep neural networks with Gaussian process approximation techniques has shown promising results, but dimensionality reduction through standard deep neural network layers is not guaranteed to maintain the distance information necessary for Gaussian process models. We build on previous work by comparing the use of the singular value decomposition against a spectral-normalized dense layer as a feature extractor for a deep neural Gaussian process approximation model, and apply it to a capacitance prediction problem for the High Voltage Converter Modulators in the Oak Ridge Spallation Neutron Source. Our model shows improved distance preservation and predicts in-distribution capacitance values with less than 1% error.
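Spectral normalization bounds a dense layer's largest singular value, so the layer cannot stretch distances by more than that bound: $\|Wx - Wy\| \le \sigma_{\max}(W)\,\|x - y\|$. The sketch below estimates $\sigma_{\max}$ by power iteration and rescales the weights with the soft bound $\min(1, c/\sigma)$. That bound is used in SNGP-style models, but it is an assumption here, as are the iteration count and the bound $c = 1$. It is not necessarily the configuration used in this work.

```python
import math
import random


def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def matTvec(w, u):
    return [sum(w[i][j] * u[i] for i in range(len(w))) for j in range(len(w[0]))]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_norm(w, iters=100, seed=0):
    """Largest singular value of w via power iteration on W^T W (iters >= 1)."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(w[0]))])
    for _ in range(iters):
        u = normalize(matvec(w, v))
        v = normalize(matTvec(w, u))
    return sum(ui * wi for ui, wi in zip(u, matvec(w, v)))  # u^T W v


def spectral_normalize(w, bound=1.0, iters=100):
    """Scale w so its spectral norm is at most `bound`; leave it unchanged if already below."""
    scale = min(1.0, bound / spectral_norm(w, iters))
    return [[x * scale for x in row] for row in w]
```

The SVD alternative the abstract compares against would compute the same singular values exactly. Power iteration is a cheap approximation that can be updated incrementally during training.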
△ Less
Submitted 5 July, 2023;
originally announced July 2023.
-
Fragile superconductivity in a Dirac metal
Authors:
Chris J. Lygouras,
Junyi Zhang,
Jonah Gautreau,
Mathew Pula,
Sudarshan Sharma,
Shiyuan Gao,
Tanya Berry,
Thomas Halloran,
Peter Orban,
Gael Grissonnanche,
Juan R. Chamorro,
Kagetora Mikuri,
Dilip K. Bhoi,
Maxime A. Siegler,
Kenneth K. Livi,
Yoshiya Uwatoko,
Satoru Nakatsuji,
B. J. Ramshaw,
Yi Li,
Graeme M. Luke,
Collin L. Broholm,
Tyrel M. McQueen
Abstract:
Studying superconductivity in Dirac semimetals is an important step in understanding quantum matter with topologically non-trivial order parameters. We report on the properties of the superconducting phase in single crystals of the Dirac material LaCuSb2 prepared by the self-flux method. We find that chemical and hydrostatic pressure drastically suppress the superconducting transition. Furthermore, due to large Fermi surface anisotropy, magnetization and muon spin relaxation measurements reveal Type-II superconductivity for applied magnetic fields along the $a$-axis, and Type-I superconductivity for fields along the $c$-axis. Specific heat confirms the bulk nature of the transition, and its deviation from single-gap $s$-wave BCS theory suggests multigap superconductivity. Our tight-binding model points to an anisotropic gap function arising from the spin-orbital texture near the Dirac nodes, providing an explanation for the anomaly in specific heat well below $T_c$. Given the existence of superconductivity in a material harboring Dirac fermions, LaCuSb2 is an interesting candidate material in the search for topological superconductivity.
Submitted 4 July, 2023;
originally announced July 2023.
-
Neural Network Pruning for Real-time Polyp Segmentation
Authors:
Suman Sapkota,
Pranav Poudel,
Sudarshan Regmi,
Bibek Panthi,
Binod Bhattarai
Abstract:
Computer-assisted treatment has emerged as a viable application of medical imaging, owing to the efficacy of deep learning models. Real-time inference speed remains a key requirement for such applications to help medical personnel. Even though there generally exists a trade-off between performance and model size, impressive efforts have been made to retain near-original performance while reducing model size. Neural network pruning has emerged as an exciting area that aims to eliminate redundant parameters to make inference faster. In this study, we show an application of neural network pruning to polyp segmentation. We compute the importance score of convolutional filters and remove the filters with the lowest scores, which, up to a certain pruning level, does not degrade performance. To compute the importance score, we use the Taylor First Order (TaylorFO) approximation of the change in network output caused by removing certain filters. Specifically, we employ gradient-normalized backpropagation to compute the importance score. Through experiments on polyp datasets, we validate that our approach can significantly reduce the parameter count and FLOPs while retaining similar performance.
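A common form of the first-order Taylor criterion scores each convolutional filter by the magnitude of (activation × gradient), averaged over spatial positions and the batch, then L2-normalizes the scores within the layer. The sketch below implements that classic activation-based variant on nested lists shaped `[batch][filter][positions]`. It is an assumption for illustration: the paper's "gradient-normalized" computation may differ.

```python
import math


def taylor_fo_scores(acts, grads):
    """Per-filter score: mean over batch of |mean over positions of act * grad|,
    L2-normalized across the layer's filters."""
    n_batch, n_filt = len(acts), len(acts[0])
    scores = []
    for f in range(n_filt):
        total = 0.0
        for b in range(n_batch):
            a, g = acts[b][f], grads[b][f]
            total += abs(sum(x * y for x, y in zip(a, g)) / len(a))
        scores.append(total / n_batch)
    norm = math.sqrt(sum(s * s for s in scores))
    return [s / norm if norm else 0.0 for s in scores]


def filters_to_prune(scores, n):
    """Indices of the n lowest-scoring filters."""
    return sorted(range(len(scores)), key=lambda i: scores[i])[:n]


# One example, three filters, two spatial positions each (hypothetical values).
acts = [[[1.0, 2.0], [0.1, 0.1], [3.0, 3.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]]]
```

The third filter has large activations but zero gradient, so removing it is estimated to barely change the loss. It is ranked first for pruning, ahead of the weakly active second filter.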
△ Less
Submitted 22 June, 2023;
originally announced June 2023.
-
ECO-CHIP: Estimation of Carbon Footprint of Chiplet-based Architectures for Sustainable VLSI
Authors:
Chetan Choppali Sudarshan,
Nikhil Matkar,
Sarma Vrudhula,
Sachin S. Sapatnekar,
Vidya A. Chhabria
Abstract:
Decades of progress in energy-efficient and low-power design have successfully reduced the operational carbon footprint in the semiconductor industry. However, this has led to an increase in embodied emissions, encompassing carbon emissions arising from design, manufacturing, packaging, and other infrastructural activities. While existing research has developed tools to analyze embodied carbon at the computer architecture level for traditional monolithic systems, these tools do not apply to near-mainstream heterogeneous integration (HI) technologies. HI systems offer significant potential for sustainable computing by minimizing carbon emissions through two key strategies: "reducing" computation by reusing pre-designed chiplet IP blocks and adopting hierarchical approaches to system design. The reuse of chiplets across multiple designs, even spanning multiple generations of integrated circuits (ICs), can substantially reduce embodied carbon emissions throughout the operational lifespan. This paper introduces a carbon analysis tool specifically designed to assess the potential of HI systems in facilitating greener VLSI system design and manufacturing approaches. The tool takes into account scaling, chiplet and packaging yields, design complexity, and even carbon overheads associated with advanced packaging techniques employed in heterogeneous systems. Experimental results demonstrate that HI can achieve a reduction of embodied carbon emissions up to 70% compared to traditional large monolithic systems. These findings suggest that HI can pave the way for sustainable computing practices, contributing to a more environmentally conscious semiconductor industry.
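Why chiplet yields matter can be seen in a toy model. The sketch below is not the ECO-CHIP tool: it assumes a Poisson yield model Y = exp(-A * D0), a fixed hypothetical carbon cost per cm² of fabricated silicon, and known-good-die testing (dies are refabricated until a good one is obtained), and it ignores packaging yield, design emissions, and chiplet reuse. All numeric inputs are illustrative placeholders, not data from the paper.

```python
import math


def poisson_yield(area_cm2, defect_density_per_cm2):
    """Fraction of good dies under a simple Poisson defect model."""
    return math.exp(-area_cm2 * defect_density_per_cm2)


def embodied_carbon_per_good_system(die_areas_cm2, carbon_per_cm2,
                                    defect_density_per_cm2, packaging_carbon=0.0):
    """Expected manufacturing carbon for one working system built from the given dies.

    With known-good-die testing, on average 1 / yield dies are fabricated per
    good die, so each die contributes carbon_per_cm2 * area / yield.
    """
    total = 0.0
    for area in die_areas_cm2:
        total += carbon_per_cm2 * area / poisson_yield(area, defect_density_per_cm2)
    return total + packaging_carbon


# Hypothetical inputs: one 4 cm^2 monolithic die versus four 1 cm^2 chiplets.
mono = embodied_carbon_per_good_system([4.0], carbon_per_cm2=1.0, defect_density_per_cm2=0.2)
chiplets = embodied_carbon_per_good_system([1.0] * 4, carbon_per_cm2=1.0, defect_density_per_cm2=0.2)
print(f"monolithic: {mono:.2f}, chiplets: {chiplets:.2f} (arbitrary carbon units)")
```

With these placeholder numbers the chiplet split roughly halves the silicon carbon per good system; a non-zero `packaging_carbon` shows how advanced-packaging overheads can eat into that gain.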
Submitted 14 February, 2024; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Topological characterization of special edge modes from the winding of relative phase
Authors:
Sudarshan Saha,
Tanay Nag,
Saptarshi Mandal
Abstract:
The symmetry-constrained topological invariant fails to explain the emergence of special edge modes when the system does not preserve discrete symmetries. The SSH model with broken inversion or chiral symmetry is one such system, where a one-sided edge state with finite energy appears at one end of the open chain. To investigate whether this special edge mode is of topological origin, we introduce the concept of a relative phase between the components of a two-component spinor and define a winding number from the change of this relative phase over the one-dimensional Brillouin zone. The relative phase winds non-trivially (trivially) in accord with the presence (absence) of the one-sided edge mode, indicating a bulk-boundary correspondence. We extend this analysis to a two-dimensional case, where we characterize the non-trivial phase, hosting a gapped one-sided edge mode, by the winding of the relative phase along only a certain axis in the Brillouin zone. We demonstrate all the above findings using a generic parametric representation, in which topology is essentially determined by whether the underlying lower-dimensional projection includes or excludes the origin. Our study thus reveals a new paradigm of symmetry-broken topological phases for future studies.
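A numerical sketch of the relative-phase winding in the 1D case, under stated assumptions: it uses the SSH Bloch Hamiltonian h(k) = (v + w cos k)σx + (w sin k)σy + mσz, where the staggered term m is a hypothetical way of breaking chiral symmetry, takes the lower-band eigenvector at each k, and accumulates the change of the phase difference between its two components across the Brillouin zone. This is an illustration, not the authors' code; their parametrization may differ.

```python
import numpy as np

SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)


def bloch_hamiltonian(k, v, w, m=0.0):
    """SSH Bloch Hamiltonian; the m * sigma_z term (hypothetical here) breaks chiral symmetry."""
    return (v + w * np.cos(k)) * SIGMA_X + (w * np.sin(k)) * SIGMA_Y + m * SIGMA_Z


def relative_phase_winding(v, w, m=0.0, nk=401):
    """Winding number of the relative phase arg(psi_B) - arg(psi_A) of the lower band."""
    phases = []
    for k in np.linspace(-np.pi, np.pi, nk):  # closed loop: k = -pi and k = pi coincide
        _, vecs = np.linalg.eigh(bloch_hamiltonian(k, v, w, m))
        psi = vecs[:, 0]  # eigh sorts eigenvalues ascending, so column 0 is the lower band
        # The relative phase is unaffected by the arbitrary global phase eigh returns.
        phases.append(np.angle(psi[1]) - np.angle(psi[0]))
    steps = np.diff(np.array(phases))
    steps = (steps + np.pi) % (2 * np.pi) - np.pi  # wrap each increment into [-pi, pi)
    return int(round(steps.sum() / (2 * np.pi)))


print(relative_phase_winding(0.5, 1.0, m=0.3))  # regime with v < w
print(relative_phase_winding(1.5, 1.0, m=0.3))  # regime with v > w
```

For this model the relative phase tracks arg(v + w e^{ik}) up to a constant, so the first call gives 1 and the second 0: the winding counts whether the (σx, σy) projection of h(k) encloses the origin, even though m ≠ 0 breaks chiral symmetry.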
Submitted 13 June, 2023;
originally announced June 2023.
-
Scheduling of Intermittent Query Processing
Authors:
Saranya C,
Sudarshan S
Abstract:
Stream processing is usually done either on a tuple-by-tuple basis or in micro-batches. There are many applications where tuples over a predefined duration/window must be processed within certain deadlines. Processing such queries using stream processing engines can be very inefficient since there is often a significant overhead per tuple or micro-batch. The cost of computation can be significantly reduced by using the wider window available for computation. In this work, we present scheduling schemes where the overhead cost is minimized while meeting the query deadline constraints. For such queries, since the result is needed only at the deadline, tuples can be processed in larger batches instead of micro-batches. We present scheduling schemes for single- and multi-query scenarios. The proposed scheduling algorithms have been implemented as a Custom Query Scheduler on top of Apache Spark. Our performance study with TPC-H data, under single- and multi-query modes, shows orders of magnitude improvement compared to naively using Spark streaming.
Submitted 21 April, 2024; v1 submitted 11 June, 2023;
originally announced June 2023.
-
Deriving interaction vertices in higher derivative theories
Authors:
Sudarshan Ananth,
Nipun Bhave,
Chetan Pandey,
Saurabh Pant
Abstract:
We derive cubic interaction vertices for a class of higher-derivative theories involving three arbitrary integer spin fields. This derivation uses the requirement of closure of the Poincaré algebra in four-dimensional flat spacetime. We find two varieties of permitted structures at the cubic level and eliminate one variety, which is proportional to the equations of motion, using suitable field redefinitions. We then consider soft theorems for field theories with higher-derivative interactions and construct amplitudes in these theories using the inverse-soft approach.
Submitted 25 October, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
T2FNorm: Extremely Simple Scaled Train-time Feature Normalization for OOD Detection
Authors:
Sudarshan Regmi,
Bibek Panthi,
Sakar Dotel,
Prashnna K. Gyawali,
Danail Stoyanov,
Binod Bhattarai
Abstract:
Neural networks are notorious for being overconfident predictors, posing a significant challenge to their safe deployment in real-world applications. While feature normalization has garnered considerable attention within the deep learning literature, current train-time regularization methods for Out-of-Distribution (OOD) detection have yet to fully exploit this potential. Indeed, the naive incorporation of feature normalization within neural networks does not guarantee substantial improvement in OOD detection performance. In this work, we introduce T2FNorm, a novel approach to transforming features to hyperspherical space during training, while employing non-transformed space for OOD-scoring purposes. This method yields a surprising enhancement in OOD detection capabilities without compromising model accuracy on in-distribution (ID) data. Our investigation demonstrates that the proposed technique substantially diminishes the norm of the features of all samples, more so in the case of out-of-distribution samples, thereby addressing the prevalent concern of overconfidence in neural networks. The proposed method also significantly improves various post-hoc OOD detection methods.
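A minimal NumPy sketch of the train/inference asymmetry described above. It assumes the transformation is L2 normalization of the penultimate features followed by scaling by 1/τ (τ is a hypothetical temperature hyperparameter), and it uses the negative energy score (log-sum-exp of the logits) as one example of a post-hoc OOD score; the function names are illustrative.

```python
import numpy as np


def classifier_logits(features, weight, bias, tau=0.1, training=True):
    """Final linear layer with a train-time hyperspherical feature transform.

    During training the features are projected onto a sphere of radius 1 / tau;
    at inference (training=False) the raw, non-transformed features are used.
    """
    if training:
        norms = np.linalg.norm(features, axis=1, keepdims=True)
        features = features / (tau * np.maximum(norms, 1e-12))
    return features @ weight + bias


def neg_energy_score(logits):
    """Numerically stable log-sum-exp of the logits; larger suggests in-distribution."""
    m = logits.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).ravel()


rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
W, b = rng.standard_normal((8, 3)), np.zeros(3)
train_logits = classifier_logits(feats, W, b, training=True)
ood_scores = neg_energy_score(classifier_logits(feats, W, b, training=False))
```

In training mode, rescaling the input features leaves the logits unchanged, since only the direction of each feature vector reaches the classifier; in inference mode the feature norm passes through to the logits and hence to the OOD score.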
Submitted 8 June, 2023; v1 submitted 28 May, 2023;
originally announced May 2023.
-
How to verify the precision of density-functional-theory implementations via reproducible and universal workflows
Authors:
Emanuele Bosoni,
Louis Beal,
Marnik Bercx,
Peter Blaha,
Stefan Blügel,
Jens Bröder,
Martin Callsen,
Stefaan Cottenier,
Augustin Degomme,
Vladimir Dikan,
Kristjan Eimre,
Espen Flage-Larsen,
Marco Fornari,
Alberto Garcia,
Luigi Genovese,
Matteo Giantomassi,
Sebastiaan P. Huber,
Henning Janssen,
Georg Kastlunger,
Matthias Krack,
Georg Kresse,
Thomas D. Kühne,
Kurt Lejaeghere,
Georg K. H. Madsen,
Martijn Marsman
, et al. (20 additional authors not shown)
Abstract:
In the past decades many density-functional theory methods and codes adopting periodic boundary conditions have been developed and are now extensively used in condensed matter physics and materials science research. Only in 2016, however, was their precision (i.e., the extent to which properties computed with different codes agree with one another) systematically assessed on elemental crystals: a first crucial step to evaluate the reliability of such computations. We discuss here general recommendations for verification studies aimed at further testing the precision and transferability of density-functional-theory computational approaches and codes. We illustrate such recommendations using a greatly expanded protocol covering the whole periodic table from Z=1 to 96 and characterizing 10 prototypical cubic compounds for each element: 4 unaries and 6 oxides, spanning a wide range of coordination numbers and oxidation states. The primary outcome is a reference dataset of 960 equations of state cross-checked between two all-electron codes, then used to verify and improve nine pseudopotential-based approaches. Such effort is facilitated by deploying AiiDA common workflows that perform automatic input parameter selection, provide identical input/output interfaces across codes, and ensure full reproducibility. Finally, we discuss the extent to which the current results for total energies can be reused for different goals (e.g., obtaining formation energies).
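Each equation of state in such a dataset is obtained by fitting total energies computed at several volumes. The abstract does not name the functional form; the sketch below assumes a third-order Birch-Murnaghan EOS, which is exactly a cubic polynomial in x = V^(-2/3), so a linear least-squares fit in x recovers E0, V0 and the bulk modulus B0. Units are left to the caller (e.g. eV and Å³, giving B0 in eV/Å³); the synthetic parameters below are placeholders, not values from the dataset.

```python
import numpy as np


def birch_murnaghan_energy(V, E0, V0, B0, B0p):
    """Third-order Birch-Murnaghan energy E(V)."""
    eta2 = (V0 / V) ** (2.0 / 3.0)
    return E0 + 9.0 * V0 * B0 / 16.0 * (
        (eta2 - 1.0) ** 3 * B0p + (eta2 - 1.0) ** 2 * (6.0 - 4.0 * eta2)
    )


def fit_birch_murnaghan(volumes, energies):
    """Return (E0, V0, B0) from a cubic fit of E against x = V**(-2/3)."""
    V = np.asarray(volumes, dtype=float)
    E = np.asarray(energies, dtype=float)
    x = V ** (-2.0 / 3.0)
    d, c, b, a = np.polyfit(x, E, 3)  # E ~ a + b x + c x^2 + d x^3
    # Stationary points: dE/dx = b + 2 c x + 3 d x^2 = 0; keep the minimum (E''(x) > 0).
    roots = np.roots([3.0 * d, 2.0 * c, b])
    roots = roots[np.isreal(roots)].real
    roots = roots[(roots > 0) & (2.0 * c + 6.0 * d * roots > 0)]
    if roots.size == 0:
        raise ValueError("no energy minimum found in the fitted curve")
    x0 = roots[0]
    V0 = x0 ** -1.5
    E0 = a + b * x0 + c * x0**2 + d * x0**3
    # B0 = V d2E/dV2 at V0; since dE/dx = 0 there, d2E/dV2 = E''(x0) * (dx/dV)^2.
    dxdV = -2.0 / 3.0 * V0 ** (-5.0 / 3.0)
    B0 = V0 * (2.0 * c + 6.0 * d * x0) * dxdV**2
    return E0, V0, B0


# Synthetic check with placeholder parameters (not from the dataset).
vols = np.linspace(0.94, 1.06, 7) * 20.0
E0_fit, V0_fit, B0_fit = fit_birch_murnaghan(vols, birch_murnaghan_energy(vols, -3.0, 20.0, 0.5, 4.5))
```

Because the model is linear in its polynomial coefficients, this fit needs no initial guess, which suits automated workflows; comparing codes then reduces to comparing the fitted E(V) curves.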
Submitted 26 May, 2023;
originally announced May 2023.
-
Admissibility preserving subcell limiter for Lax-Wendroff flux reconstruction
Authors:
Arpit Babbar,
Sudarshan Kumar Kenettinkara,
Praveen Chandrashekar
Abstract:
Lax-Wendroff Flux Reconstruction (LWFR) is a single-stage, high order, quadrature free method for solving hyperbolic conservation laws. We develop a subcell based limiter by blending LWFR with a lower order scheme, either a first order finite volume or the MUSCL-Hancock scheme. While blending with a lower order scheme helps to control oscillations, it may not guarantee admissibility of the discrete solution, e.g., positivity of quantities like density and pressure. By exploiting the subcell structure and the admissibility of lower order schemes, we devise a strategy to ensure that the blended scheme is admissibility preserving for the mean values and then use a scaling limiter to obtain admissibility of the polynomial solution. For the MUSCL-Hancock scheme on non-cell-centered subcells, we develop a slope limiter, time step restrictions, and a suitable blending of higher order fluxes that ensure admissibility of the lower order updates and hence of the cell averages. By using the MUSCL-Hancock scheme on subcells and Gauss-Legendre points in flux reconstruction, we improve small-scale resolution compared to the subcell-based RKDG blending scheme with a first order finite volume method and Gauss-Legendre-Lobatto points. We demonstrate the performance of our scheme on the compressible Euler equations, showcasing its ability to handle shocks and preserve small-scale structures.
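The final step, scaling the polynomial toward its admissible cell mean, can be sketched for a scalar quantity such as density. This follows the widely used Zhang-Shu style scaling limiter and is an illustration, not the authors' implementation: it assumes nodal values at Gauss-Legendre points (whose quadrature weights give the exact cell mean for the polynomial degrees used) and only enforces the bound at those nodes. Pressure, being a nonlinear function of the conserved variables, needs a separately computed scaling factor, which is not shown.

```python
import numpy as np


def scale_to_admissible(nodal_values, weights, eps=1e-12):
    """Scale nodal values toward the cell mean so every node is at least eps.

    Assumes the cell mean is admissible (mean >= eps), which the blended
    low/high-order update is designed to guarantee.
    """
    u = np.asarray(nodal_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = np.dot(w, u) / w.sum()  # quadrature cell average
    if mean < eps:
        raise ValueError("cell mean is not admissible; scaling cannot repair it")
    umin = u.min()
    if umin >= eps:
        return u  # already admissible: leave the polynomial untouched
    theta = (mean - eps) / (mean - umin)  # 0 <= theta < 1
    # Convex combination with the mean: preserves the average, lifts the minimum to eps.
    return mean + theta * (u - mean)


_, weights = np.polynomial.legendre.leggauss(3)
density = np.array([-0.2, 1.0, 2.0])  # negative value at one solution point
limited = scale_to_admissible(density, weights)
```

Because the limiter only contracts the polynomial toward its mean, conservation is untouched; this is why admissibility of the cell averages, secured by the blended update, is the property that has to be established first.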
Submitted 17 January, 2024; v1 submitted 18 May, 2023;
originally announced May 2023.
-
Generalization Bounds for Neural Belief Propagation Decoders
Authors:
Sudarshan Adiga,
Xin Xiao,
Ravi Tandon,
Bane Vasic,
Tamal Bose
Abstract:
Machine learning based approaches are being increasingly used for designing decoders for next generation communication systems. One widely used framework is neural belief propagation (NBP), which unfolds the belief propagation (BP) iterations into a deep neural network whose parameters are trained in a data-driven manner. NBP decoders have been shown to improve upon classical decoding algorithms. In this paper, we investigate the generalization capabilities of NBP decoders. Specifically, the generalization gap of a decoder is the difference between empirical and expected bit-error-rate(s). We present new theoretical results which bound this gap and show the dependence on the decoder complexity, in terms of code parameters (blocklength, message length, variable/check node degrees), decoding iterations, and the training dataset size. Results are presented for both regular and irregular parity-check matrices. To the best of our knowledge, this is the first set of theoretical results on generalization performance of neural network based decoders. We present experimental results to show the dependence of the generalization gap on the training dataset size and the number of decoding iterations for different codes.
Submitted 20 April, 2024; v1 submitted 17 May, 2023;
originally announced May 2023.
-
BMS symmetry in gravity: Front form versus Instant form
Authors:
Sudarshan Ananth,
Sucheta Majumdar
Abstract:
In General Relativity, the allowed set of diffeomorphisms or gauge transformations at asymptotic infinity forms the BMS group, an infinite-dimensional extension of the Poincaré group. We focus on the structure of the BMS group in two distinct forms of Hamiltonian dynamics - the instant and front forms. Both similarities and differences in these two forms are examined while emphasising the role of non-covariant approaches to symmetries in gravity.
Submitted 16 May, 2023;
originally announced May 2023.
-
Question-Answering System Extracts Information on Injection Drug Use from Clinical Notes
Authors:
Maria Mahbub,
Ian Goethert,
Ioana Danciu,
Kathryn Knight,
Sudarshan Srinivasan,
Suzanne Tamang,
Karine Rozenberg-Ben-Dror,
Hugo Solares,
Susana Martins,
Jodie Trafton,
Edmon Begoli,
Gregory Peterson
Abstract:
Background: Injection drug use (IDU) is a dangerous health behavior that increases mortality and morbidity. Identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because there is no International Classification of Diseases (ICD) code and the only place IDU information can be indicated is unstructured free-text clinical notes. Although natural language processing can efficiently extract this information from unstructured data, there are no validated tools. Methods: To address this gap in clinical information, we design and demonstrate a question-answering (QA) framework to extract information on IDU from clinical notes. Our framework involves two main steps: (1) generating a gold-standard QA dataset and (2) developing and testing the QA model. We utilize 2323 clinical notes of 1145 patients sourced from the VA Corporate Data Warehouse to construct the gold-standard dataset for developing and evaluating the QA model. We also demonstrate the QA model's ability to extract IDU-related information on temporally out-of-distribution data. Results: Here we show that for a strict match between gold-standard and predicted answers, the QA model achieves 51.65% F1 score. For a relaxed match between the gold-standard and predicted answers, the QA model obtains 78.03% F1 score, along with 85.38% Precision and 79.02% Recall scores. Moreover, the QA model demonstrates consistent performance when subjected to temporally out-of-distribution data. Conclusions: Our study introduces a QA framework designed to extract IDU information from clinical notes, aiming to enhance the accurate and efficient detection of people who inject drugs, extract relevant information, and ultimately facilitate informed patient care.
Submitted 28 December, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Guaranteeing Envy-Freeness under Generalized Assignment Constraints
Authors:
Siddharth Barman,
Arindam Khan,
Sudarshan Shyam,
K. V. N. Sreenivas
Abstract:
We study fair division of goods under the broad class of generalized assignment constraints. In this constraint framework, the sizes and values of the goods are agent-specific, and one needs to allocate the goods among the agents fairly while further ensuring that each agent receives a bundle of total size at most the corresponding budget of the agent. Since, in such a constraint setting, it may not always be feasible to partition all the goods among the agents, we conform -- as in recent works -- to the construct of charity to designate the set of unassigned goods. For this allocation framework, we obtain existential and computational guarantees for envy-free (appropriately defined) allocation of divisible and indivisible goods, respectively, among agents with individual, additive valuations for the goods.
We deem allocations to be fair by evaluating envy only with respect to feasible subsets. In particular, an allocation is said to be feasibly envy-free (FEF) iff each agent prefers its bundle over every (budget) feasible subset within any other agent's bundle (and within the charity). The current work establishes that, for divisible goods, FEF allocations are guaranteed to exist and can be computed efficiently under generalized assignment constraints.
In the context of indivisible goods, FEF allocations do not necessarily exist, and hence, we consider the fairness notion of feasible envy-freeness up to any good (FEFx). We show that, under generalized assignment constraints, an FEFx allocation of indivisible goods always exists. In fact, our FEFx result resolves open problems posed in prior works. Further, for indivisible goods and under generalized assignment constraints, we provide a pseudo-polynomial time algorithm for computing FEFx allocations, and a fully polynomial-time approximation scheme (FPTAS) for computing approximate FEFx allocations.
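For divisible goods with additive valuations, the FEF condition can be checked directly: the most agent i can obtain from a budget-feasible subset of another bundle is a fractional knapsack over that bundle, solved greedily by value-to-size ratio. The sketch below is a hypothetical verifier for a given allocation (fractions `alloc[j][g]` of each good, with any unassigned remainder going to charity), not the paper's algorithm for computing FEF or FEFx allocations.

```python
def max_feasible_value(values, sizes, available, budget):
    """Fractional knapsack: the most value an agent can get from a divisible bundle
    while keeping the total size within its budget.

    values[g], sizes[g]: this agent's (agent-specific) value and size for good g.
    available[g]: fraction of good g contained in the bundle, between 0 and 1.
    """
    goods = [g for g in range(len(values)) if available[g] > 0]
    # Best value-per-size first; zero-size goods are free and always taken in full.
    goods.sort(key=lambda g: float("inf") if sizes[g] == 0 else values[g] / sizes[g],
               reverse=True)
    remaining, total = float(budget), 0.0
    for g in goods:
        if sizes[g] == 0:
            total += values[g] * available[g]
            continue
        take = min(available[g], remaining / sizes[g])
        if take <= 0:
            break  # budget exhausted
        total += values[g] * take
        remaining -= sizes[g] * take
    return total


def is_feasibly_envy_free(alloc, values, sizes, budgets, tol=1e-9):
    """Check FEF for a fractional allocation; unassigned fractions form the charity bundle."""
    n, m = len(alloc), len(values[0])
    charity = [1.0 - sum(alloc[j][g] for j in range(n)) for g in range(m)]
    for i in range(n):
        if sum(sizes[i][g] * alloc[i][g] for g in range(m)) > budgets[i] + tol:
            return False  # agent i's own bundle exceeds its budget
        own_value = sum(values[i][g] * alloc[i][g] for g in range(m))
        other_bundles = [alloc[j] for j in range(n) if j != i] + [charity]
        for bundle in other_bundles:
            if max_feasible_value(values[i], sizes[i], bundle, budgets[i]) > own_value + tol:
                return False  # agent i envies a budget-feasible part of this bundle
    return True


values = [[1.0, 1.0], [1.0, 1.0]]
sizes = [[1.0, 1.0], [1.0, 1.0]]
budgets = [1.0, 1.0]
print(is_feasibly_envy_free([[1.0, 0.0], [0.0, 1.0]], values, sizes, budgets))  # True
print(is_feasibly_envy_free([[1.0, 0.0], [0.0, 0.0]], values, sizes, budgets))  # False
```

The greedy ratio order is optimal here only because goods are divisible; for indivisible goods the inner problem becomes a 0/1 knapsack, which is one reason the indivisible case calls for the relaxed FEFx notion.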
Submitted 2 May, 2023;
originally announced May 2023.
-
A novel higher-order numerical method for parabolic integro-fractional differential equations based on wavelets and $L2$-$1_σ$ scheme
Authors:
Sudarshan Santra,
Ratikanta Behera
Abstract:
This study aims to construct an efficient and highly accurate numerical method, based on wavelets and the $L2$-$1_σ$ scheme, to solve a class of parabolic integro-fractional differential equations. Specifically, the Haar wavelet decomposition is used for grid adaptation and efficient computations, while the high order $L2$-$1_σ$ scheme is used to discretize the time-fractional operator. For the one-dimensional problem, second-order discretizations are used to approximate the spatial derivatives, while a repeated quadrature rule based on trapezoidal approximation is employed to discretize the integral operator. For the two-dimensional model, we instead use a semi-discretization based on the $L2$-$1_σ$ scheme for the fractional operator and composite trapezoidal approximation for the integral part. The spatial derivatives are then approximated using two-dimensional Haar wavelets. In this study, we investigate theoretically and verify numerically the behavior of the proposed higher-order numerical methods. In particular, stability and convergence analyses are conducted. The obtained results are compared with those of some existing techniques through several graphs and tables, showing that the proposed higher-order methods are more accurate and produce smaller errors than the $L1$ scheme for fractional-order integro-partial differential equations.
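For reference, the L1 baseline that the abstract compares against approximates the Caputo derivative of order α in (0, 1) on a uniform grid t_j = jτ by a weighted sum of backward differences: D^α u(t_n) ≈ τ^(-α)/Γ(2-α) Σ_{k=0}^{n-1} b_k (u^{n-k} - u^{n-k-1}), with b_k = (k+1)^(1-α) - k^(1-α). The sketch implements only this baseline; the $L2$-$1_σ$ scheme uses a different set of weights and is not reproduced here. The L1 formula is exact for linear functions, which gives a simple check.

```python
import math


def caputo_l1(u, tau, alpha):
    """L1 approximation of the Caputo derivative of order alpha in (0, 1) at t_n.

    u: samples u(t_0), ..., u(t_n) on a uniform grid with step tau.
    """
    n = len(u) - 1
    coef = tau ** (-alpha) / math.gamma(2.0 - alpha)
    total = 0.0
    for k in range(n):
        b_k = (k + 1) ** (1.0 - alpha) - k ** (1.0 - alpha)  # L1 weights
        total += b_k * (u[n - k] - u[n - k - 1])
    return coef * total


alpha, tau = 0.5, 0.1
u = [j * tau for j in range(11)]  # u(t) = t sampled up to t = 1
exact = 1.0 ** (1.0 - alpha) / math.gamma(2.0 - alpha)  # Caputo derivative of t is t^(1-alpha)/Gamma(2-alpha)
approx = caputo_l1(u, tau, alpha)
```

For smooth solutions the L1 scheme converges at order 2 - α in time, which is the accuracy ceiling that higher-order schemes such as $L2$-$1_σ$ are designed to improve on.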
Submitted 28 December, 2023; v1 submitted 17 April, 2023;
originally announced April 2023.
-
TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning
Authors:
William Won,
Midhilesh Elavazhagan,
Sudarshan Srinivasan,
Ajaya Durg,
Samvit Kaul,
Swati Gupta,
Tushar Krishna
Abstract:
The surge of artificial intelligence, specifically large language models, has led to a rapid push toward the development of large-scale machine learning training clusters. Collective communications within these clusters tend to be heavily bandwidth-bound, necessitating techniques to optimally utilize the available network bandwidth. This puts the routing algorithm for the collective at the forefront of determining performance. Unfortunately, communication libraries used in distributed machine learning today are limited to a fixed set of routing algorithms. This constrains collective performance within the domain of next-generation training clusters that employ intricate, heterogeneous, asymmetric, large-scale topologies. Further, the emergence of irregular topologies caused by runtime phenomena such as device failures compounds the complexity of the challenge. To this end, this paper introduces TACOS, an automated synthesizer that generates topology-aware collective algorithms for common distributed machine learning collectives across arbitrary input network topologies. TACOS was able to synthesize an All-Reduce algorithm for a heterogeneous 512-NPU system in just 6.09 minutes while achieving a performance improvement of up to 4.27x over state-of-the-art prior work. TACOS exhibits high scalability, with synthesis time scaling quadratically with the number of NPUs. In contrast to prior works' NP-hard approaches, TACOS completes synthesis for 40K NPUs in 2.52 hours.
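To make the idea of synthesizing a collective for an arbitrary topology concrete, here is a deliberately simple greedy heuristic for All-Gather, not TACOS itself. It assumes every directed link moves one chunk per unit time step; at each step, every link forwards the lowest-numbered chunk its destination still lacks and is not already receiving. Counting the steps until every NPU holds every chunk gives a rough schedule length for a given topology.

```python
def greedy_all_gather_steps(num_nodes, links):
    """Number of unit time steps for a greedy All-Gather on a directed topology.

    links: list of (src, dst) pairs; each link carries one chunk per step.
    Node v initially holds only chunk v.
    """
    have = [{v} for v in range(num_nodes)]
    steps = 0
    while any(len(chunks) < num_nodes for chunks in have):
        incoming = [set() for _ in range(num_nodes)]
        for src, dst in links:
            missing = sorted(have[src] - have[dst] - incoming[dst])
            if missing:
                incoming[dst].add(missing[0])  # never send the same chunk twice to dst
        if not any(incoming):
            raise ValueError("topology cannot deliver every chunk to every node")
        for v in range(num_nodes):
            have[v] |= incoming[v]
        steps += 1
    return steps


ring = [(i, (i + 1) % 4) for i in range(4)]
bidirectional_ring = ring + [((i + 1) % 4, i) for i in range(4)]
print(greedy_all_gather_steps(4, ring), greedy_all_gather_steps(4, bidirectional_ring))  # 3 2
```

Such a greedy pass ignores link bandwidth differences and chunk sizes; handling heterogeneous, asymmetric links is exactly where a topology-aware synthesizer is expected to do better.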
Submitted 29 March, 2024; v1 submitted 11 April, 2023;
originally announced April 2023.
-
FinderNet: A Data Augmentation Free Canonicalization aided Loop Detection and Closure technique for Point clouds in 6-DOF separation
Authors:
Sudarshan S Harithas,
Gurkirat Singh,
Aneesh Chavan,
Sarthak Sharma,
Suraj Patni,
Chetan Arora,
K. Madhava Krishna
Abstract:
We focus on the problem of LiDAR point cloud based loop detection (or Finding) and closure (LDC) in a multi-agent setting. State-of-the-art (SOTA) techniques directly generate learned embeddings of a given point cloud, require large data transfers, and are not robust to wide variations in 6 Degrees-of-Freedom (DOF) viewpoint. Moreover, the absence of strong priors in an unstructured point cloud leads to highly inaccurate LDC. In our approach, we propose independent roll and pitch canonicalization of the point clouds using a common dominant ground plane. Discretization of the canonicalized point cloud along the axis perpendicular to the ground plane leads to an image similar to Digital Elevation Maps (DEMs), which exposes strong spatial priors in the scene. Our experiments show that LDC based on learnt embeddings of such DEMs is not only data efficient but also significantly more robust and generalizable than the current SOTA. We report significant performance gains in terms of Average Precision for loop detection and absolute translation/rotation error for relative pose estimation (or loop closure) on Kitti, GPR and Oxford Robot Car over multiple SOTA LDC methods. Our encoder technique compresses the original point cloud by a factor of over 830. To further test the robustness of our technique, we create and open-source a custom dataset called Lidar-UrbanFly Dataset (LUF), which consists of point clouds obtained from a LiDAR mounted on a quadrotor.
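The canonicalization and DEM steps can be sketched in NumPy. This is an illustration under assumptions, not the FinderNet code: the ground points are assumed to be already identified (represented here by a boolean mask standing in for a plane-segmentation step), the plane normal comes from an SVD fit, roll and pitch are removed with the Rodrigues rotation that maps the normal onto +z (yaw is left untouched), and each DEM cell stores the maximum height of its points. Cell size and extent are hypothetical parameters.

```python
import numpy as np


def canonicalize_roll_pitch(points, ground_mask):
    """Rotate an (N, 3) point cloud so the fitted ground-plane normal points along +z.

    ground_mask: boolean array marking points assumed to lie on the dominant ground plane.
    Only roll and pitch are removed; yaw about the new z axis is left as is.
    """
    ground = points[ground_mask]
    centroid = ground.mean(axis=0)
    _, _, vt = np.linalg.svd(ground - centroid)
    normal = vt[2]  # direction of least variance = plane normal
    if normal[2] < 0:
        normal = -normal
    z_axis = np.array([0.0, 0.0, 1.0])
    v = np.cross(normal, z_axis)
    s, c = np.linalg.norm(v), float(np.dot(normal, z_axis))
    if s < 1e-12:  # already level
        return points - centroid
    vx = np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])
    rot = np.eye(3) + vx + vx @ vx * ((1.0 - c) / s**2)  # Rodrigues: maps normal onto +z
    return (points - centroid) @ rot.T


def elevation_map(points, cell=0.5, extent=20.0):
    """Discretize along z: each (x, y) cell keeps the maximum height, NaN if empty."""
    n = int(round(2 * extent / cell))
    dem = np.full((n, n), np.nan)
    ij = np.floor((points[:, :2] + extent) / cell).astype(int)
    inside = np.all((ij >= 0) & (ij < n), axis=1)
    for (i, j), h in zip(ij[inside], points[inside, 2]):
        if np.isnan(dem[i, j]) or h > dem[i, j]:
            dem[i, j] = h
    return dem
```

After canonicalization, two scans that differ in roll and pitch produce DEMs that differ only by a planar rotation and translation, which is what lets a 2D image encoder handle the remaining viewpoint change.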
Submitted 3 April, 2023;
originally announced April 2023.
-
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
Authors:
William Won,
Taekyung Heo,
Saeed Rashidi,
Srinivas Sridharan,
Sudarshan Srinivasan,
Tushar Krishna
Abstract:
As deep learning models and input data are scaling at an unprecedented rate, it is inevitable to move towards distributed training platforms to fit the model and increase training throughput. State-of-the-art approaches and techniques, such as wafer-scale nodes, multi-dimensional network topologies, disaggregated memory systems, and parallelization strategies, have been actively adopted by emerging distributed training systems. This results in a complex SW/HW co-design stack of distributed training, necessitating a modeling/simulation infrastructure for design-space exploration. In this paper, we extend the open-source ASTRA-sim infrastructure and endow it with the capabilities to model state-of-the-art and emerging distributed training models and platforms. More specifically, (i) we enable ASTRA-sim to support arbitrary model parallelization strategies via a graph-based training-loop implementation, (ii) we implement a parameterizable multi-dimensional heterogeneous topology generation infrastructure with analytical performance estimates enabling simulating target systems at scale, and (iii) we enhance the memory system modeling to support accurate modeling of in-network collective communication and disaggregated memory systems. With such capabilities, we run comprehensive case studies targeting emerging distributed models and platforms. This infrastructure lets system designers swiftly traverse the complex co-design stack and gain meaningful insights when designing and deploying distributed training platforms at scale.
Submitted 24 March, 2023;
originally announced March 2023.
-
A dynamic risk score for early prediction of cardiogenic shock using machine learning
Authors:
Yuxuan Hu,
Albert Lui,
Mark Goldstein,
Mukund Sudarshan,
Andrea Tinsay,
Cindy Tsui,
Samuel Maidman,
John Medamana,
Neil Jethani,
Aahlad Puli,
Vuthy Nguy,
Yindalon Aphinyanaphongs,
Nicholas Kiefer,
Nathaniel Smilowitz,
James Horowitz,
Tania Ahuja,
Glenn I Fishman,
Judith Hochman,
Stuart Katz,
Samuel Bernard,
Rajesh Ranganath
Abstract:
Myocardial infarction and heart failure are major cardiovascular diseases that affect millions of people in the US. The morbidity and mortality are highest among patients who develop cardiogenic shock. Early recognition of cardiogenic shock is critical. Prompt implementation of treatment measures can prevent the deleterious spiral of ischemia, low blood pressure, and reduced cardiac output due to cardiogenic shock. However, early identification of cardiogenic shock has been challenging due to human providers' inability to process the enormous amount of data in the cardiac intensive care unit (ICU) and lack of an effective risk stratification tool. We developed a deep learning-based risk stratification tool, called CShock, for patients admitted into the cardiac ICU with acute decompensated heart failure and/or myocardial infarction to predict onset of cardiogenic shock. To develop and validate CShock, we annotated cardiac ICU datasets with physician adjudicated outcomes. CShock achieved an area under the receiver operating characteristic curve (AUROC) of 0.820, which substantially outperformed CardShock (AUROC 0.519), a well-established risk score for cardiogenic shock prognosis. CShock was externally validated in an independent patient cohort and achieved an AUROC of 0.800, demonstrating its generalizability in other cardiac ICUs.
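For readers comparing the reported numbers, AUROC equals the probability that a randomly chosen patient who develops cardiogenic shock receives a higher risk score than a randomly chosen patient who does not (ties counted as one half), so 0.5 corresponds to chance-level ranking. The sketch below computes it directly from that definition; it is a generic illustration of the metric, not the CShock evaluation code.

```python
def auroc(scores, labels):
    """Area under the ROC curve via its ranking interpretation (ties count one half).

    scores: predicted risk for each patient; labels: 1 if the outcome occurred, else 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0.9, 0.7, 0.4, 0.2], [1, 0, 1, 0]))  # 0.75
```

This pairwise form costs O(P·N) comparisons; rank-based formulas give the same value in O(n log n) for large cohorts.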
Submitted 28 March, 2023; v1 submitted 22 March, 2023;
originally announced March 2023.
-
Remote Task-oriented Grasp Area Teaching By Non-Experts through Interactive Segmentation and Few-Shot Learning
Authors:
Furkan Kaynar,
Sudarshan Rajagopalan,
Shaobo Zhou,
Eckehard Steinbach
Abstract:
A robot operating in unstructured environments must be able to discriminate between different grasping styles depending on the prospective manipulation task. Having a system that allows learning from remote non-expert demonstrations can feasibly extend the cognitive skills of a robot for task-oriented grasping. We propose a novel two-step framework towards this aim. The first step involves grasp area estimation by segmentation. We receive grasp area demonstrations for a new task via interactive segmentation, and learn from these few demonstrations to estimate the required grasp area on an unseen scene for the given task. The second step is autonomous grasp estimation in the segmented region. To train the segmentation network for few-shot learning, we built a grasp area segmentation (GAS) dataset with 10089 images grouped into 1121 segmentation tasks. We use an efficient meta learning algorithm for training for few-shot adaptation. Experimental evaluation showed that our method successfully detects the correct grasp area on the respective objects in unseen test scenes and effectively allows remote teaching of new grasp strategies by non-experts.
Submitted 17 March, 2023;
originally announced March 2023.