-
SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images
Authors:
Weiyi Xie,
Nathalie Willems,
Shubham Patil,
Yang Li,
Mayank Kumar
Abstract:
We propose a straightforward yet highly effective few-shot fine-tuning strategy for adapting the Segment Anything Model (SAM) to anatomical segmentation tasks in medical images. Our novel approach revolves around reformulating the mask decoder within SAM, leveraging few-shot embeddings derived from a limited set of labeled images (few-shot collection) as prompts for querying anatomical objects captured in image embeddings. This innovative reformulation greatly reduces the need for time-consuming online user interactions for labeling volumetric images, such as exhaustively marking points and bounding boxes to provide prompts slice by slice. With our method, users can manually segment a few 2D slices offline, and the embeddings of these annotated image regions serve as effective prompts for online segmentation tasks. Our method prioritizes the efficiency of the fine-tuning process by exclusively training the mask decoder through caching mechanisms while keeping the image encoder frozen. Importantly, this approach is not limited to volumetric medical images, but can generically be applied to any 2D/3D segmentation task. To thoroughly evaluate our method, we conducted extensive validation on four datasets, covering six anatomical segmentation tasks across two modalities. Furthermore, we conducted a comparative analysis of different prompting options within SAM and the fully-supervised nnU-Net. The results demonstrate that our method outperforms SAM employing only point prompts (approximately 50% improvement in IoU) and performs on par with fully supervised methods while reducing the requirement for labeled data by at least an order of magnitude.
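To make the idea of "few-shot embeddings as prompts" concrete, here is a much-simplified stand-in: it averages the embeddings of annotated pixels into a prototype and labels query pixels by cosine similarity. This is plain prototype matching, not the authors' reformulated mask decoder; the function names and the 0.8 threshold are hypothetical, and per-pixel embeddings are assumed to be precomputed once by a frozen encoder (which is what makes caching possible).

```python
import math

def masked_average(embeddings, mask):
    """Mean feature vector over the annotated pixels of one labeled slice."""
    picked = [e for e, m in zip(embeddings, mask) if m]
    dim = len(picked[0])
    return [sum(v[d] for v in picked) / len(picked) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def few_shot_prototype(support_set):
    """Average the per-slice prototypes of the few-shot collection."""
    protos = [masked_average(emb, mask) for emb, mask in support_set]
    dim = len(protos[0])
    return [sum(p[d] for p in protos) / len(protos) for d in range(dim)]

def segment(query_embeddings, prototype, threshold=0.8):
    """Hypothetical rule: a query pixel is foreground if it resembles the prototype."""
    return [cosine(e, prototype) >= threshold for e in query_embeddings]
```

Because the encoder is frozen, the support embeddings are computed offline once and reused for every query slice; only the (here trivial) decoding step runs online.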
Submitted 5 July, 2024;
originally announced July 2024.
-
Adaptive Deep Neural Network-Based Control Barrier Functions
Authors:
Hannah M. Sweatland,
Omkar Sudhir Patil,
Warren E. Dixon
Abstract:
Safety constraints of nonlinear control systems are commonly enforced through the use of control barrier functions (CBFs). Uncertainties in the dynamic model can disrupt forward invariance guarantees or cause the state to be restricted to an overly conservative subset of the safe set. In this paper, adaptive deep neural networks (DNNs) are combined with CBFs to produce a family of controllers that ensure safety while learning the system's dynamics in real time without the requirement for pre-training. By basing the least squares adaptation law on a state derivative estimator-based identification error, the DNN parameter estimation error is shown to be uniformly ultimately bounded. The convergent bound on the parameter estimation error is then used to formulate CBF constraints in an optimization-based controller to guarantee safety despite model uncertainty. Furthermore, the developed method is applicable under intermittent loss of state feedback. Comparative simulation results demonstrate that, unlike baseline methods, the developed method ensures safety in an adaptive cruise control problem and when feedback is lost.
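The role of the error bound in the CBF constraint can be seen in a scalar toy problem. The sketch below assumes dynamics x_dot = f(x) + u, a model estimate f_hat with |f - f_hat| <= eps, and the barrier h(x) = x_max - x; it is not the paper's DNN-based controller, and all names are hypothetical. Enforcing h_dot >= -alpha*h for the worst-case f gives a linear bound on u, so the usual safety QP has a closed-form solution.

```python
def robust_cbf_filter(x, u_des, f_hat, eps, x_max, alpha=1.0):
    """Minimally modify u_des so that h(x) = x_max - x stays nonnegative.

    Assumed scalar model: x_dot = f(x) + u, with |f(x) - f_hat(x)| <= eps.
    CBF condition: h_dot >= -alpha * h, and h_dot = -(f + u). The worst case
    f = f_hat + eps gives the constraint  u <= alpha * h - f_hat(x) - eps.
    The QP  min (u - u_des)^2  subject to that bound is solved by clipping.
    """
    h = x_max - x
    u_bound = alpha * h - f_hat(x) - eps
    return min(u_des, u_bound)
```

A tighter bound eps (for example, one that shrinks as the parameter estimates converge) relaxes the constraint, which is the intuition behind using a convergent estimation-error bound to avoid overly conservative behavior.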
Submitted 20 June, 2024;
originally announced June 2024.
-
CaFA: Global Weather Forecasting with Factorized Attention on Sphere
Authors:
Zijie Li,
Anthony Zhou,
Saurabh Patil,
Amir Barati Farimani
Abstract:
Accurate weather forecasting is crucial in various sectors, impacting decision-making processes and societal events. Data-driven approaches based on machine learning models have recently emerged as a promising alternative to numerical weather prediction models given their potential to capture physics of different scales from historical data and the significantly lower computational cost during the prediction stage. Renowned for its state-of-the-art performance across diverse domains, the Transformer model has also gained popularity in machine learning weather prediction. Yet applying Transformer architectures to weather forecasting, particularly on a global scale, is computationally challenging due to the quadratic complexity of attention and the quadratic increase in spatial points as resolution increases. In this work, we propose a factorized-attention-based model tailored for spherical geometries to mitigate this issue. More specifically, it utilizes multi-dimensional factorized kernels that convolve over different axes, where the computational complexity of the kernel is only quadratic in the axial resolution instead of the overall resolution. The deterministic forecasting accuracy of the proposed model at $1.5^\circ$ resolution and 0-7 days' lead time is on par with state-of-the-art purely data-driven machine learning weather prediction models. We also show that the proposed model holds great potential to push forward the Pareto front of accuracy-efficiency for Transformer weather models, where it can achieve better accuracy with less computational cost compared to Transformer-based models with standard attention.
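The scaling argument can be checked with simple counting. Full attention over N = H x W grid points costs on the order of N^2 query-key interactions, whereas attending separately along each axis costs N(H + W). The sketch assumes a 1.5 degree equiangular grid of 121 latitudes (poles included) by 240 longitudes; CaFA's actual factorized kernels on the sphere are more involved than this generic axial count.

```python
def attention_pair_counts(n_lat, n_lon):
    """Count query-key interactions per attention layer on an n_lat x n_lon grid."""
    n = n_lat * n_lon
    full = n * n                 # every point attends to every other point
    axial = n * (n_lat + n_lon)  # each point attends along its meridian and its latitude circle
    return full, axial

# Assumed 1.5 degree grid: 180/1.5 + 1 = 121 latitudes, 360/1.5 = 240 longitudes.
full, axial = attention_pair_counts(121, 240)
print(full, axial, full / axial)  # the ratio is N / (H + W), roughly 80 here
```

The ratio N/(H + W) grows roughly linearly with resolution, which is why the savings matter most for fine global grids.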
Submitted 12 May, 2024;
originally announced May 2024.
-
LLoCO: Learning Long Contexts Offline
Authors:
Sijun Tan,
Xiuyu Li,
Shishir Patil,
Ziyang Wu,
Tianjun Zhang,
Kurt Keutzer,
Joseph E. Gonzalez,
Raluca Ada Popa
Abstract:
Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose a novel approach to address this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning. Our method enables an LLM to create a concise representation of the original context and efficiently retrieve relevant information to answer questions accurately. We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA. Our approach extends the effective context window of a 4k-token LLaMA2-7B model to handle up to 128k tokens. We evaluate our approach on several long-context question-answering datasets, demonstrating that LLoCO significantly outperforms in-context learning while using $30\times$ fewer tokens during inference. LLoCO achieves up to $7.62\times$ speed-up and substantially reduces the cost of long document question answering, making it a promising solution for efficient long context processing. Our code is publicly available at https://github.com/jeffreysijuntan/lloco.
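Why compression lets a 4k-token model cover a 128k-token document is a matter of token budgeting. The sketch below assumes a hypothetical compressor that maps every 512-token chunk to 16 summary tokens (a 32x ratio); LLoCO's actual compressor, chunking, and ratio may differ (the abstract reports about 30x fewer tokens at inference).

```python
import math

def compressed_length(doc_tokens, chunk_size, summary_per_chunk):
    """Number of summary tokens after compressing each chunk independently."""
    n_chunks = math.ceil(doc_tokens / chunk_size)
    return n_chunks * summary_per_chunk

# Hypothetical settings: 512 raw tokens -> 16 summary tokens per chunk.
doc = 128_000
window = 4_096
summary = compressed_length(doc, chunk_size=512, summary_per_chunk=16)
print(summary, summary <= window, doc / summary)
```

Under these assumed numbers the compressed document (4,000 tokens) fits within the 4,096-token window; the finetuning step is then what teaches the model to read such compressed representations.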
Submitted 11 April, 2024;
originally announced April 2024.
-
Lyapunov-Based Deep Residual Neural Network (ResNet) Adaptive Control
Authors:
Omkar Sudhir Patil,
Duc M. Le,
Emily J. Griffis,
Warren E. Dixon
Abstract:
Deep Neural Network (DNN)-based controllers have emerged as a tool to compensate for unstructured uncertainties in nonlinear dynamical systems. A recent breakthrough in the adaptive control literature provides a Lyapunov-based approach to derive weight adaptation laws for each layer of a fully-connected feedforward DNN-based adaptive controller. However, deriving weight adaptation laws from a Lyapunov-based analysis remains an open problem for deep residual neural networks (ResNets). This paper provides the first result on Lyapunov-derived weight adaptation for a ResNet-based adaptive controller. A nonsmooth Lyapunov-based analysis is provided to guarantee asymptotic tracking error convergence. Comparative Monte Carlo simulations are provided to demonstrate the performance of the developed ResNet-based adaptive controller. The ResNet-based adaptive controller shows a 64% improvement in the tracking and function approximation performance, in comparison to a fully-connected DNN-based adaptive controller.
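The architectural difference from a fully-connected DNN is the identity shortcut in each block, y = x + W2 act(W1 x). The sketch shows only this forward pass with an assumed tanh activation; the Lyapunov-derived adaptation laws that update W1 and W2 online, which are the paper's contribution, are not reproduced here.

```python
import math

def matvec(W, x):
    """Dense matrix-vector product for lists of lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def residual_block(x, W1, W2, act=math.tanh):
    """y = x + W2 * act(W1 * x): the shortcut adds the block input back."""
    hidden = [act(z) for z in matvec(W1, x)]
    return [xi + yi for xi, yi in zip(x, matvec(W2, hidden))]

def resnet(x, blocks):
    """Apply a stack of residual blocks, each given as a (W1, W2) pair."""
    for W1, W2 in blocks:
        x = residual_block(x, W1, W2)
    return x
```

With all weights at zero, every block reduces to the identity map, which is one reason residual architectures are easy to initialize and deepen.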
Submitted 10 April, 2024;
originally announced April 2024.
-
GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications
Authors:
Shishir G. Patil,
Tianjun Zhang,
Vivian Fang,
Noppapon C.,
Roy Huang,
Aaron Hao,
Martin Casado,
Joseph E. Gonzalez,
Raluca Ada Popa,
Ion Stoica
Abstract:
Large Language Models (LLMs) are evolving beyond their classical role of providing information within dialogue systems to actively engaging with tools and performing actions on real-world applications and services. Today, humans verify the correctness and appropriateness of the LLM-generated outputs (e.g., code, functions, or actions) before putting them into real-world execution. This poses significant challenges, as code comprehension is notoriously difficult. In this paper, we study how humans can efficiently collaborate with, delegate to, and supervise autonomous LLMs in the future. We argue that in many cases, "post-facto validation" (verifying the correctness of a proposed action after seeing the output) is much easier than the aforementioned "pre-facto validation" setting. The core concept behind enabling a post-facto validation system is the integration of an intuitive undo feature and the establishment of damage confinement for LLM-generated actions, as effective strategies to mitigate the associated risks. Using this, a human can either revert the effect of an LLM-generated output or be confident that the potential risk is bounded. We believe this is critical to unlock the potential for LLM agents to interact with applications and services with limited (post-facto) human involvement. We describe the design and implementation of our open-source runtime for executing LLM actions, Gorilla Execution Engine (GoEX), and present open research questions towards realizing the goal of LLMs and applications interacting with each other with minimal human supervision. We release GoEX at https://github.com/ShishirPatil/gorilla/.
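The two ingredients of post-facto validation, undo and damage confinement, can be illustrated with a toy key-value "runtime". This is not GoEX's API: the class, its methods, and the allowlist model of confinement are hypothetical, and the real system deals with REST calls, databases, and file systems.

```python
class UndoableRuntime:
    """Toy post-facto validation: run actions, log their inverses, allow revert.

    Damage confinement is modeled as an allowlist of resources the agent may touch.
    """

    def __init__(self, allowed):
        self.allowed = set(allowed)
        self.state = {}
        self._undo_log = []

    def execute(self, key, value):
        # Confinement: refuse anything outside the permitted blast radius.
        if key not in self.allowed:
            raise PermissionError(f"{key!r} is outside the confinement boundary")
        existed = key in self.state
        previous = self.state.get(key)
        self.state[key] = value
        self._undo_log.append((key, existed, previous))

    def undo(self):
        # Revert the most recent action by restoring (or deleting) the old value.
        key, existed, previous = self._undo_log.pop()
        if existed:
            self.state[key] = previous
        else:
            del self.state[key]
```

A human reviewer inspects `state` after the agent acts and calls `undo()` if the result is wrong; anything the agent cannot undo is kept outside `allowed`, so its potential damage is bounded.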
Submitted 10 April, 2024;
originally announced April 2024.
-
A new approach to construct minimal linear codes over $\mathbb{F}_{3}$
Authors:
Wajid M. Shaikh,
Rupali S. Jain,
B. Surendranath Reddy,
Bhagyashri S. Patil,
Sahar M. A. Maqbol
Abstract:
In this article, we present two new approaches to construct minimal linear codes of dimension $n+1$ over $\mathbb{F}_{3}$ using characteristic and ternary functions. We also obtain the weight distributions of these constructed minimal linear codes. We further show that a specific class of these codes violates the Ashikhmin-Barg condition.
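A nonzero codeword c is minimal if the only codewords whose support lies inside supp(c) are scalar multiples of c, and a code is minimal if all its nonzero codewords are. The Ashikhmin-Barg condition w_min/w_max > (q-1)/q is sufficient but not necessary for minimality, which is why a minimal code can violate it. For small codes both properties can be checked by brute force; the sketch below does so over F_3 using illustrative generator matrices, not the constructions from the paper.

```python
from fractions import Fraction
from itertools import product

Q = 3  # field size

def codewords(G):
    """All codewords m*G over F_Q for a k x n generator matrix G."""
    k, n = len(G), len(G[0])
    return {
        tuple(sum(m * G[i][j] for i, m in enumerate(msg)) % Q for j in range(n))
        for msg in product(range(Q), repeat=k)
    }

def support(c):
    return frozenset(i for i, x in enumerate(c) if x)

def is_minimal(G):
    nonzero = [c for c in codewords(G) if any(c)]
    for c in nonzero:
        sc = support(c)
        for d in nonzero:
            # Any codeword covered by c must be a nonzero scalar multiple of c.
            if support(d) <= sc and not any(
                tuple(a * x % Q for x in c) == d for a in range(1, Q)
            ):
                return False
    return True

def ashikhmin_barg_holds(G):
    weights = [len(support(c)) for c in codewords(G) if any(c)]
    return Fraction(min(weights), max(weights)) > Fraction(Q - 1, Q)
```

For example, the ternary simplex code with generator `[[1, 0, 1, 1], [0, 1, 1, 2]]` has all nonzero weights equal to 3, so it is minimal and satisfies the condition, while the full space `[[1, 0], [0, 1]]` is neither minimal nor within the bound.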
Submitted 10 April, 2024;
originally announced April 2024.
-
Hadamard Regularization of the Graviton Stress Tensor
Authors:
Anna Negro,
Subodh P. Patil
Abstract:
We present the details for the covariant renormalization of the stress tensor for vacuum tensor perturbations at the level of the effective action, adopting Hadamard regularization techniques to isolate short distance divergences and gauge fixing via the Faddeev-Popov procedure. The subsequently derived renormalized stress tensor can be related to more familiar forms reliant upon an averaging prescription, such as the Isaacson or Misner-Thorne-Wheeler forms. The latter, however, are premised on a prior scale separation (beyond which the averaging is invoked) and are therefore unsuited for the purposes of renormalization. This can lead to potentially unphysical conclusions when taken as a starting point for the computation of any observable that needs regularization, such as the energy density associated with a stochastic background. Any averaging prescription, if needed, should only be invoked at the end of the renormalization procedure. The latter necessarily involves the imposition of renormalization conditions via a physical measurement at some fixed scale, which we retrace for primordial gravitational waves sourced from vacuum fluctuations through direct or indirect observation.
Submitted 25 March, 2024;
originally announced March 2024.
-
Construction of Minimal Binary Linear Codes of dimension $n+3$
Authors:
Wajid M. Shaikh,
Rupali S. Jain,
B. Surendranath Reddy,
Bhagyashri S. Patil
Abstract:
In this paper, we give a generic construction of a binary linear code of dimension $n+3$ and derive necessary and sufficient conditions for the constructed code to be minimal. Using this generic construction, a new family of minimal binary linear codes is constructed from a special class of Boolean functions violating the Ashikhmin-Barg condition. We also obtain the weight distribution of the constructed minimal binary linear codes.
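For intuition on how Boolean functions give rise to linear codes, the sketch below uses the standard (n+1)-dimensional construction C_f = {(u.x + v f(x)) for x in F_2^n minus 0}, not the paper's n+3 construction, and computes its weight distribution and minimality by brute force. The cubic f is an arbitrary illustrative choice. In the binary case, distinct codewords have distinct supports, so a code is minimal exactly when no nonzero support is strictly contained in another.

```python
from itertools import product

def boolean_function_code(f, n):
    """Codewords (u.x + v*f(x)) over x in F_2^n minus {0}, for all u, v."""
    points = [x for x in product((0, 1), repeat=n) if any(x)]
    code = set()
    for u in product((0, 1), repeat=n):
        for v in (0, 1):
            code.add(tuple((sum(a * b for a, b in zip(u, x)) + v * f(x)) % 2 for x in points))
    return code

def weight_distribution(code):
    """Map each nonzero Hamming weight to the number of codewords having it."""
    dist = {}
    for c in code:
        w = sum(c)
        if w:
            dist[w] = dist.get(w, 0) + 1
    return dist

def is_minimal_binary(code):
    supports = [frozenset(i for i, b in enumerate(c) if b) for c in code if any(c)]
    return not any(s < t for s in supports for t in supports)
```

With f(x) = x1 x2 x3 and n = 3, the weights are {1: 1, 3: 4, 4: 7, 5: 3}; the weight-1 codeword sits inside weight-5 codewords, so this code is not minimal, and w_min/w_max = 1/5 is also below the binary Ashikhmin-Barg threshold of 1/2.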
Submitted 20 March, 2024;
originally announced March 2024.
-
Smart structural health monitoring (SHM) system for on-board localization of defects in pipes using torsional ultrasonic guided waves
Authors:
Sheetal Patil,
Sauvik Banerjee,
Siddharth Tallur
Abstract:
Most reported research on monitoring the health of pipelines using ultrasonic guided waves (GW) utilizes bulky piezoelectric transducer rings and laboratory-grade ultrasonic non-destructive testing (NDT) equipment. Consequently, translating these approaches from laboratory settings to field-deployable systems for real-time structural health monitoring (SHM) is challenging. In this work, we present an innovative algorithm for damage identification and localization in pipes, implemented on a compact FPGA-based smart GW-SHM system. The custom-designed board, featuring a Xilinx Artix-7 FPGA and front-end electronics, is capable of actuating PZT thickness-shear-mode transducers, acquiring and recording data from PZT sensors, and generating a damage index (DI) map for localizing the damage on the structure. The algorithm is a variation of the common source method adapted for cylindrical geometry. The utility of the algorithm is demonstrated for detection and localization of defects such as a notch and mass loading on a steel pipe, through extensive finite element (FE) simulations. Experimental results obtained using a C-clamp for applying mass loading on the pipe show good agreement with the FE simulations. The localization error values for experimental data analyzed using C code on a processor implemented on the FPGA are consistent with algorithm results generated on a computer running MATLAB code. The system presented in this study is suitable for a wide range of GW-SHM applications, especially in cost-sensitive scenarios that benefit from on-node signal processing over cloud-based solutions.
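A damage index map on a pipe can be sketched with a generic delay-and-sum (elliptical imaging) scheme: for each surface pixel, add every transducer pair's baseline-subtracted residual amplitude at the time a wave scattered from that pixel would arrive. This is a stand-in, not necessarily the authors' common-source variant; the (z, theta) parametrization, the constant wave speed, and all names are assumptions. Cylindrical geometry enters only through the shortest path on the unrolled surface, which wraps around the circumference.

```python
import math

def surface_distance(p, q, radius):
    """Shortest path between (z, theta) points on the unrolled pipe surface."""
    dz = q[0] - p[0]
    dtheta = abs(q[1] - p[1]) % (2 * math.pi)
    arc = min(dtheta, 2 * math.pi - dtheta) * radius  # wrap around the circumference
    return math.hypot(dz, arc)

def arrival_index(actuator, sensor, pixel, radius, speed, fs):
    """Sample index at which a wave scattered at `pixel` would reach the sensor."""
    path = surface_distance(actuator, pixel, radius) + surface_distance(pixel, sensor, radius)
    return int(round(path / speed * fs))

def damage_index_map(residuals, pairs, grid, radius, speed, fs):
    """Delay-and-sum: add each pair's residual amplitude at the pixel's arrival time."""
    dmap = {}
    for pixel in grid:
        total = 0.0
        for (actuator, sensor), signal in zip(pairs, residuals):
            idx = arrival_index(actuator, sensor, pixel, radius, speed, fs)
            if 0 <= idx < len(signal):
                total += abs(signal[idx])
        dmap[pixel] = total
    return dmap
```

Each pair alone only constrains the damage to an ellipse-like curve of equal travel time; combining several pairs makes the true location the pixel where those curves intersect and the index peaks.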
Submitted 17 March, 2024;
originally announced March 2024.
-
RAFT: Adapting Language Model to Domain Specific RAG
Authors:
Tianjun Zhang,
Shishir G. Patil,
Naman Jain,
Sheng Shen,
Matei Zaharia,
Ion Stoica,
Joseph E. Gonzalez
Abstract:
Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm. When using these LLMs for many downstream applications, it is common to additionally bake new knowledge (e.g., time-critical news or private domain knowledge) into the pretrained model, either through RAG-based prompting or fine-tuning. However, the optimal methodology for the model to gain such new knowledge remains an open question. In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in an "open-book" in-domain setting. In RAFT, given a question and a set of retrieved documents, we train the model to ignore those documents that don't help in answering the question, which we call distractor documents. RAFT accomplishes this by citing verbatim the right sequence from the relevant document that would help answer the question. This, coupled with RAFT's chain-of-thought-style response, helps improve the model's ability to reason. In domain-specific RAG, RAFT consistently improves the model's performance across PubMed, HotpotQA, and Gorilla datasets, presenting a post-training recipe to improve pre-trained LLMs for in-domain RAG. RAFT's code and demo are open-sourced at github.com/ShishirPatil/gorilla.
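The training setup described above (a question, a relevant document, and distractors) can be sketched as a data-construction step. This is one plausible scheme under stated assumptions, not RAFT's released code: a fraction `p_oracle` of examples keeps the relevant ("oracle") document mixed among `k` distractors, and the rest contain only distractors, so the model cannot rely on the oracle always being present. Field names and parameters are hypothetical; the `answer` would be the chain-of-thought response quoting the oracle.

```python
import random

def make_raft_example(question, answer, oracle_doc, distractor_pool, k, p_oracle, rng):
    """Build one training example; the oracle is kept with probability p_oracle."""
    docs = list(rng.sample(distractor_pool, k))
    if rng.random() < p_oracle:
        docs.append(oracle_doc)
    rng.shuffle(docs)  # the oracle's position should carry no signal
    return {"question": question, "context": docs, "answer": answer}
```

Passing a seeded `random.Random` instance keeps dataset generation reproducible.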
Submitted 5 June, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
First Constraints on the Epoch of Reionization Using the non-Gaussianity of the Kinematic Sunyaev-Zel'dovich Effect from the South Pole Telescope and Herschel-SPIRE Observations
Authors:
S. Raghunathan,
P. A. R. Ade,
A. J. Anderson,
B. Ansarinejad,
M. Archipley,
J. E. Austermann,
L. Balkenhol,
J. A. Beall,
K. Benabed,
A. N. Bender,
B. A. Benson,
F. Bianchini,
L. E. Bleem,
J. Bock,
F. R. Bouchet,
L. Bryant,
E. Camphuis,
J. E. Carlstrom,
T. W. Cecil,
C. L. Chang,
P. Chaubal,
H. C. Chiang,
P. M. Chichura,
T. -L. Chou,
R. Citron
, et al. (97 additional authors not shown)
Abstract:
We report results from an analysis aimed at detecting the trispectrum of the kinematic Sunyaev-Zel'dovich (kSZ) effect by combining data from the South Pole Telescope (SPT) and Herschel-SPIRE experiments over a 100 ${\rm deg}^{2}$ field. The SPT observations combine data from the previous and current surveys, namely SPTpol and SPT-3G, to achieve depths of 4.5, 3, and 16 $\mu{\rm K}$-arcmin in bands centered at 95, 150, and 220 GHz. For SPIRE, we include data from the 600 and 857 GHz bands. We reconstruct the velocity-induced large-scale correlation of the small-scale kSZ signal with a quadratic estimator that uses two cosmic microwave background (CMB) temperature maps, constructed by optimally combining data from all the frequency bands. We reject the null hypothesis of a zero trispectrum at the $10.3\sigma$ level. However, the measured trispectrum contains contributions from both the kSZ and other undesired components, such as CMB lensing and astrophysical foregrounds, with kSZ being sub-dominant. We use the Agora simulations to estimate the expected signal from CMB lensing and astrophysical foregrounds. After accounting for the contributions from CMB lensing and foreground signals, we do not detect an excess kSZ-only trispectrum and use this non-detection to set constraints on reionization. By applying a prior based on observations of the Gunn-Peterson trough, we obtain an upper limit on the duration of reionization of $\Delta z_{\rm re, 50} < 4.5$ (95% C.L.). We find these constraints are fairly robust to foreground assumptions. This trispectrum measurement is independent of, but consistent with, Planck's optical depth measurement. This result is the first constraint on the epoch of reionization using the non-Gaussian nature of the kSZ signal.
Submitted 4 March, 2024;
originally announced March 2024.
-
Latent Neural PDE Solver: a reduced-order modelling framework for partial differential equations
Authors:
Zijie Li,
Saurabh Patil,
Francis Ogoke,
Dule Shu,
Wilson Zhen,
Michael Schneier,
John R. Buchanan, Jr.,
Amir Barati Farimani
Abstract:
Neural networks have shown promising potential in accelerating the numerical simulation of systems governed by partial differential equations (PDEs). Unlike many existing neural network surrogates that operate on high-dimensional discretized fields, we propose to learn the dynamics of the system in a latent space with much coarser discretizations. In our proposed framework, Latent Neural PDE Solver (LNS), a non-linear autoencoder is first trained to project the full-order representation of the system onto the mesh-reduced space; then a temporal model is trained to predict the future state in this mesh-reduced space. This reduction process simplifies the training of the temporal model by greatly reducing the computational cost accompanying a fine discretization. We study the capability of the proposed framework and several other popular neural PDE solvers on various types of systems, including single-phase and multi-phase flows along with varying system parameters. We show that it achieves competitive accuracy and efficiency compared to the neural PDE solver that operates on the full-order space.
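The encode, step-in-latent-space, decode pipeline can be sketched on a 1D periodic field. Here the learned components are replaced by fixed stand-ins: average pooling as the "encoder", nearest-neighbour upsampling as the "decoder", and one explicit diffusion step as the "temporal model". In LNS the autoencoder is non-linear and the temporal model is learned; these choices and names are assumptions for illustration only.

```python
def encode(u, factor):
    """Stand-in encoder: average-pool a 1D periodic field by `factor`."""
    return [sum(u[i:i + factor]) / factor for i in range(0, len(u), factor)]

def decode(z, factor):
    """Stand-in decoder: nearest-neighbour upsampling back to the full grid."""
    return [v for v in z for _ in range(factor)]

def latent_step(z, nu=0.1):
    """Stand-in temporal model: one explicit diffusion step on the coarse grid."""
    n = len(z)
    return [z[i] + nu * (z[(i - 1) % n] - 2 * z[i] + z[(i + 1) % n]) for i in range(n)]

def rollout(u0, steps, factor=4):
    """Encode once, advance entirely in the latent space, decode at the end."""
    z = encode(u0, factor)
    for _ in range(steps):
        z = latent_step(z)
    return decode(z, factor)
```

The cost saving comes from `latent_step` running on a grid `factor` times smaller than the full-order field, while encoding and decoding happen only at the ends of the rollout.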
Submitted 27 February, 2024;
originally announced February 2024.
-
Probing the inverse moment of $B_s$-meson distribution amplitude via $B_s \to \eta_s$ form factors
Authors:
Rusa Mandal,
Praveen S Patil,
Ipsita Ray
Abstract:
We investigate the inverse moment of the $B_s$-meson light-cone distribution amplitude (LCDA), denoted as $\lambda_{B_s}$ and defined within the heavy quark effective theory, through the calculation of $B_s \to \eta_s$ form factors. The presence of the $s$-quark inside the $B_s$-meson dictates a notable departure of approximately $20\%$ in the $\lambda_{B_s}$ value compared to the non-strange case $\lambda_{B_q}$, as computed within the QCD sum rule approach, albeit with significant uncertainty. First, we compute the decay constant of the $\eta_s$-meson utilizing two-point sum rules while retaining finite $s$-quark mass contributions. Next, we constrain the parameter $\lambda_{B_s}$ by calculating $B_s \to \eta_s$ form factors within the light-cone sum rule approach, using $B_s$-meson LCDAs, and leveraging Lattice QCD estimates at zero momentum transfer from the HPQCD collaboration. Our findings yield $\lambda_{B_s} = 480 \pm 92$ MeV when expressing the $B_s$-meson LCDAs in the Exponential model, consistent with the previous QCD sum rule estimate yet exhibiting a 1.5-fold improvement in uncertainty. Furthermore, we compare the form factor predictions, based on the extracted $\lambda_{B_s}$ value, with earlier analyses for other channels such as $B_s \to D_s$ and $B_s \to K$.
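As a numerical check on what "inverse moment" means, the sketch integrates the leading-twist LCDA in the exponential model, phi_+(omega) = (omega / lambda^2) exp(-omega / lambda), and verifies that it is normalized and that its inverse moment, the integral of phi_+(omega)/omega, equals 1/lambda. The model form is the standard exponential ansatz assumed here; lambda = 0.480 GeV is the central value quoted in the abstract, and the Simpson integrator and cutoff at 50 lambda are illustrative choices.

```python
import math

def phi_plus_exponential(omega, lam):
    """Exponential model of the leading-twist B-meson LCDA."""
    return omega / lam**2 * math.exp(-omega / lam)

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

lam = 0.480  # GeV, central value of lambda_{B_s} from the abstract
norm = simpson(lambda w: phi_plus_exponential(w, lam), 0.0, 50 * lam)
# phi_+(w)/w -> 1/lam^2 as w -> 0, so the integrand is finite at the origin.
inverse_moment = simpson(lambda w: phi_plus_exponential(w, lam) / w if w > 0 else 1 / lam**2, 0.0, 50 * lam)
print(norm, inverse_moment, 1 / lam)
```

In this model the inverse moment reproduces 1/lambda exactly (about 2.08 GeV^-1 here), which is why fitting the model's single parameter is equivalent to extracting lambda_{B_s}.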
Submitted 3 July, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
An Étude on the Regularization and Renormalization of Divergences in Primordial Observables
Authors:
Anna Negro,
Subodh P. Patil
Abstract:
Many cosmological observables of interest derive from primordial vacuum fluctuations evolved to late times. These observables represent statistical draws from some underlying quantum or statistical field theoretic framework where infinities arise and require regularization. After subtracting divergences, renormalization conditions must be imposed by measurements or observations at some scale, mindful of scheme and background dependence. We review this process on backgrounds that transition from finite duration inflation to radiation domination, and show how, in spite of the ubiquity of scaleless integrals, UV divergences can still be meaningfully extracted from quantities that nominally vanish when dimensionally regularized. In this way, one can contextualize calculations with hard cutoffs, distinguishing between UV and IR scales corresponding to the beginning and end of inflation and UV and IR scales corresponding to the unknown completion of the theory and its observables. This distinction has significance, as observable quantities cannot depend on the latter although they will certainly depend on the former. One can also explicitly show the scheme independence of the coefficients of UV divergent logarithms. Furthermore, certain IR divergences can be shown to be an artifact of the de Sitter limit and are cured for finite duration inflation. For gravitational wave observables, we stress the need to regularize stress tensors that do not presume a prior scale separation in their construction (as with the standard Isaacson form), deriving an improved stress tensor fit for purpose. We conclude by highlighting the inextricable connection between inferring $N_{\rm eff}$ bounds from vacuum tensor perturbations and the process of background renormalization.
Submitted 3 June, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
A pulsar-like swing in the polarisation position angle of a nearby fast radio burst
Authors:
Ryan Mckinven,
Mohit Bhardwaj,
Tarraneh Eftekhari,
Charles D. Kilpatrick,
Aida Kirichenko,
Arpan Pal,
Amanda M. Cook,
B. M. Gaensler,
Utkarsh Giri,
Victoria M. Kaspi,
Daniele Michilli,
Kenzie Nimmo,
Aaron B. Pearlman,
Ziggy Pleunis,
Ketan R. Sand,
Ingrid Stairs,
Bridget C. Andersen,
Shion Andrew,
Kevin Bandura,
Charanjot Brar,
Tomas Cassanelli,
Shami Chatterjee,
Alice P. Curtin,
Fengqiu Adam Dong,
Gwendolyn Eadie
, et al. (19 additional authors not shown)
Abstract:
Fast radio bursts (FRBs) last for milliseconds and arrive at Earth from cosmological distances. While their origin(s) and emission mechanism(s) are presently unknown, their signals bear similarities with the much less luminous radio emission generated by pulsars within our Galaxy, and several lines of evidence point toward neutron star origins. For pulsars, the linear polarisation position angle (PA) often exhibits evolution over the pulse phase that is interpreted within a geometric framework known as the rotating vector model (RVM). Here, we report on a fast radio burst, FRB 20221022A, detected by the Canadian Hydrogen Intensity Mapping Experiment (CHIME) and localized to a nearby host galaxy ($\sim 65\; \rm{Mpc}$), MCG+14-02-011. This one-off FRB displays a $\sim 130$ degree rotation of its PA over its $\sim 2.5\; \rm{ms}$ burst duration, closely resembling the "S"-shaped PA evolution commonly seen from pulsars and some radio magnetars. The PA evolution disfavours emission models involving shocks far from the source and instead suggests magnetospheric origins for this source, which places the emission region close to the FRB central engine, echoing similar conclusions drawn from tempo-polarimetric studies of some repeating sources. This FRB's PA evolution is remarkably well-described by the RVM and, although we cannot determine the inclination and magnetic obliquity due to the unknown period/duty cycle of the source, we can dismiss extremely short-period pulsars (e.g., recycled millisecond pulsars) as potential progenitors. RVM-fitting appears to favour a source occupying a unique position in the period/duty cycle phase space that implies tight opening angles for the beamed emission, significantly reducing burst energy requirements of the source.
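The "S"-shaped swing follows from the rotating vector model's expression for the PA as a function of rotational phase. The sketch implements one common form of the RVM, with magnetic obliquity alpha, impact parameter beta, and viewing angle zeta = alpha + beta; sign conventions differ between references, and this is not the authors' fitting code. The PA changes fastest at the fiducial phase phi0, where the slope is sin(alpha)/sin(beta).

```python
import math

def rvm_pa(phi, alpha, beta, phi0=0.0, psi0=0.0):
    """Rotating vector model position angle (radians) at rotational phase phi.

    alpha: magnetic obliquity; beta: impact parameter; zeta = alpha + beta is
    the angle between the rotation axis and the line of sight. One common sign
    convention is used here; others flip the sign of the swing.
    """
    zeta = alpha + beta
    num = math.sin(alpha) * math.sin(phi - phi0)
    den = math.sin(zeta) * math.cos(alpha) - math.cos(zeta) * math.sin(alpha) * math.cos(phi - phi0)
    return psi0 + math.atan2(num, den)
```

Because only the slope sin(alpha)/sin(beta) and the shape of the swing constrain the geometry, a burst covering a small, unknown fraction of the rotation period leaves alpha and the inclination degenerate, consistent with the limitation stated in the abstract.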
Submitted 14 February, 2024;
originally announced February 2024.
-
SPT Clusters with DES and HST Weak Lensing. II. Cosmological Constraints from the Abundance of Massive Halos
Authors:
S. Bocquet,
S. Grandis,
L. E. Bleem,
M. Klein,
J. J. Mohr,
T. Schrabback,
T. M. C. Abbott,
P. A. R. Ade,
M. Aguena,
A. Alarcon,
S. Allam,
S. W. Allen,
O. Alves,
A. Amon,
A. J. Anderson,
J. Annis,
B. Ansarinejad,
J. E. Austermann,
S. Avila,
D. Bacon,
M. Bayliss,
J. A. Beall,
K. Bechtol,
M. R. Becker,
A. N. Bender
, et al. (171 additional authors not shown)
Abstract:
We present cosmological constraints from the abundance of galaxy clusters selected via the thermal Sunyaev-Zel'dovich (SZ) effect in South Pole Telescope (SPT) data with a simultaneous mass calibration using weak gravitational lensing data from the Dark Energy Survey (DES) and the Hubble Space Telescope (HST). The cluster sample is constructed from the combined SPT-SZ, SPTpol ECS, and SPTpol 500d surveys, and comprises 1,005 confirmed clusters in the redshift range $0.25-1.78$ over a total sky area of 5,200 deg$^2$. We use DES Year 3 weak-lensing data for 688 clusters with redshifts $z<0.95$ and HST weak-lensing data for 39 clusters with $0.6<z<1.7$. The weak-lensing measurements enable robust mass measurements of sample clusters and allow us to empirically constrain the SZ observable--mass relation. For a flat $Λ$CDM cosmology, and marginalizing over the sum of neutrino masses, we measure $Ω_\mathrm{m}=0.286\pm0.032$, $σ_8=0.817\pm0.026$, and the parameter combination $σ_8\,(Ω_\mathrm{m}/0.3)^{0.25}=0.805\pm0.016$. Our measurement of $S_8\equiv σ_8\,\sqrt{Ω_\mathrm{m}/0.3}=0.795\pm0.029$ and the constraint from Planck CMB anisotropies (2018 TT,TE,EE+lowE) differ by $1.1σ$. In combination with that Planck dataset, we place a 95% upper limit on the sum of neutrino masses $\sum m_ν<0.18$ eV. When additionally allowing the dark energy equation of state parameter $w$ to vary, we obtain $w=-1.45\pm0.31$ from our cluster-based analysis. In combination with Planck data, we measure $w=-1.34^{+0.22}_{-0.15}$, or a $2.2σ$ difference with a cosmological constant. We use the cluster abundance to measure $σ_8$ in five redshift bins between 0.25 and 1.8, and we find the results to be consistent with structure growth as predicted by the $Λ$CDM model fit to Planck primary CMB data.
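Both quoted parameter combinations have the form $σ_8\,(Ω_\mathrm{m}/0.3)^{p}$. The short Python check below plugs in the central values of $σ_8$ and $Ω_\mathrm{m}$; it gives about 0.798 for $p=0.5$ and 0.807 for $p=0.25$, close to but not identical with the quoted 0.795 and 0.805, because the reported numbers are summaries of the full posterior rather than functions of the marginal means. The helper name is hypothetical.

```python
def sigma8_combination(sigma8, omega_m, exponent=0.5, pivot=0.3):
    """Return sigma8 * (Omega_m / pivot) ** exponent; exponent=0.5 gives S8."""
    return sigma8 * (omega_m / pivot) ** exponent

s8 = sigma8_combination(0.817, 0.286)                    # S8 from central values
s8_q = sigma8_combination(0.817, 0.286, exponent=0.25)   # the 0.25-power combination
print(f"S8 ~ {s8:.3f}, sigma8 (Om/0.3)^0.25 ~ {s8_q:.3f}")
```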
Submitted 21 June, 2024; v1 submitted 4 January, 2024;
originally announced January 2024.
-
aMUSEd: An Open MUSE Reproduction
Authors:
Suraj Patil,
William Berman,
Robin Rombach,
Patrick von Platen
Abstract:
We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.
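To illustrate why masked image models need few inference steps, the sketch below computes a cosine masking schedule of the kind used in MaskGIT-style parallel decoding: at each step the model predicts every masked token at once, keeps the most confident predictions, and re-masks the rest so that only the scheduled number stay masked. That aMUSEd uses exactly this schedule, a 16x16 token grid, and 12 steps are assumptions for illustration.

```python
import math

def tokens_still_masked(num_tokens, num_steps):
    """Masked-token count remaining after each parallel decoding step (cosine schedule)."""
    counts = []
    for step in range(num_steps):
        mask_ratio = math.cos(math.pi / 2 * (step + 1) / num_steps)
        counts.append(math.floor(num_tokens * mask_ratio))
    return counts

# Hypothetical 16x16 latent token grid decoded in 12 steps.
schedule = tokens_still_masked(16 * 16, 12)
print(schedule)
```

All 256 tokens are resolved in 12 forward passes, whereas token-by-token autoregressive decoding would need 256.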
Submitted 3 January, 2024;
originally announced January 2024.
-
32-Bit RISC-V CPU Core on Logisim
Authors:
Siddesh D. Patil,
Premraj V. Jadhav,
Siddharth Sankhe
Abstract:
This project focuses on building a RISC-V CPU core using the Logisim software. RISC-V is significant because it allows smaller device manufacturers to build hardware without paying royalties and lets developers and researchers design and experiment with a proven, freely available instruction set architecture. RISC-V suits a variety of applications, from IoT devices to embedded systems such as disks, CPUs, calculators, and SoCs. RISC-V is an open standard instruction set architecture (ISA) based on established reduced instruction set computer (RISC) principles. Unlike most other ISA designs, the RISC-V ISA is provided under open-source licenses that do not require fees to use.
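A Logisim datapath of this kind ultimately routes instruction bit fields to the register file and ALU. As a software analogue (not the authors' circuit), the Python sketch below decodes the RV32I R-type format and executes ADD and SUB. The field positions and the encodings `0x002081B3` (add x3, x1, x2) and `0x402081B3` (sub x3, x1, x2) follow the base ISA; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def decode_rtype(word):
    """Split a 32-bit RV32I R-type instruction into its fields."""
    return {
        "opcode": word & 0x7F,          # bits 6..0
        "rd": (word >> 7) & 0x1F,       # bits 11..7
        "funct3": (word >> 12) & 0x7,   # bits 14..12
        "rs1": (word >> 15) & 0x1F,     # bits 19..15
        "rs2": (word >> 20) & 0x1F,     # bits 24..20
        "funct7": (word >> 25) & 0x7F,  # bits 31..25
    }

def execute_add_sub(word, regs):
    """Execute ADD or SUB on a 32-entry register file (x0 stays zero)."""
    f = decode_rtype(word)
    if f["opcode"] != 0b0110011 or f["funct3"] != 0:
        raise NotImplementedError("only ADD/SUB are modelled in this sketch")
    a, b = regs[f["rs1"]], regs[f["rs2"]]
    if f["funct7"] == 0x00:
        result = a + b
    elif f["funct7"] == 0x20:
        result = a - b
    else:
        raise NotImplementedError("unsupported funct7")
    if f["rd"] != 0:                    # writes to x0 are discarded
        regs[f["rd"]] = result & MASK32  # wrap to 32 bits like the hardware

regs = [0] * 32
regs[1], regs[2] = 5, 7
execute_add_sub(0x002081B3, regs)   # add x3, x1, x2
print(regs[3])                      # 12
execute_add_sub(0x402081B3, regs)   # sub x3, x1, x2
print(hex(regs[3]))                 # 0xfffffffe (-2 in two's complement)
```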
Submitted 3 December, 2023;
originally announced December 2023.
-
Design Space and Variability Analysis of SOI MOSFET for Ultra-Low Power Band-to-Band Tunneling Neurons
Authors:
Jay Sonawane,
Shubham Patil,
Abhishek Kadam,
Ajay Kumar Singh,
Sandip Lashkare,
Veeresh Deshpande,
Udayan Ganguly
Abstract:
Large spiking neural networks (SNNs) require ultra-low power and low variability hardware for neuromorphic computing applications. Recently, a band-to-band tunneling-based (BTBT) integrator, enabling sub-kHz operation of neurons with area and energy efficiency, was proposed. For an ultra-low power implementation of such neurons, a very low BTBT current is needed, so minimizing current without degrading neuronal properties is essential. Low variability is needed in the ultra-low current integrator to avoid network performance degradation in a large BTBT neuron-based SNN. To address this, we conducted design space and variability analysis in TCAD, utilizing a well-calibrated TCAD deck with experimental data from GlobalFoundries 32nm PD-SOI MOSFET. First, we discuss the physics-based explanation of the tunneling mechanism. Second, we explore the impact of device design parameters on SOI MOSFET performance, highlighting parameter sensitivities to tunneling current. With device parameters' optimization, we demonstrate a ~20x reduction in BTBT current compared to the experimental data. Finally, a variability analysis that includes the effects of random dopant fluctuations (RDF), oxide thickness variability (OTV), and channel-oxide interface traps (DIT) in the BTBT, SS, and ON regimes of operation is shown. The BTBT regime shows high sensitivity to the RDF and OTV as any variation in them directly modulates the tunnel length or the electric field at the drain-channel junction, whereas minimal sensitivity to DIT is observed.
Submitted 30 November, 2023;
originally announced November 2023.
-
A novel RNA pseudouridine site prediction model using Utility Kernel and data-driven parameters
Authors:
Sourabh Patil,
Archana Mathur,
Raviprasad Aduri,
Snehanshu Saha
Abstract:
RNA-protein interactions (RPIs) play an important role in biological systems. We have recently enumerated RPIs at the residue level and identified the minimum structural unit (MSU) in these interactions as a stretch of five residues (nucleotides/amino acids). Pseudouridine is the most frequent modification in RNA. The conversion of uridine to pseudouridine involves interactions between pseudouridine synthase and RNA. Existing models for predicting pseudouridine sites in a given RNA sequence depend mainly on user-defined features such as mono- and dinucleotide composition/propensities of RNA sequences. Predicting pseudouridine sites is a non-linear classification problem with limited data points. Deep learning models are efficient discriminators when the data set is reasonably large but fail when data are scarce ($<1000$ samples). To mitigate this problem, we propose a Support Vector Machine (SVM) kernel based on utility theory from economics, using data-driven parameters (i.e., the MSU) as features. For this purpose, we use position-specific tri-/quad-/pentanucleotide composition/propensity (PSPC/PSPP) features in addition to nucleotide and dinucleotide composition. SVMs are known to work well in small-data regimes, and SVM kernels are designed to classify non-linear data. The proposed model significantly outperforms existing state-of-the-art models (by 10%-15% on average).
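Composition features of the kind listed above are computed by sliding a window over the sequence. The Python sketch below is a simplification under stated assumptions: it computes global (not position-specific) k-mer frequencies for k = 1 to 5, matching the five-residue MSU length, and does not implement the utility kernel. The function names are hypothetical; the resulting vectors could be passed to any SVM implementation.

```python
from itertools import product

def kmer_composition(seq, k, alphabet="ACGU"):
    """Normalised frequency of every possible k-mer in an RNA sequence."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k].upper()
        if window in counts:            # skip windows containing ambiguous bases
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

def composition_features(seq, max_k=5):
    """Concatenate mono- to penta-nucleotide composition (4 + 16 + 64 + 256 + 1024 values)."""
    features = []
    for k in range(1, max_k + 1):
        features.extend(kmer_composition(seq, k))
    return features

print(kmer_composition("ACGUACGU", 1))   # [0.25, 0.25, 0.25, 0.25]
```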
Submitted 2 November, 2023;
originally announced November 2023.
-
Composite Adaptive Lyapunov-Based Deep Neural Network (Lb-DNN) Controller
Authors:
Omkar Sudhir Patil,
Emily J. Griffis,
Wanjiku A. Makumi,
Warren E. Dixon
Abstract:
Recent advancements in adaptive control have equipped deep neural network (DNN)-based controllers with Lyapunov-based adaptation laws that work across a range of DNN architectures to uniquely enable online learning. However, the adaptation laws are based on tracking error, and offer convergence guarantees on only the tracking error without providing conclusions on the parameter estimation performance. Motivated to provide guarantees on the DNN parameter estimation performance, this paper provides the first result on composite adaptation for adaptive Lyapunov-based DNN controllers, which uses the Jacobian of the DNN and a prediction error of the dynamics that is computed using a novel method involving an observer of the dynamics. A Lyapunov-based stability analysis is performed which guarantees the tracking, observer, and parameter estimation errors are uniformly ultimately bounded (UUB), with stronger performance guarantees when the DNN's Jacobian satisfies the persistence of excitation (PE) condition. Comparative simulation results demonstrate a significant performance improvement with the developed composite adaptive Lb-DNN controller in comparison to the tracking error-based Lb-DNN.
Submitted 21 November, 2023;
originally announced November 2023.
-
Galaxy Clusters Discovered via the Thermal Sunyaev-Zel'dovich Effect in the 500-square-degree SPTpol Survey
Authors:
L. E. Bleem,
M. Klein,
T. M. C. Abbott,
P. A. R. Ade,
M. Aguena,
O. Alves,
A. J. Anderson,
F. Andrade-Oliveira,
B. Ansarinejad,
M. Archipley,
M. L. N. Ashby,
J. E. Austermann,
D. Bacon,
J. A. Beall,
A. N. Bender,
B. A. Benson,
F. Bianchini,
S. Bocquet,
D. Brooks,
D. L. Burke,
M. Calzadilla,
J. E. Carlstrom,
A. Carnero Rosell,
J. Carretero,
C. L. Chang
, et al. (103 additional authors not shown)
Abstract:
We present a catalog of 689 galaxy cluster candidates detected at significance $ξ>4$ via their thermal Sunyaev-Zel'dovich (SZ) effect signature in 95 and 150 GHz data from the 500-square-degree SPTpol survey. We use optical and infrared data from the Dark Energy Camera and the Wide-field Infrared Survey Explorer (WISE) and Spitzer satellites to confirm 544 of these candidates as clusters with $\sim94\%$ purity. The sample has an approximately redshift-independent mass threshold at redshift $z>0.25$ and spans $1.5 \times 10^{14} < M_{500c} < 9.1 \times 10^{14}\,M_\odot/h_{70}$ and $0.03<z\lesssim1.6$ in mass and redshift, respectively; 21% of the confirmed clusters are at $z>1$. We use external radio data from the Sydney University Molonglo Sky Survey (SUMSS) to estimate contamination to the SZ signal from synchrotron sources. The contamination reduces the recovered $ξ$ by a median value of 0.032, or $\sim0.8\%$ of the $ξ=4$ threshold value, and $\sim7\%$ of candidates have a predicted contamination greater than $Δξ= 1$. With the exception of a small number of systems $(<1\%)$, an analysis of clusters detected in single-frequency 95 and 150 GHz data shows no significant contamination of the SZ signal by emission from dusty or synchrotron sources. This cluster sample will be a key component in upcoming astrophysical and cosmological analyses of clusters. The SPTpol millimeter-wave maps and associated data products used to produce this sample are available at https://pole.uchicago.edu/public/data/sptpol_500d_clusters/index.html, and the NASA LAMBDA website. An interactive sky server with the SPTpol maps and Dark Energy Survey data release 2 images is also available at NCSA https://skyviewer.ncsa.illinois.edu.
Submitted 8 February, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Authors:
Simian Luo,
Yiqin Tan,
Suraj Patil,
Daniel Gu,
Patrick von Platen,
Apolinário Passos,
Longbo Huang,
Jian Li,
Hang Zhao
Abstract:
Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps. LCMs are distilled from pre-trained latent diffusion models (LDMs), requiring only ~32 A100 GPU training hours. This report further extends LCMs' potential in two aspects: First, by applying LoRA distillation to Stable-Diffusion models including SD-V1.5, SSD-1B, and SDXL, we have expanded LCM's scope to larger models with significantly less memory consumption, achieving superior image generation quality. Second, we identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into various Stable-Diffusion fine-tuned models or LoRAs without training, thus representing a universally applicable accelerator for diverse image generation tasks. Compared with previous numerical PF-ODE solvers such as DDIM and DPM-Solver, LCM-LoRA can be viewed as a plug-in neural PF-ODE solver that possesses strong generalization abilities. Project page: https://github.com/luosiallen/latent-consistency-model.
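LoRA stores a fine-tune as a low-rank update $\Delta W = (\alpha/r)\,BA$ added to a frozen weight $W$, which is what allows an acceleration module to be combined with other fine-tunes without further training. The toy Python sketch below shows one simple way to combine two such updates, a weighted linear sum with weights $\lambda_i$. The matrices, $\alpha$, rank, and $\lambda$ values are hypothetical, and real pipelines apply updates per layer through a diffusion library rather than with nested lists.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(B, A, alpha, rank):
    """Low-rank weight update (alpha / rank) * B @ A."""
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(B, A)]

def merge_lora(W, weighted_deltas):
    """Return W + sum_i lambda_i * delta_i, leaving W untouched."""
    merged = [row[:] for row in W]
    for delta, lam in weighted_deltas:
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                merged[i][j] += lam * v
    return merged

W = [[0.0, 0.0], [0.0, 0.0]]                                        # toy 2x2 base weight
acc = lora_delta([[1.0], [0.0]], [[1.0, 2.0]], alpha=1.0, rank=1)    # "acceleration" LoRA
style = lora_delta([[0.0], [1.0]], [[1.0, 1.0]], alpha=1.0, rank=1)  # "style" LoRA
print(merge_lora(W, [(acc, 1.0), (style, 0.5)]))  # [[1.0, 2.0], [0.5, 0.5]]
```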
Submitted 9 November, 2023;
originally announced November 2023.
-
Lyapunov-Based Dropout Deep Neural Network (Lb-DDNN) Controller
Authors:
Saiedeh Akbari,
Emily J. Griffis,
Omkar Sudhir Patil,
Warren E. Dixon
Abstract:
Deep neural network (DNN)-based adaptive controllers can be used to compensate for unstructured uncertainties in nonlinear dynamic systems. However, DNNs are also very susceptible to overfitting and co-adaptation. Dropout regularization is an approach where nodes are randomly dropped during training to alleviate issues such as overfitting and co-adaptation. In this paper, a dropout DNN-based adaptive controller is developed. The developed dropout technique allows the deactivation of weights that are stochastically selected for each individual layer within the DNN. Simultaneously, a Lyapunov-based real-time weight adaptation law is introduced to update the weights of all layers of the DNN for online unsupervised learning. A non-smooth Lyapunov-based stability analysis is performed to ensure asymptotic convergence of the tracking error. Simulation results of the developed dropout DNN-based adaptive controller indicate a 38.32% improvement in the tracking error, a 53.67% improvement in the function approximation error, and 50.44% lower control effort when compared to a baseline adaptive DNN-based controller without dropout regularization.
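The weight-level dropout described above amounts to sampling an independent Bernoulli mask for each layer and ignoring masked weights in the forward pass. The Python sketch below shows only that masking step, with assumed shapes, keep probability, and weights. The Lyapunov-based adaptation law and the schedule for redrawing masks are not reproduced, and unlike standard inverted dropout no 1/keep_prob rescaling is applied.

```python
import math
import random

def sample_weight_masks(layer_shapes, keep_prob, rng):
    """Draw an independent Bernoulli(keep_prob) mask for every weight of every layer."""
    return [
        [[1 if rng.random() < keep_prob else 0 for _ in range(cols)] for _ in range(rows)]
        for rows, cols in layer_shapes
    ]

def masked_linear(weights, mask, x):
    """Linear layer that ignores dropped (masked-out) weights."""
    return [sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
            for w_row, m_row in zip(weights, mask)]

rng = random.Random(0)
shapes = [(3, 2), (1, 3)]                   # hypothetical 2-input, 3-hidden, 1-output net
masks = sample_weight_masks(shapes, keep_prob=0.8, rng=rng)
W1 = [[0.5, -0.2], [0.1, 0.4], [0.3, 0.3]]  # hypothetical weights
W2 = [[1.0, -1.0, 0.5]]
hidden = [math.tanh(v) for v in masked_linear(W1, masks[0], [1.0, 2.0])]
output = masked_linear(W2, masks[1], hidden)
print(masks, output)
```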
Submitted 30 October, 2023;
originally announced October 2023.
-
Efficient learning of arbitrary single-copy quantum states
Authors:
Shibdas Roy,
Filippo Caruso,
Srushti Patil
Abstract:
Quantum state tomography is the problem of estimating a given quantum state. Usually, the quantum experiment (state preparation, state evolution, measurement) must be run many times to estimate its output quantum state, because an exponentially large number of copies of the state is required. In this work, we present an efficient algorithm that estimates the output state of the experiment from a single copy of the state, with a small but non-zero probability of error and without knowing the evolution dynamics of the state. The algorithm also does not destroy the original state, which can easily be recovered for further quantum processing. For example, a quantum image processing experiment usually has to be repeated many times, since many copies of the output image state are needed to extract the information from all its pixels. In our algorithm, the information from $\mathcal{N}$ pixels of the image can be inferred from a single run of the image processing experiment, to efficiently estimate the density matrix of the image state.
Submitted 12 November, 2023; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Robotic Barrier Construction through Weaved, Inflatable Tubes
Authors:
H. J. Kim,
H. Abdel-Raziq,
X. Liu,
A. Y. Siskovic,
S. Patil,
K. H. Petersen,
H. L. Kao
Abstract:
In this article, we present a mechanism and related path planning algorithm to construct light-duty barriers out of extruded, inflated tubes weaved around existing environmental features. Our extruded tubes are based on everted vine robots, and in this context we present a new method to steer their growth. We characterize the mechanism in terms of accuracy and resilience and, towards their use as barriers, the ability of the tubes to withstand distributed loads. We further explore an algorithm which, given a feature map and the size and direction of the external load, can determine where and how to extrude the barrier. Finally, we showcase the potential of this method in an autonomously extruded two-layer wall weaved around three pipes. While preliminary, our work indicates that this method has potential for barrier construction in cluttered environments, e.g., shelters against wind or snow. Future work may show how to achieve tighter weaves, how to leverage weave friction for improved strength, how to assess barrier performance for feedback control, and how to operate the extrusion mechanism from a mobile robot.
Submitted 29 October, 2023;
originally announced October 2023.
-
Hybrid quantum-classical graph neural networks for tumor classification in digital pathology
Authors:
Anupama Ray,
Dhiraj Madan,
Srushti Patil,
Maria Anna Rapsomaniki,
Pushpak Pati
Abstract:
Advances in classical machine learning and single-cell technologies have paved the way to understanding interactions between disease cells and tumor microenvironments to accelerate therapeutic discovery. However, challenges in these machine learning methods and NP-hard problems in spatial biology create an opportunity for quantum computing algorithms. We create a hybrid quantum-classical graph neural network (GNN) that combines a GNN with a Variational Quantum Classifier (VQC) for classifying binary sub-tasks in breast cancer subtyping. We explore two variants, the first with fixed pretrained GNN parameters and the second with end-to-end training of GNN+VQC. The results demonstrate that the hybrid quantum neural network (QNN) is on par with state-of-the-art classical graph neural networks (GNN) in terms of weighted precision, recall and F1-score. We also show that, by means of amplitude encoding, we can compress information into a logarithmic number of qubits and attain better performance than with classical compression (which leads to information loss while keeping the number of qubits required constant in both regimes). Finally, we show that end-to-end training improves over fixed GNN parameters and also slightly improves over a vanilla GNN with the same number of dimensions.
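Amplitude encoding stores an $N$-dimensional vector in the amplitudes of $\lceil \log_2 N \rceil$ qubits, which is the logarithmic compression referred to above. The Python sketch below performs only the classical preprocessing (zero-padding to a power of two and $L_2$ normalisation); the state-preparation circuit and the GNN embedding that would supply the features are not shown, and the function name is hypothetical.

```python
import math

def amplitude_encode(features):
    """Map N classical values to the 2**n amplitudes of an n-qubit state, n = ceil(log2 N)."""
    if not features:
        raise ValueError("need at least one feature")
    n_qubits = max(1, math.ceil(math.log2(len(features))))
    padded = list(features) + [0.0] * (2 ** n_qubits - len(features))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0:
        raise ValueError("cannot encode an all-zero vector")
    return n_qubits, [v / norm for v in padded]  # squared amplitudes sum to 1

n, amps = amplitude_encode([3.0, 4.0])
print(n, amps)                           # 1 [0.6, 0.8]
print(amplitude_encode([1.0] * 64)[0])   # 6 qubits for 64 features
```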
Submitted 17 October, 2023;
originally announced October 2023.
-
Contrastive Self-Supervised Learning for Spatio-Temporal Analysis of Lung Ultrasound Videos
Authors:
Li Chen,
Jonathan Rubin,
Jiahong Ouyang,
Naveen Balaraju,
Shubham Patil,
Courosh Mehanian,
Sourabh Kulhare,
Rachel Millin,
Kenton W Gregory,
Cynthia R Gregory,
Meihua Zhu,
David O Kessler,
Laurie Malia,
Almaz Dessie,
Joni Rabiner,
Di Coneybeare,
Bo Shopsin,
Andrew Hersh,
Cristian Madar,
Jeffrey Shupp,
Laura S Johnson,
Jacob Avila,
Kristin Dwyer,
Peter Weimersheimer,
Balasundar Raju
, et al. (2 additional authors not shown)
Abstract:
Self-supervised learning (SSL) methods have shown promise for medical imaging applications by learning meaningful visual representations, even when the amount of labeled data is limited. Here, we extend state-of-the-art contrastive learning SSL methods to 2D+time medical ultrasound video data by introducing a modified encoder and augmentation method capable of learning meaningful spatio-temporal representations, without requiring constraints on the input data. We evaluate our method on the challenging clinical task of identifying lung consolidations (an important pathological feature) in ultrasound videos. Using a multi-center dataset of over 27k lung ultrasound videos acquired from over 500 patients, we show that our method can significantly improve performance on downstream localization and classification of lung consolidation. Comparisons against baseline models trained without SSL show that the proposed methods are particularly advantageous when the size of labeled training data is limited (e.g., as little as 5% of the training set).
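Contrastive SSL methods of the SimCLR family train an encoder so that two augmented views of the same clip have similar embeddings relative to the other clips in the batch. The plain-Python sketch below implements the standard NT-Xent (normalized temperature-scaled cross-entropy) loss. Whether the paper uses this exact loss, and the temperature of 0.1, are assumptions; the toy 2-D vectors stand in for outputs of the spatio-temporal encoder.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nt_xent(z1, z2, temperature=0.1):
    """z1[i] and z2[i] are embeddings of two augmented views of clip i."""
    z = z1 + z2                      # 2n embeddings; view i pairs with view i + n
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)        # index of the positive partner
        # Similarities to every other embedding (the anchor itself is excluded).
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        pos = j if j < i else j - 1  # position of the positive after removing i
        m = max(logits)              # stabilised log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[pos]
    return total / (2 * n)

aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)              # small loss vs. large loss
```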
Submitted 14 October, 2023;
originally announced October 2023.
-
BrainVoxGen: Deep learning framework for synthesis of Ultrasound to MRI
Authors:
Shubham Singh,
Dr. Mrunal Bewoor,
Ammar Ranapurwala,
Satyam Rai,
Sheetal Patil
Abstract:
The study presents a deep learning framework aimed at synthesizing 3D MRI volumes from three-dimensional ultrasound images of the brain utilizing the Pix2Pix GAN model. The process involves inputting a 3D volume of ultrasounds into a UNET generator and patch discriminator, generating a corresponding 3D volume of MRI. Model performance was evaluated using losses on the discriminator and generator applied to a dataset of 3D ultrasound and MRI images. The results indicate that the synthesized MRI images exhibit some similarity to the expected outcomes. Despite challenges related to dataset size, computational resources, and technical complexities, the method successfully generated MRI volume with a satisfactory similarity score meant to serve as a baseline for further research. It underscores the potential of deep learning-based volume synthesis techniques for ultrasound to MRI conversion, showcasing their viability for medical applications. Further refinement and exploration are warranted for enhanced clinical relevance.
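In Pix2Pix, the generator is trained against a patch discriminator with an adversarial term plus a weighted L1 distance to the paired target; the original Pix2Pix formulation weights the L1 term by 100. The Python sketch below computes that objective on flattened toy volumes. Whether BrainVoxGen uses the same weighting and binary cross-entropy form is an assumption, and the function names are hypothetical.

```python
import math

def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one discriminator output in (0, 1)."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def generator_loss(patch_scores, fake_voxels, real_voxels, l1_weight=100.0):
    """Pix2Pix-style objective: fool every discriminator patch + L1 to the real MRI."""
    adversarial = sum(bce(s, 1.0) for s in patch_scores) / len(patch_scores)
    l1 = sum(abs(f - r) for f, r in zip(fake_voxels, real_voxels)) / len(real_voxels)
    return adversarial + l1_weight * l1

# Toy numbers: four patch outputs and a flattened 2-voxel "volume".
print(generator_loss([0.5] * 4, [0.3, 0.5], [0.2, 0.4]))  # about log(2) + 100 * 0.1
```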
Submitted 11 October, 2023;
originally announced October 2023.
-
MemGPT: Towards LLMs as Operating Systems
Authors:
Charles Packer,
Sarah Wooders,
Kevin Lin,
Vivian Fang,
Shishir G. Patil,
Ion Stoica,
Joseph E. Gonzalez
Abstract:
Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis. To enable using context beyond limited context windows, we propose virtual context management, a technique drawing inspiration from hierarchical memory systems in traditional operating systems that provide the appearance of large memory resources through data movement between fast and slow memory. Using this technique, we introduce MemGPT (Memory-GPT), a system that intelligently manages different memory tiers in order to effectively provide extended context within the LLM's limited context window, and utilizes interrupts to manage control flow between itself and the user. We evaluate our OS-inspired design in two domains where the limited context windows of modern LLMs severely handicap their performance: document analysis, where MemGPT is able to analyze large documents that far exceed the underlying LLM's context window, and multi-session chat, where MemGPT can create conversational agents that remember, reflect, and evolve dynamically through long-term interactions with their users. We release MemGPT code and data for our experiments at https://memgpt.ai.
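The paging idea behind virtual context management can be sketched as a bounded in-prompt queue that evicts its oldest messages to slower storage, from which they can later be recalled. The Python sketch below is a deliberate simplification: in MemGPT the LLM itself decides, through function calls, what to move or summarise, whereas here eviction is plain FIFO, tokens are approximated by whitespace splitting, and the class and method names are hypothetical rather than MemGPT's API.

```python
from collections import deque

class VirtualContext:
    """Toy two-tier memory: a bounded in-context queue plus an unbounded archive."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.main = deque()   # messages currently inside the prompt (fast tier)
        self.archive = []     # evicted messages kept outside the prompt (slow tier)
        self.used = 0

    @staticmethod
    def count_tokens(text):
        return len(text.split())  # crude whitespace stand-in for a real tokenizer

    def append(self, message):
        cost = self.count_tokens(message)
        if cost > self.max_tokens:
            raise ValueError("message larger than the whole context window")
        self.main.append(message)
        self.used += cost
        # Page the oldest messages out until the prompt fits the budget again.
        while self.used > self.max_tokens:
            evicted = self.main.popleft()
            self.used -= self.count_tokens(evicted)
            self.archive.append(evicted)

    def recall(self, keyword):
        """Case-insensitive search of the archive, e.g. for a retrieval request."""
        return [m for m in self.archive if keyword.lower() in m.lower()]

ctx = VirtualContext(max_tokens=6)
for msg in ["hi there", "my name is Ada", "what is my name?"]:
    ctx.append(msg)
print(list(ctx.main))      # ['what is my name?']
print(ctx.recall("ada"))   # ['my name is Ada']
```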
Submitted 12 February, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Unsupervised deep learning framework for temperature-compensated damage assessment using ultrasonic guided waves on edge device
Authors:
Pankhi Kashyap,
Kajal Shivgan,
Sheetal Patil,
Ramana Raja B,
Sagar Mahajan,
Sauvik Banerjee,
Siddharth Tallur
Abstract:
Fueled by the rapid development of machine learning (ML) and greater access to cloud computing and graphics processing units (GPUs), various deep learning based models have been proposed for improving the performance of ultrasonic guided wave structural health monitoring (GW-SHM) systems, especially to counter complexity and heterogeneity in data due to varying environmental factors (e.g., temperature) and types of damage. Such models typically comprise millions of trainable parameters and therefore add to the cost of deployment due to requirements for cloud connectivity and processing, thus limiting the scale of deployment of GW-SHM. In this work, we propose an alternative solution that leverages the TinyML framework to develop lightweight ML models that can be deployed directly on embedded edge devices. The utility of our solution is illustrated by presenting an unsupervised learning framework for damage detection in honeycomb composite sandwich structure (HCSS) with disbond and delamination damage, validated using data generated by finite element (FE) simulations and experiments performed at various temperatures in the range 0°C to 90°C. We demonstrate a fully integrated solution using a Xilinx Artix-7 FPGA for data acquisition and control, and edge inference of damage.
Submitted 8 October, 2023;
originally announced October 2023.
-
Experimental and numerical investigation to elucidate the fluid flow through packed beds with structured particle packings
Authors:
Shirin Patil,
Christian Gorges,
Joel López-Bonilla,
Moritz Stelter,
Frank Beyrau,
Berend van Wachem
Abstract:
This paper presents an experimental and numerical investigation of the dispersion of the gaseous jet flow and co-flow for the simple unit cell (SUC) and body centered cubic (BCC) configurations of particles in packed beds. The experimental setup is built such that suitable, simplified boundary conditions are imposed for the corresponding numerical framework. The SUC and BCC particle beds consist of 3D-printed spheres. The flow velocities are analysed directly at the exit of both particle beds for particle Reynolds numbers of 200, 300, and 400. Stereo particle image velocimetry (SPIV) is arranged such that the velocities over the entire region at the exit of the packed bed are obtained instantaneously. The numerical method is a state-of-the-art immersed boundary method (IBM) with adaptive mesh refinement (AMR). The paper presents the pore jet structure and the velocity field exiting each pore for the SUC and BCC packed particle beds. The numerical and experimental studies show good agreement for the SUC configuration at all flow velocities. For the BCC configuration, some differences can be observed in the pore jet flow structure between the simulations and the experiments, but the general flow velocity distribution shows good overall agreement. The axial velocity is generally higher for pores located near the centre of the packed bed than for pores near the wall. In addition, the axial velocities are observed to increase near the peripheral pores of the packed bed. This behaviour is more pronounced for the BCC configuration than for the SUC configuration. It is shown that both the experiments and the simulations can be used to study the complex fluid structures inside a packed bed reactor.
Submitted 27 September, 2023;
originally announced September 2023.
-
A Visual Analytic Environment to Co-locate Peoples' Tweets with City Factual Data
Authors:
Snehal Patil,
Shah Rukh Humayoun
Abstract:
Social media platforms (e.g., Twitter, Facebook) are used heavily by the public to share news, opinions, and reactions towards events or topics. Integrating such data with factual data about the event or topic could provide a more comprehensive understanding of it. Targeting this, we present our visual analytics tool, called VC-FaT, which integrates people's tweets regarding crimes in the city of San Francisco with the city's factual crime data. VC-FaT provides a number of interactive visualizations using both data sources for better understanding and exploration of the crime activities that happened in the city over a period of five years.
Submitted 9 September, 2023;
originally announced September 2023.
-
Systematic Review of Techniques in Brain Image Synthesis using Deep Learning
Authors:
Shubham Singh,
Ammar Ranapurwala,
Mrunal Bewoor,
Sheetal Patil,
Satyam Rai
Abstract:
This review paper delves into the present state of medical imaging, with a specific focus on the use of deep learning techniques for brain image synthesis. The need for medical image synthesis to improve diagnostic accuracy and decrease invasiveness in medical procedures is emphasized, along with the role of deep learning in enabling these advancements. The paper examines various methods and techniques for brain image synthesis, including 2D to 3D constructions, MRI synthesis, and the use of transformers. It also addresses limitations and challenges faced in these methods, such as obtaining well-curated training data and addressing brain ultrasound issues. The review concludes by exploring the future potential of this field and the opportunities for further advancements in medical imaging using deep learning techniques. The significance of transformers and their potential to revolutionize the medical imaging field is highlighted. Additionally, the paper discusses the potential solutions to the shortcomings and limitations faced in this field. The review provides researchers with an updated reference on the present state of the field and aims to inspire further research and bridge the gap between the present state of medical imaging and the future possibilities offered by deep learning techniques.
Submitted 8 September, 2023;
originally announced September 2023.
-
Disentangling the primordial nature of stochastic gravitational wave backgrounds with CMB spectral distortions
Authors:
Bryce Cyr,
Thomas Kite,
Jens Chluba,
J. Colin Hill,
Donghui Jeong,
Sandeep Kumar Acharya,
Boris Bolliet,
Subodh P. Patil
Abstract:
The recent detection of a stochastic gravitational wave background (SGWB) at nanohertz frequencies by pulsar timing arrays (PTAs) has sparked a flurry of interest. Beyond the standard interpretation that the progenitor is a network of supermassive black hole binaries, many exotic models have also been proposed, some of which can potentially offer a better fit to the data. We explore how the various connections between gravitational waves and CMB spectral distortions can be leveraged to help determine whether an SGWB was generated primordially or astrophysically. To this end, we present updated $k$-space window functions which can be used for distortion parameter estimation on enhancements to the primordial scalar power spectrum. These same enhancements can also source gravitational waves (GWs) directly at second order in perturbation theory, so-called scalar-induced GWs (SIGWs), and indirectly through the formation of primordial black holes (PBHs). We perform a mapping of scalar power spectrum constraints into limits on the GW parameter space of SIGWs for $δ$-function features. We highlight that broader features in the scalar spectrum can explain the PTA results while simultaneously producing a spectral distortion (SD) within reach of future experiments. We additionally update PBH constraints from $μ$- and $y$-type spectral distortions. Refined treatments of the distortion window functions widen existing SD constraints, and we find that a future CMB spectrometer could play a pivotal role in unraveling the origin of GWs imprinted at or below CMB anisotropy scales.
Submitted 5 September, 2023;
originally announced September 2023.
-
Understanding the soil water dynamics during excess and deficit rainfall conditions over the Core monsoon zone of India
Authors:
Mangesh M. Goswami,
Milind Mujumdar,
Bhupendra Bahadur Singh,
Madhusudan Ingale,
Naresh Ganeshi,
Manish Ranalkar,
Trenton E. Franz,
Prashant Srivastav,
Dev Niyogi,
R. Krishnan,
S. N. Patil
Abstract:
Observations of soil moisture (SM) during excess and deficit monsoon seasons between 2000 and 2021 present a unique opportunity to understand the soil water dynamics (SWD) over the core monsoon zone (CMZ) of India. This study aims to analyse SWD by investigating SM variability, SM memory (SMM), and the coupling between surface and subsurface SM levels. Particularly intriguing are instances of concurrent monsoonal extremes, which give rise to complex SWD patterns. Usually, depleted convective activity and persistently higher temperatures during the pre-monsoon season lead to lower SM, while monsoon rains and post-monsoon showers support the prevalence of higher SM conditions. Long, persistent dry spells during deficit monsoon years enhance the Bowen ratio (BR) owing to high sensible heat fluxes. On the other hand, the availability of a large latent heat flux during excess monsoon and post-monsoon seasons tends to decrease the BR. This enhancement or reduction in BR is due to evapotranspiration (ET), which influences the SWD by modulating the surface-subsurface SM coupling. The surface-subsurface SM coupling analysis for the CMZ exhibits a significant distinction in the evolution of wet and dry extremes. SM variations and the persistence timescale are used as indicators of SMM and are analysed for both surface and subsurface SM observation levels. Evidently, subsurface SM exhibits remarkably prolonged memory timescales, approximately twice that of surface SM. Furthermore, we dissect SWD linked to wet and dry extremes by analysing the annual soil water balance (SWB). Our findings reveal augmented (diminished) ET during deficit (excess) years, which are subject to a higher (lower) number of break events. In essence, our study underscores the significance of surface-subsurface SM observations in unravelling the intricate tapestry of SWD.
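The abstract uses a persistence timescale as an indicator of soil moisture memory without naming the estimator. One common choice, assumed here rather than taken from the paper, is the e-folding time: the first lag at which the autocorrelation of SM anomalies drops below 1/e. The minimal Python sketch below implements that definition; `efolding_memory` and `ar1` are hypothetical helpers, and the AR(1) series are synthetic stand-ins for surface and subsurface SM, not the study's observations.

```python
import numpy as np

def efolding_memory(series, max_lag=None):
    """First lag (in time steps) at which the sample autocorrelation of
    `series` falls below 1/e; returns None if it never does within max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # work with anomalies
    n = x.size
    max_lag = n // 2 if max_lag is None else max_lag
    denom = np.dot(x, x)                  # lag-0 autocovariance times n
    for lag in range(1, max_lag + 1):
        acf = np.dot(x[:-lag], x[lag:]) / denom
        if acf < 1.0 / np.e:
            return lag
    return None

def ar1(phi, n, seed=0):
    """Synthetic red-noise series x[t] = phi * x[t-1] + noise (illustration only)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = noise[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x

# Hypothetical "surface" (faster) and "subsurface" (slower) series.
surface = efolding_memory(ar1(0.90, 200_000, seed=1))
subsurface = efolding_memory(ar1(0.95, 200_000, seed=2))
```

For an AR(1) process the theoretical e-folding time is $-1/\ln\phi$, about 9.5 steps for $\phi=0.90$ and 19.5 steps for $\phi=0.95$, so the slower series shows roughly twice the memory. This only illustrates the metric and says nothing about the CMZ data.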
Submitted 29 August, 2023;
originally announced August 2023.
-
Hydrogen bonding exchange and supramolecular dynamics of monohydroxy alcohols
Authors:
Shinian Cheng,
Shalin Patil,
Shiwang Cheng
Abstract:
This Letter unravels hydrogen bonding dynamics and their relationship with the supramolecular relaxations of monohydroxy alcohols (MAs) at intermediate times. The rheological moduli of MAs exhibit Rouse-scaling relaxation, G(t) ~ t^(-1/2), switching to G(t) ~ t^(-1) at time tau_m before their terminal time. Meanwhile, dielectric spectroscopy reveals clear signatures of new supramolecular dynamics matching tau_m from rheology. Interestingly, the characteristic time tau_m follows an Arrhenius-like temperature dependence over an exceptionally wide temperature range and agrees well with the hydrogen bonding exchange time from nuclear magnetic resonance measurements. These observations demonstrate the presence of collective Rouse-like sub-chain motions and active chain-swapping of MAs at intermediate times. Moreover, detailed theoretical analyses show explicitly that hydrogen bonding exchange truncates the Rouse-type supramolecular dynamics and triggers the chain-swapping processes, supporting a recently proposed living polymer model.
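For reference, the t^(-1/2) scaling quoted above is what the textbook Rouse model gives at intermediate times, where the relaxation modulus is a sum over normal modes with relaxation times tau_R/p^2. The Python/NumPy sketch below (arbitrary units) evaluates that sum and measures its local log-log slope. It illustrates only the standard Rouse regime and does not model the hydrogen-bond exchange that, per the abstract, truncates it at tau_m; the function names and parameter values are illustrative assumptions.

```python
import numpy as np

def rouse_modulus(t, n_modes=1000, tau_r=1.0, g0=1.0):
    """Rouse relaxation modulus G(t) = g0 * sum_{p=1..N} exp(-t p^2 / tau_r)."""
    p2 = np.arange(1, n_modes + 1, dtype=float) ** 2
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return g0 * np.exp(-np.outer(t, p2) / tau_r).sum(axis=1)

def log_slope(g, t1, t2):
    """Local exponent d ln G / d ln t estimated between t1 and t2."""
    g1, g2 = g([t1, t2])
    return np.log(g2 / g1) / np.log(t2 / t1)

# Between the fastest mode time (tau_r / N^2 = 1e-6) and tau_r the slope is close to -1/2.
slope = log_slope(rouse_modulus, 1e-4, 1e-3)
```

At times much longer than tau_R only the p = 1 mode survives and G(t) decays as a single exponential, which is why an intermediate t^(-1) regime, as reported above, needs an additional mechanism.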
Submitted 28 August, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
A proposal for detecting the spin of a single electron in superfluid helium
Authors:
**yong Ma,
Y. S. S. Patil,
Jiaxin Yu,
Yiqi Wang,
J. G. E. Harris
Abstract:
The electron bubble in superfluid helium has two degrees of freedom that may offer exceptionally low dissipation: the electron's spin and the bubble's motion. If these degrees of freedom can be read out and controlled with sufficient sensitivity, they would provide a novel platform for realizing a range of quantum technologies and for exploring open questions in the physics of superfluid helium. Here we propose a practical scheme for accomplishing this by trapping an electron bubble inside a superfluid-filled opto-acoustic cavity.
Submitted 17 June, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.
-
Weakly Semi-Supervised Detection in Lung Ultrasound Videos
Authors:
Jiahong Ouyang,
Li Chen,
Gary Y. Li,
Naveen Balaraju,
Shubham Patil,
Courosh Mehanian,
Sourabh Kulhare,
Rachel Millin,
Kenton W. Gregory,
Cynthia R. Gregory,
Meihua Zhu,
David O. Kessler,
Laurie Malia,
Almaz Dessie,
Joni Rabiner,
Di Coneybeare,
Bo Shopsin,
Andrew Hersh,
Cristian Madar,
Jeffrey Shupp,
Laura S. Johnson,
Jacob Avila,
Kristin Dwyer,
Peter Weimersheimer,
Balasundar Raju
, et al. (2 additional authors not shown)
Abstract:
Frame-by-frame annotation of bounding boxes by clinical experts is often required to train fully supervised object detection models on medical video data. We propose a method for improving object detection in medical videos through weak supervision from video-level labels. More concretely, we aggregate individual detection predictions into video-level predictions and extend a teacher-student training strategy to provide additional supervision via a video-level loss. We also introduce improvements to the underlying teacher-student framework, including methods to improve the quality of pseudo-labels based on weak supervision and adaptive schemes to optimize knowledge transfer between the student and teacher networks. We apply this approach to the clinically important task of detecting lung consolidations (seen in respiratory infections such as COVID-19 pneumonia) in medical ultrasound videos. Experiments reveal that our framework improves detection accuracy and robustness compared to baseline semi-supervised models, and improves efficiency in data and annotation usage.
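A minimal sketch of one way to turn frame-level detections into a video-level prediction and a video-level loss, assuming max pooling over all boxes in all frames and binary cross-entropy against the video label. The abstract does not say which aggregation the authors use, so `video_score` and `video_bce` are hypothetical names; this NumPy version only computes loss values, whereas training would need the same operations in an autodiff framework.

```python
import numpy as np

def video_score(frame_scores):
    """Video-level confidence = max over all detection scores in all frames.
    frame_scores: sequence of 1-D arrays, one per frame (may be empty)."""
    best = 0.0
    for scores in frame_scores:
        scores = np.asarray(scores, dtype=float)
        if scores.size:
            best = max(best, float(scores.max()))
    return best

def video_bce(frame_scores, video_label, eps=1e-7):
    """Binary cross-entropy between the aggregated score and the video label (0/1)."""
    p = min(max(video_score(frame_scores), eps), 1.0 - eps)
    return -(video_label * np.log(p) + (1 - video_label) * np.log(1.0 - p))

# Three frames: two boxes, no boxes, one box.
frames = [np.array([0.1, 0.3]), np.array([]), np.array([0.8])]
loss = video_bce(frames, video_label=1)
```

Max pooling lets a single confident detection mark the whole video as positive, which matches the intuition that one consolidation anywhere in the clip suffices; smoother aggregators such as noisy-OR are a common alternative.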
Submitted 7 August, 2023;
originally announced August 2023.
-
Spectral properties of a mixed singlet-triplet Ising superconductor
Authors:
Sourabh Patil,
Gaomin Tang,
Wolfgang Belzig
Abstract:
Conventional two-dimensional superconductivity is destroyed when the in-plane magnetic field exceeds the so-called Pauli limit. Some monolayer transition-metal dichalcogenides lack inversion symmetry, and their strong spin-orbit coupling leads to a valley-dependent Zeeman-like spin splitting. The resulting spin-valley locking lifts the valley degeneracy and results in a strong enhancement of the in-plane critical magnetic field. In these systems, it was predicted that the density of states in an in-plane field exhibits distinct mirage gaps at finite energies of about the spin-orbit coupling strength, which arise from a coupling of the electron and hole bands at energies larger than the superconducting gap. In this study, we investigate the impact of a triplet pairing channel on the spectral properties, primarily the mirage gap and the superconducting gap, in the clean limit. Notably, in the presence of the triplet pairing channel, the mirage-gap width is reduced at low magnetic fields. Furthermore, when the temperature is lower than the triplet critical temperature, the mirage gaps survive even in the strong-field limit due to the finite singlet and triplet order parameters. Our work provides insights into controlling and understanding the properties of spin-triplet Cooper pairs.
Submitted 7 July, 2023;
originally announced July 2023.
-
Use of Non-Maximal entangled state for free space BBM92 quantum key distribution protocol
Authors:
Ayan Biswas,
Sarika Mishra,
Satyajeet Patil,
Anindya Banerji,
Shashi Prabhakar,
Ravindra P. Singh
Abstract:
Satellite-based quantum communication for secure key distribution is becoming an increasingly active field of research because of its unbreakable security. Prepare-and-measure protocols such as BB84 treat the satellite as a trusted device, which is risky given the current trend in satellite-based optical communication. Therefore, entanglement-based protocols should be preferred since, in addition to overcoming the distance limitation, they allow the satellite to be treated as an untrusted device. The E91 protocol is a good candidate for satellite-based quantum communication, but its key rate is low because most of the measured qubits are used to verify a Bell-CHSH inequality to ensure security against Eve. An entanglement-based protocol requires a maximally entangled state for the most secure key distribution. The current work discusses the effect of non-maximality on secure key distribution. It establishes a lower bound on the non-maximality condition below which no secure key can be extracted. The BBM92 protocol is more beneficial for key distribution, as we find a linear relation between the extent of Bell-CHSH violation and the quantum bit error rate for a given setup.
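The linear link between Bell-CHSH violation and QBER can be illustrated under a simpler assumption than the paper's non-maximally entangled states: a maximally entangled polarization pair mixed with white noise (a Werner-type state of visibility V), for which the correlation is E(a, b) = V cos 2(a - b). With the standard CHSH angles this gives S = 2√2 V, while the QBER in matched bases is (1 - V)/2, so S = 2√2(1 - 2Q). The Python sketch below checks that relation numerically; the noise model, function names, and angle choices are assumptions, not the authors' setup.

```python
import numpy as np

def correlation(a_deg, b_deg, visibility):
    """E(a, b) = V cos(2(a - b)) for a Werner-type noisy |Phi+> polarization state."""
    return visibility * np.cos(2.0 * np.radians(a_deg - b_deg))

def chsh(visibility, a=0.0, a2=45.0, b=22.5, b2=67.5):
    """CHSH value S = |E(a,b) - E(a,b') + E(a',b) + E(a',b')| at the standard angles."""
    def e(x, y):
        return correlation(x, y, visibility)
    return abs(e(a, b) - e(a, b2) + e(a2, b) + e(a2, b2))

def qber(visibility):
    """Probability that outcomes disagree when both sides measure in the same basis."""
    return (1.0 - visibility) / 2.0

# S and Q move together linearly as the visibility is degraded.
pairs = [(qber(v), chsh(v)) for v in np.linspace(0.5, 1.0, 6)]
```

Under this model S reaches the classical bound of 2 at Q = (1 - 1/√2)/2 ≈ 14.6%, so for a given setup the error rate alone indicates how much violation remains.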
Submitted 6 July, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
Enhancing Dexterity in Robotic Manipulation via Hierarchical Contact Exploration
Authors:
Xianyi Cheng,
Sarvesh Patil,
Zeynep Temel,
Oliver Kroemer,
Matthew T. Mason
Abstract:
Planning robot dexterity is challenging due to the non-smoothness introduced by contacts, intricate fine motions, and ever-changing scenarios. We present a hierarchical planning framework for dexterous robotic manipulation (HiDex). This framework explores in-hand and extrinsic dexterity by leveraging contacts. It generates rigid-body motions and complex contact sequences. Our framework is based on Monte-Carlo Tree Search and has three levels: 1) planning object motions and environment contact modes; 2) planning robot contacts; 3) path evaluation and control optimization. This framework offers two main advantages. First, it allows efficient global reasoning over high-dimensional complex space created by contacts. It solves a diverse set of manipulation tasks that require dexterity, both intrinsic (using the fingers) and extrinsic (also using the environment), mostly in seconds. Second, our framework allows the incorporation of expert knowledge and customizable setups in task mechanics and models. It requires minor modifications to accommodate different scenarios and robots. Hence, it provides a flexible and generalizable solution for various manipulation tasks. As examples, we analyze the results on 7 hand configurations and 15 scenarios. We demonstrate 8 tasks on two robot platforms.
Submitted 8 November, 2023; v1 submitted 1 July, 2023;
originally announced July 2023.
-
Hyena Neural Operator for Partial Differential Equations
Authors:
Saurabh Patil,
Zijie Li,
Amir Barati Farimani
Abstract:
Numerically solving partial differential equations typically requires fine discretization to resolve the necessary spatiotemporal scales, which can be computationally expensive. Recent advances in deep learning have provided a new approach to solving partial differential equations through neural operators. Neural operators are neural network architectures that learn mappings between function spaces and can solve partial differential equations based on data. This study utilizes a novel neural operator called Hyena, which employs a long convolutional filter parameterized by a multilayer perceptron. The Hyena operator has sub-quadratic complexity and uses a state space model to parameterize a long convolution with a global receptive field. This mechanism enhances the model's comprehension of the input's context and enables data-dependent weights for different partial differential equation instances. To measure how effective the layers are in solving partial differential equations, we conduct experiments on the Diffusion-Reaction equation and the Navier-Stokes equation. Our findings indicate that the Hyena neural operator can serve as an efficient and accurate model for learning the solution operator of partial differential equations. The data and code used can be found at: https://github.com/Saupatil07/Hyena-Neural-Operator
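The core of a Hyena-style layer is a long convolution whose filter is generated implicitly (here by a small MLP on positional features, windowed by an exponential decay) and evaluated with the FFT in O(L log L) rather than O(L^2), followed by elementwise gating by a projection of the input. The NumPy sketch below is a simplified order-1 version under those assumptions, with random untrained weights; `implicit_filter`, `fft_causal_conv`, and `hyena_like_layer` are hypothetical names, not the API of the linked repository.

```python
import numpy as np

def implicit_filter(length, channels, rng, hidden=16, decay=0.05):
    """Long filter h[t, c] from a small MLP on positional features, multiplied by
    an exponential decay window (implicit, Hyena-style parameterization)."""
    t = np.linspace(0.0, 1.0, length)[:, None]                    # (L, 1) positions
    feats = np.concatenate([t, np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
    w1 = rng.standard_normal((feats.shape[1], hidden)) / np.sqrt(feats.shape[1])
    w2 = rng.standard_normal((hidden, channels)) / np.sqrt(hidden)
    h = np.sin(feats @ w1) @ w2                                   # (L, channels)
    window = np.exp(-decay * np.arange(length))[:, None]          # favour recent inputs
    return h * window

def fft_causal_conv(u, h):
    """Per-channel causal convolution y[t] = sum_{k<=t} h[k] u[t-k] via the FFT."""
    L = u.shape[0]
    n = 2 * L                                  # zero-pad so circular conv equals linear conv
    spec = np.fft.rfft(u, n=n, axis=0) * np.fft.rfft(h, n=n, axis=0)
    return np.fft.irfft(spec, n=n, axis=0)[:L]

def hyena_like_layer(u, rng):
    """Order-1 sketch: gate * (long filter convolved with value), random projections."""
    L, d = u.shape
    w_value = rng.standard_normal((d, d)) / np.sqrt(d)
    w_gate = rng.standard_normal((d, d)) / np.sqrt(d)
    value, gate = u @ w_value, u @ w_gate
    return gate * fft_causal_conv(value, implicit_filter(L, d, rng))

rng = np.random.default_rng(0)
out = hyena_like_layer(rng.standard_normal((128, 8)), rng)       # (sequence, channels)
```

Zero-padding to length 2L makes the circular FFT product equal to a linear, causal convolution; the FFT step is what keeps the cost sub-quadratic in sequence length.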
Submitted 20 September, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Process Voltage Temperature Variability Estimation of Tunneling Current for Band-to-Band-Tunneling based Neuron
Authors:
Shubham Patil,
Anand Sharma,
Gaurav R,
Abhishek Kadam,
Ajay Kumar Singh,
Sandip Lashkare,
Nihar Ranjan Mohapatra,
Udayan Ganguly
Abstract:
Compact and energy-efficient synapses and neurons are essential to realize the full potential of neuromorphic computing. In addition, low variability is needed for neurons in deep neural networks to achieve higher accuracy. Further, process (P), voltage (V), and temperature (T) variation (PVT) is an essential consideration for low-power circuits, as its performance impact and compensation complexity add cost. Recently, a band-to-band tunneling (BTBT) neuron has been demonstrated to operate successfully in a network to enable a Liquid State Machine. Comparing the PVT variability of competing modes of operation (e.g., BTBT vs. sub-threshold and above-threshold) of the same transistor is critical for assessing performance. In this work, we demonstrate the impact of PVT variation in the BTBT regime and benchmark it against the subthreshold-slope (SS) and ON-state (I$_{ON}$) regimes of a partially depleted silicon-on-insulator MOSFET. The ON-state regime offers the lowest variability but dissipates more power, so it is not usable for low-power operation. Of the BTBT and SS regimes, which can both enable low-power neurons, the BTBT regime shows ~3x lower variability ($\sigma_{I_D}/\mu_{I_D}$) than the SS regime when the cumulative PVT variability is considered. The improvement is due to the well-known weaker P, V, and T dependence of BTBT compared with SS. We show that the BTBT variation is uncorrelated with the mutually correlated SS and I$_{ON}$ variations, indicating a different origin in terms of both mechanism and location. Hence, the BTBT regime is promising for low-current, low-power neuron operation with low device-to-device variability.
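The variability figure of merit above is the coefficient of variation σ/μ of the drain current. A small Python sketch for computing it from repeated current measurements, and for combining process, voltage, and temperature contributions into a cumulative value, follows. The root-sum-square combination assumes the three sources are statistically independent; that is an assumption for illustration, and the paper may aggregate PVT variability differently. The function names are hypothetical.

```python
import numpy as np

def cv(samples):
    """Coefficient of variation sigma/mu of drain-current samples (sample std, ddof=1)."""
    s = np.asarray(samples, dtype=float)
    return s.std(ddof=1) / s.mean()

def cumulative_cv(cv_p, cv_v, cv_t):
    """Combine process, voltage and temperature contributions by root-sum-square,
    assuming the three variation sources are statistically independent."""
    return float(np.sqrt(cv_p**2 + cv_v**2 + cv_t**2))
```

Computing `cumulative_cv` separately for each operating regime (BTBT, SS, ON) from its own measured P, V, and T spreads then gives a like-for-like basis for comparing regimes.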
Submitted 20 June, 2023;
originally announced June 2023.
-
Evaluating 3D Shape Analysis Methods for Robustness to Rotation Invariance
Authors:
Supriya Gadi Patil,
Angel X. Chang,
Manolis Savva
Abstract:
This paper analyzes the robustness of recent 3D shape descriptors to SO(3) rotations, something that is fundamental to shape modeling. Specifically, we formulate the task of rotated 3D object instance detection. To do so, we consider a database of 3D indoor scenes, where objects occur in different orientations. We benchmark different methods for feature extraction and classification in the context of this task. We systematically contrast different choices in a variety of experimental settings, investigating the impact on performance of different rotation distributions, different degrees of partial observation of the object, and different levels of difficulty of negative pairs. Our study, on a synthetic dataset of 3D scenes where object instances occur in different orientations, reveals that deep learning-based rotation-invariant methods are effective in relatively easy settings with easy-to-distinguish pairs. However, their performance decreases significantly when the difference in rotation within the input pair is large, when the degree of observation of the input objects is reduced, or when the difficulty level of the input pair is increased. Finally, we connect the feature encodings designed for rotation-invariant methods to the 3D geometry that enables them to acquire the property of rotation invariance.
Submitted 29 May, 2023;
originally announced May 2023.
-
Neural Sculpting: Uncovering hierarchically modular task structure in neural networks through pruning and network analysis
Authors:
Shreyas Malakarjun Patil,
Loizos Michael,
Constantine Dovrolis
Abstract:
Natural target functions and tasks typically exhibit hierarchical modularity -- they can be broken down into simpler sub-functions that are organized in a hierarchy. Such sub-functions have two important features: they have a distinct set of inputs (input-separability) and they are reused as inputs higher in the hierarchy (reusability). Previous studies have established that hierarchically modular neural networks, which are inherently sparse, offer benefits such as learning efficiency, generalization, multi-task learning, and transfer. However, identifying the underlying sub-functions and their hierarchical structure for a given task can be challenging. The high-level question in this work is: if we learn a task using a sufficiently deep neural network, how can we uncover the underlying hierarchy of sub-functions in that task? As a starting point, we examine the domain of Boolean functions, where it is easier to determine whether a task is hierarchically modular. We propose an approach based on iterative unit and edge pruning (during training), combined with network analysis for module detection and hierarchy inference. Finally, we demonstrate that this method can uncover the hierarchical modularity of a wide range of Boolean functions and two vision tasks based on the MNIST digits dataset.
Submitted 27 October, 2023; v1 submitted 28 May, 2023;
originally announced May 2023.
-
Gorilla: Large Language Model Connected with Massive APIs
Authors:
Shishir G. Patil,
Tianjun Zhang,
Xin Wang,
Joseph E. Gonzalez
Abstract:
Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis. However, their potential to effectively use tools via API calls remains unfulfilled. This is a challenging task even for today's state-of-the-art LLMs such as GPT-4, largely due to their inability to generate accurate input arguments and their tendency to hallucinate the wrong usage of an API call. We release Gorilla, a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls. When combined with a document retriever, Gorilla demonstrates a strong capability to adapt to test-time document changes, enabling flexible user updates or version changes. It also substantially mitigates the issue of hallucination, commonly encountered when prompting LLMs directly. To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of HuggingFace, TorchHub, and TensorHub APIs. The successful integration of the retrieval system with Gorilla demonstrates the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and consequently increase the reliability and applicability of their outputs. Gorilla's code, model, data, and demo are available at https://gorilla.cs.berkeley.edu
Submitted 24 May, 2023;
originally announced May 2023.
-
Evolution of ferroelectricity with annealing temperature and thickness in sputter deposited undoped HfO$_2$ on silicon
Authors:
Md Hanif Ali,
Adityanarayan Pandey,
Rowtu Srinu,
Paritosh Meihar,
Shubham Patil,
Sandip Lashkare,
Udayan Ganguly
Abstract:
Ferroelectricity in sputtered undoped HfO$_2$ is attractive for composition control in low-power, non-volatile memory and logic applications. Unlike for doped HfO$_2$, the evolution of ferroelectricity with annealing and film thickness in sputter-deposited undoped HfO$_2$ on Si has not yet been reported. In the present study, we demonstrate the impact of post-metallization annealing temperature and film thickness on the ferroelectric properties of dopant-free sputtered HfO$_2$ on a Si substrate. A rich correlation of polarization with phase, lattice constant, crystallite size, and interface reaction is observed. First, the annealing temperature study shows o-phase saturation beyond 600 °C, followed by interface reaction beyond 700 °C, which defines an optimal temperature window of 600-700 °C. Second, the thickness study within this optimal temperature window shows o-phase crystallite size scaling with thickness up to a critical thickness of 20 nm, indicating that the films are completely o-phase. However, the lattice constants (volume) are high in the 15-20 nm thickness range, which correlates with the enhanced value of 2P$_r$. Beyond 20 nm, crystallite scaling with thickness saturates, correlated with the appearance of the m-phase and a reduction in 2P$_r$. In the optimal thickness-temperature window, 15-20 nm films annealed at 600-700 °C show a 2P$_r$ of ~35.5 μC/cm$^2$, comparable to the state of the art. Robust, wake-up-free endurance of ~$10^8$ cycles is demonstrated in this systematically identified temperature-thickness window, which is promising for non-volatile memory applications.
Submitted 25 April, 2023;
originally announced April 2023.
-
Schottky Barrier MOSFET Enabled Ultra-Low Power Real-Time Neuron for Neuromorphic Computing
Authors:
Shubham Patil,
Jayatika Sakhuja,
Ajay Kumar Singh,
Anmol Biswas,
Vivek Saraswat,
Sandeep Kumar,
Sandip Lashkare,
Udayan Ganguly
Abstract:
Energy-efficient real-time synapses and neurons are essential to enable large-scale neuromorphic computing. In this paper, we propose and demonstrate a Schottky-barrier MOSFET-based ultra-low-power voltage-controlled current source to enable real-time neurons for neuromorphic computing. The Schottky-barrier MOSFET is fabricated on a silicon-on-insulator platform with polycrystalline silicon as the channel and nickel/platinum as the source/drain. The poly-Si and nickel form back-to-back Schottky junctions, enabling the ultra-low ON current required for energy-efficient neurons.
Submitted 17 April, 2023;
originally announced April 2023.