Intent-Based Access Control: Using LLMs to Intelligently Manage Access Control

Authors: Pranav Subramaniam, Sanjay Krishnan

Abstract: In every enterprise database, administrators must define an access control policy that specifies which users have access to which assets. Access control straddles two worlds: policy (organization-level principles that define who should have access) and process (database-level primitives that actually implement the policy). Assessing and enforcing process compliance with a policy is a manual and ad-hoc task. This paper introduces a new paradigm for access control called Intent-Based Access Control for Databases (IBAC-DB). In IBAC-DB, access control policies are expressed more precisely using a novel format, the natural language access control matrix (NLACM). Database access control primitives are synthesized automatically from these NLACMs. These primitives can be used to generate new DB configurations and/or evaluate existing ones. This paper presents a reference architecture for an IBAC-DB interface, an initial implementation for PostgreSQL (which we call LLM4AC), and initial benchmarks that evaluate the accuracy and scope of such a system. We find that our chosen implementation, LLM4AC, vastly outperforms other baselines, achieving near-perfect F1 scores on our initial benchmarks.

Submitted 11 February, 2024; originally announced February 2024.

Comments: 13 pages, 21 figures, 1 table

arXiv:2305.10419 [pdf, other]

Kitana: Efficient Data Augmentation Search for AutoML

Authors: Zezhou Huang, Pranav Subramaniam, Raul Castro Fernandez, Eugene Wu

Abstract: AutoML services provide a way for non-expert users to benefit from high-quality ML models without worrying about model design and deployment, in exchange for a charge per hour ($21.252 for VertexAI). However, existing AutoML services are model-centric, in that they are limited to extracting features and searching for models from initial training data; they are only as effective as the initial training data quality. With the increasing volume of tabular data available, there is a huge opportunity for data augmentation. For instance, vertical augmentation adds predictive features, while horizontal augmentation adds examples. This augmented training data yields potentially much better AutoML models at a lower cost. However, existing systems either forgo augmentation opportunities, which yields poor models, or apply expensive augmentation search techniques that drain users' budgets. Kitana is a data-centric AutoML system that also searches for new tabular datasets that can augment the tabular training data with new features and/or examples. Kitana manages a corpus of datasets, exposes an AutoML interface to users, and searches for augmentations with datasets in the corpus to improve AutoML performance. To accelerate search, Kitana applies aggressive pre-computation to train a factorized proxy model and evaluate each candidate augmentation within 0.1s. Kitana also uses a cost model to limit the time spent on augmentation search, supports expressive data access controls, and performs request caching to benefit from past similar requests.
Using a corpus of 518 open-source datasets, Kitana produces higher quality models than existing AutoML systems in orders of magnitude less time. Across different user requests, Kitana increases the model R² from 0.16 to 0.66 while reducing the cost by >100x compared to the naive factorized learning and SOTA data augmentation search.

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2109.09067 [pdf, other]

Resonance frequency of different interfacial modes and steady streaming by a slug trapped at one end of a millichannel

Authors: Shambhu Anil, Pushpavanam Subramaniam

Abstract: Active micropumping and micromixing using oscillating bubbles form the basis for various Lab-on-chip applications. Acoustically excited oscillatory bubbles are commonly used in active particle sorting, micropumping, micromixing, ultrasonic imaging, cell lysis and rotation. For efficient micromixing, the system must be operated at its resonant frequency where amplitude of oscillation is maximum. This ensures that high-intensity cavitation microstreaming is generated. In this work, we determine the resonant frequencies for the different surface modes of oscillation of a rectangular gas slug confined at one end of a millichannel using perturbation techniques and matched asymptotic expansions. We explicitly specify the oscillation frequency of the interface and compute the surface mode amplitudes from the interface deformation. This oscillatory flow field at the leading order is also determined. The effect of aspect ratio of gas slug on observable streaming is analysed. The predictions of surface modes from perturbation theory are validated with simulations of the system done in ANSYS Fluent.

Submitted 6 July, 2023; v1 submitted 19 September, 2021; originally announced September 2021.

Comments: Submitted for review in Physical Review Fluids

arXiv:2107.05606 [pdf, other]

doi 10.1103/PhysRevC.104.L042801

First direct measurement of $^{59}$Cu(p,$α$)$^{56}$Ni: A step towards constraining the Ni-Cu cycle in the Cosmos

Authors: J. S. Randhawa, R. Kanungo, J. Refsgaard, P. Mohr, T. Ahn, M. Alcorta, C. Andreoiu, S. S. Bhattacharjee, B. Davids, G. Christian, A. A. Chen, R. Coleman, P. Garrett, G. F. Grinyer, E. Gyabeng Fuakye, G. Hackman, R. Jain, K. Kapoor, R. Krücken, A. Laffoley, A. Lennarz, J. Liang, Z. Meisel, N. Nikhil, A. Psaltis, et al. (12 additional authors not shown)

Abstract: Reactions on the proton-rich nuclides drive the nucleosynthesis in Core-Collapse Supernovae (CCSNe) and in X-ray bursts (XRBs). CCSNe eject the nucleosynthesis products to the interstellar medium and hence are a potential inventory of p-nuclei, whereas in XRBs nucleosynthesis powers the light curves. In both astrophysical sites the Ni-Cu cycle, which features a competition between $^{59}$Cu(p,$α$)$^{56}$Ni and $^{59}$Cu(p,$γ$)$^{60}$Zn, could potentially halt the production of heavier elements. Here, we report the first direct measurement of $^{59}$Cu(p,$α$)$^{56}$Ni using a re-accelerated $^{59}$Cu beam and cryogenic solid hydrogen target. Our results show that the reaction proceeds predominantly to the ground state of $^{56}$Ni and the experimental rate has been found to be lower than Hauser-Feshbach-based statistical predictions. New results hint that the $νp$-process could operate at higher temperatures than previously inferred and therefore remains a viable site for synthesizing the heavier elements.

Submitted 12 July, 2021; originally announced July 2021.

arXiv:2103.07532 [pdf, other]

Comprehensive and Comprehensible Data Catalogs: The What, Who, Where, When, Why, and How of Metadata Management

Authors: Pranav Subramaniam, Yintong Ma, Chi Li, Ipsita Mohanty, Raul Castro Fernandez

Abstract: Data management tasks require access to metadata, which is increasingly tracked by databases called data catalogs. Current catalogs are too dependent on users' understanding of data, leading to difficulties in large organizations of users with different skills: catalogs either make metadata easy for users to store and difficult to retrieve, or they make it easy to retrieve, but difficult to store. In this paper, we present 5W1H+R, a new catalog mental model that is comprehensive in the metadata it represents, and comprehensible in that it permits all users to locate metadata easily. We demonstrate these properties via a user study. We then discuss practical guidelines for implementing the new mental model. We conclude mental models are important to make data catalogs more useful and to boost metadata management efforts.

Submitted 1 February, 2023; v1 submitted 12 March, 2021; originally announced March 2021.

Comments: 14 pages, 8 figures, 8 tables

arXiv:2009.04227 [pdf, other]

Anonymization of labeled TOF-MRA images for brain vessel segmentation using generative adversarial networks

Authors: Tabea Kossen, Pooja Subramaniam, Vince I. Madai, Anja Hennemuth, Kristian Hildebrand, Adam Hilbert, Jan Sobesky, Michelle Livne, Ivana Galinovic, Ahmed A. Khalil, Jochen B. Fiebach, Dietmar Frey

Abstract: Anonymization and data sharing are crucial for privacy protection and acquisition of large datasets for medical image analysis. This is a big challenge, especially for neuroimaging. Here, the brain's unique structure allows for re-identification and thus requires non-conventional anonymization. Generative adversarial networks (GANs) have the potential to provide anonymous images while preserving predictive properties. Analyzing brain vessel segmentation, we trained 3 GANs on time-of-flight (TOF) magnetic resonance angiography (MRA) patches for image-label generation: 1) Deep convolutional GAN, 2) Wasserstein-GAN with gradient penalty (WGAN-GP) and 3) WGAN-GP with spectral normalization (WGAN-GP-SN). The generated image-labels from each GAN were used to train a U-net for segmentation and tested on real data. Moreover, we applied our synthetic patches using transfer learning on a second dataset. For an increasing number of up to 15 patients we evaluated the model performance on real data with and without pre-training. The performance for all models was assessed by the Dice Similarity Coefficient (DSC) and the 95th percentile of the Hausdorff Distance (95HD). Comparing the 3 GANs, the U-net trained on synthetic data generated by the WGAN-GP-SN showed the highest performance to predict vessels (DSC/95HD 0.82/28.97) benchmarked by the U-net trained on real data (0.89/26.61). The transfer learning approach showed superior performance for the same GAN compared to no pre-training, especially for one patient only (0.91/25.68 vs. 0.85/27.36).
In this work, synthetic image-label pairs retained generalizable information and showed good performance for vessel segmentation. Besides, we showed that synthetic patches can be used in a transfer learning approach with independent data. This paves the way to overcome the challenges of scarce data and anonymization in medical imaging.

Submitted 16 November, 2020; v1 submitted 9 September, 2020; originally announced September 2020.

Comments: 9 pages, 4 figures

arXiv:2002.01047 [pdf, other]

Data Market Platforms: Trading Data Assets to Solve Data Problems

Authors: Raul Castro Fernandez, Pranav Subramaniam, Michael J. Franklin

Abstract: Data only generates value for a few organizations with expertise and resources to make data shareable, discoverable, and easy to integrate. Sharing data that is easy to discover and integrate is hard because data owners lack information (who needs what data) and they do not have incentives to prepare the data in a way that is easy to consume by others. In this paper, we propose data market platforms to address the lack of information and incentives and tackle the problems of data sharing, discovery, and integration. In a data market platform, data owners want to share data because they will be rewarded if they do so. Consumers are encouraged to share their data needs because the market will solve the discovery and integration problem for them in exchange for some form of currency. We consider internal markets that operate within organizations to bring down data silos, as well as external markets that operate across organizations to increase the value of data for everybody. We outline a research agenda that revolves around two problems. The problem of market design, or how to design rules that lead to the outcomes we want, and the systems problem, how to implement the market and enforce the rules. Treating data as a first-class asset is sorely needed to extend the value of data to more organizations, and we propose data market platforms as one mechanism to achieve this goal.

Submitted 1 July, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

arXiv:1312.5551 [pdf]

Enriched Performance on Wireless Sensor Network using Fuzzy based Clustering Technique

Authors: A. M Nirmala, P. Subramaniam, A. Anusha priya, M. Ravi

Abstract: Wireless sensor networks combine sensing, computation, and communication into a single small device. These devices depend on battery power and may be placed in hostile environments, where replacing them becomes a tedious task. Thus improving the energy efficiency of these networks becomes important. Clustering in wireless sensor networks faces several challenges, such as selection of an optimal group of sensor nodes as a cluster, optimum selection of the cluster head, an energy-balanced optimal strategy for rotating the role of cluster head in a cluster, maintaining intra- and inter-cluster connectivity, and optimal data routing in the network. In this paper, we study a protocol supporting an energy-efficient clustering, cluster head selection and data routing method to extend the lifetime of the sensor network. Simulation results demonstrate that the proposed protocol prolongs network lifetime due to the use of efficient clustering, cluster head selection and data routing. The simulation results show that, after a certain period of running, the EECS and fuzzy-based clustering algorithms increase the number of alive nodes compared with the LEACH and HEED methods, which can lead to an increase in sensor network lifetime. With the EECS method, the total number of messages received at the base station is increased when compared with the LEACH and HEED methods.
The fuzzy-based clustering method is compared with K-means clustering in terms of iteration count and the time until the first node dies in the wireless sensor network; the results show that the fuzzy-based clustering method performs better than K-means clustering.

Submitted 19 December, 2013; originally announced December 2013.

Comments: This paper combines fuzzy logic concepts with wireless sensor networking and includes 5 figures and 5 tables

Showing 1–8 of 8 results for author: Subramaniam, P