
Showing 1–5 of 5 results for author: Borkar, J

  1. arXiv:2407.00250  [pdf, other]

    cs.CV cs.CL

    Mind the Gap: Analyzing Lacunae with Transformer-Based Transcription

    Authors: Jaydeep Borkar, David A. Smith

    Abstract: Historical documents frequently suffer from damage and inconsistencies, including missing or illegible text resulting from issues such as holes, ink problems, and storage damage. These missing portions or gaps are referred to as lacunae. In this study, we employ transformer-based optical character recognition (OCR) models trained on synthetic data containing lacunae in a supervised manner. We demo…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Accepted to ICDAR 2024 Workshop on Computational Paleography

  2. arXiv:2406.17746  [pdf, other]

    cs.CL cs.AI

    Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

    Authors: USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien, Jyothir S V, Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne, Stella Biderman, Tracy Ke, Katherine Lee, Naomi Saphra

    Abstract: Memorization in language models is typically treated as a homogenous phenomenon, neglecting the specifics of the memorized data. We instead model memorization as the effect of a set of complex factors that describe each sample and relate it to the model and corpus. To build intuition around these factors, we break memorization down into a taxonomy: recitation of highly duplicated sequences, recons…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2307.10476  [pdf, other]

    cs.CR cs.CL

    What can we learn from Data Leakage and Unlearning for Law?

    Authors: Jaydeep Borkar

    Abstract: Large Language Models (LLMs) have a privacy concern because they memorize training data (including personally identifiable information (PII) like emails and phone numbers) and leak it during inference. A company can train an LLM on its domain-customized data which can potentially also include their users' PII. In order to comply with privacy laws such as the "right to be forgotten", the data point…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: 5 pages, 8 figures, accepted to the first GenLaw workshop at ICML'23, Hawai'i

  4. arXiv:2105.09685  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Simple Transparent Adversarial Examples

    Authors: Jaydeep Borkar, Pin-Yu Chen

    Abstract: There has been a rise in the use of Machine Learning as a Service (MLaaS) Vision APIs as they offer multiple services including pre-built models and algorithms, which otherwise take a huge amount of resources if built from scratch. As these APIs get deployed for high-stakes applications, it's very important that they are robust to different manipulations. Recent works have only focused on typical…

    Submitted 20 May, 2021; originally announced May 2021.

    Comments: 14 pages, 9 figures, Published at ICLR 2021 Workshop on Security and Safety in Machine Learning Systems

  5. arXiv:1806.00284  [pdf, ps, other]

    astro-ph.HE

    The Multifrequency Behavior of Sagittarius A*

    Authors: A. Eckart, M. Zajacek, M. Parsa, E. Hosseini, N. Fazeli, G. Busch, B. Shahzamanian, M. Subroweit, F. Peissker, N. Sabha, M. Valencia-S., M. Horrobin, C. Straubmeier, S. Rost, J. Schneeloch, A. Borkar, V. Karas, S. Britzen, A. Zensus, F. Kamali

    Abstract: The Galactic Center is the closest galactic nucleus that allows us to determine the multi-frequency behavior of the supermassive black hole counterpart Sagittarius A* in great detail. We put SgrA*, as a nucleus with weak activity, into the context of nearby low luminosity nuclei. Possible hints for galaxy evolution of these sources across the [NII]-based diagnostic diagram can be inferred from dep…

    Submitted 1 June, 2018; originally announced June 2018.

    Comments: 12 pages, 8 figures, to be published in the proceedings of the XII workshop on 'Multifrequency Behavior of High Energy Cosmic Sources', held on 12-17 June, 2017, in Palermo, Italy