
Generalized Deepfake Attribution

Authors: Sowdagar Mahammad Shahid, Sudev Kumar Padhi, Umesh Kashyap, Sk. Subidh Ali

Abstract: The landscape of fake media creation changed with the introduction of Generative Adversarial Networks (GANs). Fake media creation has been on the rise with the rapid advances in generation technology, leading to new challenges in detecting fake media. A fundamental characteristic of GANs is their sensitivity to parameter initialization, known as seeds. Each distinct seed utilized during training leads to the creation of unique model instances, resulting in divergent image outputs despite employing the same architecture. This means that even if we have one GAN architecture, it can produce countless variations of GAN models depending on the seed used. Existing methods for attributing deepfakes work well only if they have seen the specific GAN model during training. If the GAN architectures are retrained with a different seed, these methods struggle to attribute the fakes. This seed dependency issue makes it difficult to attribute deepfakes with existing methods. We propose a generalized deepfake attribution network (GDA-Net) to attribute fake images to their respective GAN architectures, even if they are generated from a retrained version of the GAN architecture with a different seed (cross-seed) or from the fine-tuned version of the existing GAN model. Extensive experiments on cross-seed and fine-tuned data of GAN models show that our method is highly effective compared to existing methods. We have provided the source code to validate our results.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2402.08964 [pdf, other]

Predicting User Experience on Laptops from Hardware Specifications

Authors: Saswat Padhi, Sunil K. Bhasin, Udaya K. Ammu, Alex Bergman, Allan Knies

Abstract: Estimating the overall user experience (UX) on a device is a common challenge faced by manufacturers. Today, device makers primarily rely on microbenchmark scores, such as Geekbench, that stress test specific hardware components, such as CPU or RAM, but do not satisfactorily capture consumer workloads. System designers often rely on domain-specific heuristics and extensive testing of prototypes to reach a desired UX goal, and yet there is often a mismatch between the manufacturers' performance claims and the consumers' experience. We present our initial results on predicting real-life experience on laptops from their hardware specifications. We target web applications that run on Chromebooks (ChromeOS laptops) for a simple and fair aggregation of experience across applications and workloads. On 54 laptops, we track 9 UX metrics on common end-user workloads: web browsing, video playback and audio/video calls. We focus on a subset of high-level metrics exposed by the Chrome browser, that are part of the Web Vitals initiative for judging the UX on web applications. With a dataset of 100K UX data points, we train gradient boosted regression trees that predict the metric values from device specifications. Across our 9 metrics, we note a mean $R^2$ score (goodness-of-fit on our dataset) of 97.8% and a mean MAAPE (percentage error in prediction on unseen data) of 10.1%.

Submitted 14 February, 2024; originally announced February 2024.

Comments: Spotlight presentation at the ML for Systems workshop at NeurIPS 2023; 9 pages with appendix; https://openreview.net/forum?id=mHShSE7MSU

arXiv:2312.06001 [pdf, other]

The SyGuS Language Standard Version 2.1

Authors: Saswat Padhi, Elizabeth Polgreen, Mukund Raghothaman, Andrew Reynolds, Abhishek Udupa

Abstract: The classical formulation of the program-synthesis problem is to find a program that meets a correctness specification given as a logical formula. Syntax-guided synthesis (SyGuS) is a standardized format for specifying the correctness specification with a syntactic template that constrains the space of allowed implementations. The input to SyGuS consists of a background theory, a semantic correctness specification for the desired program given by a logical formula, and a syntactic set of candidate implementations given by a grammar. The computational problem then is to find an implementation from the set of candidate expressions that satisfies the specification in the given theory. The formulation of the problem builds on SMT-LIB. This document defines the SyGuS 2.1 standard, which is intended to be used as the standard input and output language for solvers targeting the syntax-guided synthesis problem. It borrows many concepts and language constructs from the standard format for Satisfiability Modulo Theories (SMT) solvers, the SMT-LIB 2.6 standard.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 36 pages, introduced in the SYNT workshop at CAV 2021

arXiv:2311.07888 [pdf, other]

RoboSense At Edge: Detecting Slip, Crumple and Shape of the Object in Robotic Hand for Teleoperations

Authors: Sudev Kumar Padhi, Mohit Kumar, Debanka Giri, Subidh Ali

Abstract: Slip and crumple detection is essential for performing robust manipulation tasks with a robotic hand (RH), such as remote surgery. It has been one of the challenging problems in the robotics manipulation community. In this work, we propose a machine learning (ML) based technique to detect slip and crumple, as well as the shape of an object that is currently held in the robotic hand. The proposed ML model detects slip, crumple, and shape using the force/torque exerted and the angular positions of the actuators present in the RH. The proposed model would be integrated into the loop of a robotic hand (RH) and haptic glove (HG). This would help us to reduce the latency in case of teleoperation.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2303.00830 [pdf, other]

DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments

Authors: Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil Jain, Pratik Roy Chowdhuri, Kaustubh Kulkarni, Swapnil Padhi, Deepu Vijayasenan, Sriram Ganapathy

Abstract: In multilingual societies, social conversations often involve code-mixed speech. The current speech technology may not be well equipped to extract information from multi-lingual multi-speaker conversations. The DISPLACE challenge entails a first-of-kind task to benchmark speaker and language diarization on the same data, as the data contains multi-speaker conversations in multilingual code-mixed speech. The challenge attempts to highlight outstanding issues in speaker diarization (SD) in multilingual settings with code-mixing. Further, language diarization (LD) in multi-speaker settings also introduces new challenges, where the system has to disambiguate speaker switches with code switches. For this challenge, a natural multilingual, multi-speaker conversational dataset is distributed for development and evaluation purposes. The systems are evaluated on single-channel far-field recordings. We also release a baseline system and report the highlights of the system submissions.

Submitted 5 June, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

arXiv:2003.12106 [pdf, other]

Data-Driven Inference of Representation Invariants

Authors: Anders Miltner, Saswat Padhi, Todd Millstein, David Walker

Abstract: A representation invariant is a property that holds of all values of abstract type produced by a module. Representation invariants play important roles in software engineering and program verification. In this paper, we develop a counterexample-driven algorithm for inferring a representation invariant that is sufficient to imply a desired specification for a module. The key novelty is a type-directed notion of visible inductiveness, which ensures that the algorithm makes progress toward its goal as it alternates between weakening and strengthening candidate invariants. The algorithm is parameterized by an example-based synthesis engine and a verifier, and we prove that it is sound and complete for first-order modules over finite types, assuming that the synthesizer and verifier are as well. We implement these ideas in a tool called Hanoi, which synthesizes representation invariants for recursive data types. Hanoi not only handles invariants for first-order code, but higher-order code as well. In its back end, Hanoi uses an enumerative synthesizer called Myth and an enumerative testing tool as a verifier. Because Hanoi uses testing for verification, it is not sound, though our empirical evaluation shows that it is successful on the benchmarks we investigated.

Submitted 26 March, 2020; originally announced March 2020.

Comments: 18 Pages, Full version of PLDI 2020 paper

arXiv:1911.11728 [pdf, other]

On Scaling Data-Driven Loop Invariant Inference

Authors: Sahil Bhatia, Saswat Padhi, Nagarajan Natarajan, Rahul Sharma, Prateek Jain

Abstract: Automated synthesis of inductive invariants is an important problem in software verification. Once all the invariants have been specified, software verification reduces to checking of verification conditions. Although static analyses to infer invariants have been studied for over forty years, recent years have seen a flurry of data-driven invariant inference techniques which guess invariants from examples instead of analyzing program text. However, these techniques have been demonstrated to scale only to programs with a small number of variables. In this paper, we study these scalability issues and address them in our tool oasis that improves the scale of data-driven invariant inference and outperforms state-of-the-art systems on benchmarks from the invariant inference track of the Syntax Guided Synthesis competition.

Submitted 16 July, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

arXiv:1905.07457 [pdf, other]

Overfitting in Synthesis: Theory and Practice (Extended Version)

Authors: Saswat Padhi, Todd Millstein, Aditya Nori, Rahul Sharma

Abstract: In syntax-guided synthesis (SyGuS), a synthesizer's goal is to automatically generate a program belonging to a grammar of possible implementations that meets a logical specification. We investigate a common limitation across state-of-the-art SyGuS tools that perform counterexample-guided inductive synthesis (CEGIS). We empirically observe that as the expressiveness of the provided grammar increases, the performance of these tools degrades significantly. We claim that this degradation is not only due to a larger search space, but also due to overfitting. We formally define this phenomenon and prove no-free-lunch theorems for SyGuS, which reveal a fundamental tradeoff between synthesizer performance and grammar expressiveness. A standard approach to mitigate overfitting in machine learning is to run multiple learners with varying expressiveness in parallel. We demonstrate that this insight can immediately benefit existing SyGuS tools. We also propose a novel single-threaded technique called hybrid enumeration that interleaves different grammars and outperforms the winner of the 2018 SyGuS competition (Inv track), solving more problems and achieving a $5\times$ mean speedup.

Submitted 7 June, 2019; v1 submitted 17 May, 2019; originally announced May 2019.

Comments: 24 pages (5 pages of appendices), 7 figures, includes proofs of theorems

arXiv:1904.07146 [pdf, other]

SyGuS-Comp 2018: Results and Analysis

Authors: Rajeev Alur, Dana Fisman, Saswat Padhi, Rishabh Singh, Abhishek Udupa

Abstract: Syntax-guided synthesis (SyGuS) is the computational problem of finding an implementation $f$ that meets both a semantic constraint given by a logical formula $φ$ in a background theory $\mathbb{T}$, and a syntactic constraint given by a grammar $G$, which specifies the allowed set of candidate implementations. Such a synthesis problem can be formally defined in the SyGuS input format (SyGuS-IF), a language that is built on top of SMT-LIB. The Syntax-Guided Synthesis competition (SyGuS-Comp) is an effort to facilitate, bring together and accelerate research and development of efficient solvers for SyGuS by providing a platform for evaluating different synthesis techniques on a comprehensive set of benchmarks. In the 5th SyGuS-Comp, five solvers competed on over 1600 benchmarks across various tracks. This paper presents and analyses the results of this year's (2018) SyGuS competition.

Submitted 12 April, 2019; originally announced April 2019.

Comments: 18 pages. Satellite event of CAV'18 and SYNT'18. arXiv admin note: substantial text overlap with arXiv:1711.11438

arXiv:1709.05725 [pdf, other]

doi 10.1145/3276520

FlashProfile: A Framework for Synthesizing Data Profiles

Authors: Saswat Padhi, Prateek Jain, Daniel Perelman, Oleksandr Polozov, Sumit Gulwani, Todd Millstein

Abstract: We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios. Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns, that also allows for interactive refinement by requesting a desired number of clusters. Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across $153$ tasks over $75$ large real datasets, we observe a median profiling time of only $\sim\,0.7\,$s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as FlashFill.

Submitted 16 April, 2019; v1 submitted 17 September, 2017; originally announced September 2017.

Comments: 28 pages, SPLASH (OOPSLA) 2018

Journal ref: Proc. ACM Program. Lang. 2, OOPSLA, Article 150 (November 2018) 150:1-150:28

arXiv:1707.02029 [pdf, other]

LoopInvGen: A Loop Invariant Generator based on Precondition Inference

Authors: Saswat Padhi, Rahul Sharma, Todd Millstein

Abstract: We describe the LoopInvGen tool for generating loop invariants that can provably guarantee correctness of a program with respect to a given specification. LoopInvGen is an efficient implementation of the inference technique originally proposed in our earlier work on PIE (https://doi.org/10.1145/2908080.2908099). In contrast to existing techniques, LoopInvGen is not restricted to a fixed set of features -- atomic predicates that are composed together to build complex loop invariants. Instead, we start with no initial features, and use program synthesis techniques to grow the set on demand. This not only enables a less onerous and more expressive approach, but also appears to be significantly faster than the existing tools over the SyGuS-COMP 2018 benchmarks from the INV track.

Submitted 31 October, 2019; v1 submitted 6 July, 2017; originally announced July 2017.

Comments: Tool Description (for technical details, see our PLDI paper at https://doi.org/10.1145/2908080.2908099), SyGuS-COMP'19 Competition Contribution, 4 pages

arXiv:1310.1190 [pdf]

Review on Fragment Allocation by using Clustering Technique in Distributed Database System

Authors: Priyanka Dash, Ranjita Rout, Satya Bhusan Pratihari, Sanjay Kumar Padhi

Abstract: Considerable progress has been made in the last few years in improving the performance of distributed database systems. The development of fragment allocation models in distributed databases is becoming difficult due to the complexity of the huge number of sites and their communication considerations. Under such conditions, simulations of clustering and data allocation are adequate tools for understanding and evaluating the performance of data allocation in distributed databases. Clustering sites and fragment allocation are key challenges in distributed database performance, and are considered to be efficient methods that have a major role in reducing transferred and accessed data during the execution of applications. In this paper, a review of fragment allocation using clustering techniques in distributed database systems is given.

Submitted 4 October, 2013; originally announced October 2013.

Comments: 9 pages, 3 figures

Journal ref: IJCSN, October 2013, Volume 2, Issue 5

arXiv:1308.0843 [pdf, other]

Snowmass Energy Frontier Simulations using the Open Science Grid (A Snowmass 2013 whitepaper)

Authors: A. Avetisyan, S. Bhattacharya, M. Narain, S. Padhi, J. Hirschauer, T. Levshina, P. McBride, C. Sehgal, M. Slyz, M. Rynge, S. Malik, J. Stupak III

Abstract: Snowmass is a US long-term planning study for the high-energy community by the American Physical Society's Division of Particles and Fields. For its simulation studies, opportunistic resources are harnessed using the Open Science Grid infrastructure. Late binding grid technology, GlideinWMS, was used for distributed scheduling of the simulation jobs across many sites mainly in the US. The pilot infrastructure also uses the Parrot mechanism to dynamically access CvmFS in order to ascertain a homogeneous environment across the nodes. This report presents the resource usage and the storage model used for simulating large statistics Standard Model backgrounds needed for Snowmass Energy Frontier studies.

Submitted 1 October, 2013; v1 submitted 4 August, 2013; originally announced August 2013.

Report number: SNOW13-00168

Showing 1–13 of 13 results for author: Padhi, S