Search | arXiv e-print repository

arXiv:2407.00931 [pdf, other]

Real-Time Neuromorphic Navigation: Integrating Event-Based Vision and Physics-Driven Planning on a Parrot Bebop2 Quadrotor

Authors: Amogh Joshi, Sourav Sanyal, Kaushik Roy

Abstract: In autonomous aerial navigation, real-time and energy-efficient obstacle avoidance remains a significant challenge, especially in dynamic and complex indoor environments. This work presents a novel integration of neuromorphic event cameras with physics-driven planning algorithms implemented on a Parrot Bebop2 quadrotor. Neuromorphic event cameras, characterized by their high dynamic range and low latency, offer significant advantages over traditional frame-based systems, particularly in poor lighting conditions or during high-speed maneuvers. We use a DVS camera with a shallow Spiking Neural Network (SNN) for event-based object detection of a moving ring in real-time in an indoor lab. Further, we enhance drone control with physics-guided empirical knowledge inside a neural network training mechanism, to predict energy-efficient flight paths to fly through the moving ring. This integration results in a real-time, low-latency navigation system capable of dynamically responding to environmental changes while minimizing energy consumption. We detail our hardware setup, control loop, and modifications necessary for real-world applications, including the challenges of sensor integration without burdening the flight capabilities. Experimental results demonstrate the effectiveness of our approach in achieving robust, collision-free, and energy-efficient flight paths, showcasing the potential of neuromorphic vision and physics-driven planning in enhancing autonomous navigation systems.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.18700 [pdf, other]

On Fourier analysis of sparse Boolean functions over certain Abelian groups

Authors: Sourav Chakraborty, Swarnalipa Datta, Pranjal Dutta, Arijit Ghosh, Swagato Sanyal

Abstract: Given an Abelian group G, a Boolean-valued function f: G -> {-1,+1} is said to be s-sparse if it has at most s non-zero Fourier coefficients over the domain G. In a seminal paper, Gopalan et al. proved "Granularity" for Fourier coefficients of Boolean-valued functions over Z_2^n, a result that has found many diverse applications in theoretical computer science and combinatorics. They also studied structural results for Boolean functions over Z_2^n which are approximately Fourier-sparse. In this work, we obtain structural results for approximately Fourier-sparse Boolean-valued functions over Abelian groups G of the form G := Z_{p_1}^{n_1} \times ... \times Z_{p_t}^{n_t}, for distinct primes p_i. We also obtain a lower bound of the form 1/(m^{2}s)^ceiling(phi(m)/2) on the absolute value of the smallest non-zero Fourier coefficient of an s-sparse function, where m=p_1 ... p_t, and phi(m)=(p_1-1) ... (p_t-1). We carefully apply probabilistic techniques from Gopalan et al. to obtain our structural results, and use some non-trivial results from algebraic number theory to get the lower bound. We construct a family of at most s-sparse Boolean functions over Z_p^n, where p > 2, for arbitrarily large s, where the minimum non-zero Fourier coefficient is 1/omega(n). The "Granularity" result of Gopalan et al. implies that the absolute values of non-zero Fourier coefficients of any s-sparse Boolean-valued function over Z_2^n are 1/O(s). So, our result shows that one cannot expect such a lower bound for general Abelian groups.
Using our new structural results on the Fourier coefficients of sparse functions, we design an efficient testing algorithm for Fourier-sparse Boolean functions that requires poly((ms)^phi(m),1/epsilon)-many queries. Further, we prove an Omega(sqrt{s}) lower bound on the query complexity of any adaptive sparsity testing algorithm.

Submitted 26 June, 2024; originally announced June 2024.
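
The notion of s-sparsity used in the abstract of arXiv:2406.18700 can be illustrated by brute force in the simplest setting, G = Z_2^n, which is the setting of Gopalan et al. rather than the general groups the paper treats. The sketch below is not taken from the paper; the function names `fourier_coefficients` and `sparsity` are hypothetical, and it assumes the standard characters chi_S(x) = (-1)^{S·x} with coefficients defined as the average of f(x)·chi_S(x) over all x.

```python
from fractions import Fraction
from itertools import product


def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f: {0,1}^n -> {-1,+1} over Z_2^n.

    f_hat(S) is the average over all x of f(x) * (-1)^(S . x).
    Exact rational arithmetic avoids floating-point zero tests.
    """
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for s in points:
        total = sum(
            f(x) * (-1) ** sum(si * xi for si, xi in zip(s, x))
            for x in points
        )
        coeffs[s] = Fraction(total, len(points))
    return coeffs


def sparsity(f, n):
    """Number of non-zero Fourier coefficients of f."""
    return sum(1 for c in fourier_coefficients(f, n).values() if c != 0)


def parity(x):
    # A single character: exactly one non-zero coefficient.
    return (-1) ** sum(x)


def and2(x):
    # AND of two bits in the +/-1 convention (-1 encodes "true").
    return -1 if x == (1, 1) else 1


if __name__ == "__main__":
    print(sparsity(parity, 3))
    print(fourier_coefficients(and2, 2))
```

The enumeration takes time 4^n, so it is only meant for checking small examples by hand.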

arXiv:2406.11794 [pdf, other]

DataComp-LM: In search of the next generation of training sets for language models

Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, et al. (34 additional authors not shown)

Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline, enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation.

Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Project page: https://www.datacomp.ai/dclm/

arXiv:2406.07591 [pdf, other]

Holographic description of unified early and late universe in viscous mimetic gravity

Authors: G. S. Khadekar, Saibal Ray, Aritra Sanyal

Abstract: In this study, we explore the mimetic matter model proposed by Chamseddine and Mukhanov (J. High Energy Phys. 11, 135, 2013), utilizing the holographic principle to coherently describe both the early and late universe when bulk viscosity is present in the inhomogeneous equation of state. Our examination of the universe's evolution is based on the generalized infrared-cutoff holographic dark energy model detailed by Nojiri and Odintsov (Eur. Phys. J. C 77, 528, 2017) within the context of the flat FRW model. From a holographic perspective, we derive the energy conservation equation incorporating mimetic matter through a viscous holographic fluid model. Furthermore, we analyze various scenarios of bulk viscosity by assuming a constant equation of state parameter and derive the infrared cut-off expression in terms of the particle horizon. We demonstrate that within the framework of mimetic gravity, there is a class of solutions comparable to those in General Relativity, with an additional contribution from a non-relativistic mimetic matter component. These solutions can effectively describe dark matte

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.05568 [pdf, other]

Geodesic motion of particles in the vicinity of the $κ$-deformed Schwarzchild Black Hole

Authors: Dilip Kumar, Suman Kumar Panja, Abhisek Saha, Soma Sanyal

Abstract: In this study, we investigate the geodesic motion of a test particle around the Schwarzschild black hole in a $κ$-deformed space-time. We compute a modified Lagrangian to obtain the $κ$-deformed effective potential and find the particle trajectories based on the constants of motion. For the same value of angular momentum, we obtain a significant deformation in the orbits of the particles due to the non-commutativity of the $κ$-deformed space-time. The deformation parameter becomes more significant for higher values of the angular momentum. The radii of the individual trajectories become smaller and their velocities decrease compared to the commutative case. The radius of the innermost stable circular orbit ($r_{ISCO}$) is also found using the modified effective potential. Though the equations get modified due to the non-commutativity of the $κ$-deformed space-time, the $r_{ISCO}$ remains the same. We then study a large number of freely streaming particles moving in this $κ$-deformed space-time and analyze the movement of these particles around the black hole due to the non-commutativity of the space-time. We concentrate on particles with different angular momenta moving around the black hole. We find that the motion of the particles is modified due to the non-commutativity of the space-time. The particles move slower along their respective trajectories in the deformed space-time.
So, they remain closer to the black hole for a longer period of time, indicating that the accretion of freely streaming particles around the black hole would be modified by the non-commutativity of the space-time.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 16 pages, 8 figures

arXiv:2404.08634 [pdf, other]

Pre-training Small Base LMs with Fewer Tokens

Authors: Sunny Sanyal, Sujay Sanghavi, Alexandros G. Dimakis

Abstract: We study the effectiveness of a simple approach to develop a small base language model (LM) starting from an existing large base LM: first inherit a few transformer blocks from the larger LM, and then train this smaller model on a very small subset (0.1%) of the raw pretraining data of the larger model. We call our simple recipe Inheritune and first demonstrate it for building a small base LM with 1.5B parameters using 1B tokens (and the first few layers of a larger LM with 3B parameters); we do this using a single A6000 GPU for less than half a day. Across 9 diverse evaluation datasets as well as the MMLU benchmark, the resulting model compares favorably to publicly available base models of 1B-2B size, some of which have been trained using 50-1000 times more tokens. We investigate Inheritune in a slightly different setting where we train small LMs utilizing larger LMs and their full pre-training dataset. Here we show that smaller LMs trained utilizing some of the layers of GPT-2-medium (355M) and GPT-2-large (770M) can effectively match the validation loss of their bigger counterparts when trained from scratch for the same number of training steps on the OpenWebText dataset with 9B tokens. We analyze our recipe with extensive experiments and demonstrate its efficacy in diverse settings. Our code is available at https://github.com/sanyalsunny111/LLM-Inheritune.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 15 pages, 6 figures, 10 tables

arXiv:2403.04379 [pdf, other]

Performance evaluation of conditional handover in 5G systems under fading scenario

Authors: Souvik Deb, Megh Rathod, Rishi Balamurugan, Shankar K. Ghosh, Rajeev K. Singh, Samriddha Sanyal

Abstract: To enhance handover performance in fifth generation (5G) cellular systems, conditional handover (CHO) has emerged as a promising solution. Unlike A3-based handover, where handover execution is certain after receiving the handover command from the serving access network, in CHO handover execution is conditional on the RSRP measurements from both the current and target access networks, as well as on mobility parameters such as preparation and execution offsets. Analytic evaluation of conditional handover performance is unprecedented in the literature. In this work, the performance of CHO is evaluated in terms of handover latency, handover packet loss and handover failure probability. A Markov model accounting for the effect of different mobility parameters (e.g., execution offset, preparation offset, time-to-preparation and time-to-execution), UE velocity and channel fading characteristics is proposed to characterize handover failure. Results obtained from the analytic model have been validated against extensive simulation results. Our study reveals that the optimal configuration of $O_{exec}$, $O_{prep}$, $T_{exec}$ and $T_{prep}$ is conditional on the underlying UE velocity and fading characteristics. This study will help mobile operators choose appropriate thresholds for the mobility parameters under different channel conditions and UE velocities.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.02852 [pdf, ps, other]

Asymptotically safe cosmology with non-canonical scalar field

Authors: Rituparna Mandal, Soma Sanyal

Abstract: We investigate the quantum-modified cosmological dynamical equations in a Friedmann-Robertson-Walker universe filled with a barotropic fluid and a general non-canonical scalar field characterized by a Lagrangian similar to the k-essence model but with a potential term. Quantum corrections are incorporated by considering the running of gravitational and potential couplings, employing the functional renormalization group approach. Covariant conservation of the non-canonical scalar field and the background barotropic fluid is considered separately, imposing a constraint resulting from the Bianchi identity. This constraint determines the evolution of the cut-off scale with the scale factor and also reveals cosmic fixed points, depending on whether the flow ceases or continues to evolve. We explore how the general non-canonical scalar field parameter affects the different types of cosmic fixed points and how it differs from the canonical case. Furthermore, we establish a bound on the ratio of the RG parameters involving the non-canonical parameter for which the universe may exhibit accelerated expansion for mixed fixed points. This bound indicates that the non-canonical scalar field admits larger sets of RG fixed points which may give rise to an accelerated universe.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 23 pages, 1 table

arXiv:2402.14751 [pdf, ps, other]

On the communication complexity of finding a king in a tournament

Authors: Nikhil S. Mande, Manaswi Paraashar, Swagato Sanyal, Nitin Saurabh

Abstract: A tournament is a complete directed graph. A king in a tournament is a vertex v such that every other vertex is reachable from v via a path of length at most 2. It is well known that every tournament has at least one king, one of which is a maximum out-degree vertex. The tasks of finding a king, a maximum out-degree vertex and a source in a tournament have been relatively well studied in the context of query complexity. We study the communication complexity of these tasks, where the edges are partitioned between two players. The following are our main results for n-vertex tournaments: 1) The deterministic communication complexity of finding whether a source exists is tilde{Theta}(log^2 n). 2) The deterministic and randomized communication complexities of finding a king are Theta(n). The quantum communication complexity is tilde{Theta}(sqrt{n}). 3) The deterministic, randomized and quantum communication complexities of finding a maximum out-degree vertex are Theta(n log n), tilde{Theta}(n) and tilde{Theta}(sqrt{n}), respectively. Our upper bounds hold for all partitions of edges, and the lower bounds for a specific partition of the edges. To show the first result above, we show, perhaps surprisingly, that finding a source in a tournament is equivalent to the well-studied Clique vs. Independent Set (CIS) problem on undirected graphs. Our bounds for finding a source then follow from known bounds on the complexity of the CIS problem. In view of this equivalence, we can view the task of finding a king in a tournament as a natural generalization of CIS.
One of our lower bounds uses a fooling-set based argument, and all our other lower bounds follow from carefully constructed reductions from Set-Disjointness.

Submitted 22 February, 2024; originally announced February 2024.
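
The definition of a king given in the abstract of arXiv:2402.14751, together with the well-known fact it quotes (a maximum out-degree vertex is a king), can be checked directly on small tournaments. The sketch below is an illustration only and says nothing about the communication protocols in the paper; the adjacency-matrix encoding and the names `random_tournament`, `is_king` and `max_out_degree_vertex` are assumptions made for this example.

```python
import random
from itertools import combinations


def random_tournament(n, seed=0):
    """Return an n x n matrix where beats[u][v] is True iff the edge is u -> v.

    Exactly one of beats[u][v], beats[v][u] is True for each pair u != v.
    """
    rng = random.Random(seed)
    beats = [[False] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < 0.5:
            beats[u][v] = True
        else:
            beats[v][u] = True
    return beats


def is_king(beats, v):
    """True iff every other vertex is reachable from v by a path of length <= 2."""
    n = len(beats)
    for w in range(n):
        if w == v or beats[v][w]:
            continue  # w is v itself or a direct out-neighbour of v
        # Otherwise some out-neighbour u of v must have an edge to w.
        if not any(beats[v][u] and beats[u][w] for u in range(n)):
            return False
    return True


def max_out_degree_vertex(beats):
    """Return a vertex with the largest number of outgoing edges."""
    return max(range(len(beats)), key=lambda v: sum(beats[v]))


if __name__ == "__main__":
    t = random_tournament(8, seed=1)
    v = max_out_degree_vertex(t)
    print(v, is_king(t, v))
```

The check in `is_king` costs O(n^2) per vertex, which is fine for small test instances.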

arXiv:2402.03686 [pdf, other]

Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification

Authors: Soumya Sanyal, Tianyi Xiao, Jiacheng Liu, Wenya Wang, Xiang Ren

Abstract: Making inferences in text comprehension to understand the meaning is essential in language processing. This work studies the entailment verification (EV) problem of multi-sentence premises that requires a system to make multiple inferences implicitly. Studying EV for such complex premises is important because modern NLP problems, such as detecting inconsistent model-generated rationales, require complex multi-hop reasoning. However, current textual inference datasets mostly contain short premises that only partially focus on these challenges. To address this, we compile an EV benchmark that includes datasets from three NLP domains (NLI, contextual QA, and rationales) containing multi-sentence premises. On benchmarking humans and LLMs, we find that LLMs are better than humans in multi-hop reasoning across extended contexts, while humans perform better in simple deductive reasoning tasks. We also finetune a Flan-T5 model for EV using two training objectives to obtain a strong open-source model that outperforms GPT-3.5 and rivals GPT-4. Finally, we use this model to filter out inconsistent model-generated rationales in self-consistency decoding, resulting in a 6% accuracy improvement on average across three MCQ datasets.

Submitted 27 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.15352 [pdf, ps, other]

Randomized query composition and product distributions

Authors: Swagato Sanyal

Abstract: Let R_eps denote randomized query complexity for error probability eps, and R:=R_{1/3}. In this work we investigate whether a perfect composition theorem R(f o g^n)=Omega(R(f).R(g)) holds for a relation f in {0,1}^n * S and a total inner function g:{0,1}^m \to {0, 1}. Let D^(prod) denote the maximum distributional query complexity with respect to any product (over variables) distribution. In this work we show the composition theorem R(f o g^n)=Omega(R(f).D^(prod)(g)) up to logarithmic factors. In light of the minimax theorem, which states that R(g) is the maximum distributional complexity of g over any distribution, our result makes progress towards answering the composition question. We prove our result by means of a complexity measure R^(prod)_(eps) that we define for total Boolean functions. We show it to be equivalent (up to logarithmic factors) to the sabotage complexity measure RS() defined by Ben-David and Kothari (ICALP 2019): RS(g) = Theta(R^(prod)_(1/3)(g)) (up to log factors). We ask if our bound RS(g) = Omega(D^(prod)(g)) (up to log factors) is tight. We answer this question in the negative, by showing that for the NAND tree function, sabotage complexity is polynomially larger than D^(prod). Our proof yields an alternative derivation of the tight lower bound on the bounded-error randomized query complexity of the NAND tree function (originally proved by Santha in 1985), which may be of independent interest.
Our result gives an explicit polynomial separation between R and D^(prod) which, to our knowledge, was not known prior to our work.

Submitted 27 January, 2024; originally announced January 2024.

Comments: Accepted to STACS 2024

arXiv:2401.06035 [pdf, other]

RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks

Authors: Partha Ghosh, Soubhik Sanyal, Cordelia Schmid, Bernhard Schölkopf

Abstract: We present a novel unconditional video generative model designed to address long-term spatial and temporal dependencies. To capture these dependencies, our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks developed for three-dimensional object representation and employs a singular latent code to model an entire video sequence. Individual video frames are then synthesized from an intermediate tri-plane representation, which itself is derived from the primary latent code. This novel strategy reduces computational complexity by a factor of $2$ as measured in FLOPs. Consequently, our approach facilitates the efficient and temporally coherent generation of videos. Moreover, our joint frame modeling approach, in contrast to autoregressive methods, mitigates the generation of visual artifacts. We further enhance the model's capabilities by integrating an optical flow-based module within our Generative Adversarial Network (GAN) based generator architecture, thereby compensating for the constraints imposed by a smaller generator size. As a result, our model is capable of synthesizing high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps. The efficacy and versatility of our approach are empirically validated through qualitative and quantitative assessments across three different datasets comprising both synthetic and real video clips.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2312.14579 [pdf, other]

Environment-Specific People

Authors: Mirela Ostrek, Soubhik Sanyal, Carol O'Sullivan, Michael J. Black, Justus Thies

Abstract: Despite significant progress in generative image synthesis and full-body generation in particular, state-of-the-art methods are either context-independent, overly reliant on text prompts, or bound to curated training datasets, such as fashion images with monotonous backgrounds. Here, our goal is to generate people in clothing that is semantically appropriate for a given scene. To this end, we present ESP, a novel method for context-aware full-body generation that enables photo-realistic inpainting of people into existing "in-the-wild" photographs. ESP is conditioned on a 2D pose and contextual cues that are extracted from the environment photograph and integrated into the generation process. Our models are trained on a dataset containing a set of in-the-wild photographs of people covering a wide range of different environments. The method is analyzed quantitatively and qualitatively, and we show that ESP outperforms the state of the art on the task of contextual full-body generation.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.03106 [pdf, other]

U(1) quantum spin liquids in dipolar-octupolar pyrochlore magnets: a fermionic parton approach

Authors: Krushna Chandra Sahu, Sambuddha Sanyal

Abstract: We study the uniform $U(1)$ quantum spin liquid (QSL) with low-energy fermionic quasiparticles for pyrochlore magnets with dipolar-octupolar (DO) symmetry, employing a fermionic parton mean field theory approach. Self-consistent calculations stabilize 12 fully symmetric uniform $U(1)$ QSLs, of which four mean-field states are "monopole-flux" states. Several of these mean-field states show a linear temperature dependence of specific heat at low temperatures; the other phases show a power law temperature dependence of specific heat $C \sim T^α$, where $α$ is close to 1. We further compute the dynamic spin structure factors and discuss the possible signature of these fermionic spinons in neutron-scattering experiments on DO magnetic systems. Our results provide a possible way to understand the metallic specific heat response in Nd$_2$ScNbO$_7$.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 10+6 pages, 4+2 figures, 4+2 tables

arXiv:2311.16294 [pdf, other]

Aligning Non-Causal Factors for Transformer-Based Source-Free Domain Adaptation

Authors: Sunandini Sanyal, Ashish Ramayee Asokan, Suvaansh Bhambri, Pradyumna YM, Akshay Kulkarni, Jogendra Nath Kundu, R Venkatesh Babu

Abstract: Conventional domain adaptation algorithms aim to achieve better generalization by aligning only the task-discriminative causal factors between a source and target domain. However, we find that retaining the spurious correlation between causal and non-causal factors plays a vital role in bridging the domain gap and improving target adaptation. Therefore, we propose to build a framework that disentangles and supports causal factor alignment by aligning the non-causal factors first. We also investigate and find that the strong shape bias of vision transformers, coupled with their multi-head attention, makes them a suitable architecture for realizing our proposed disentanglement. Hence, we propose to build a Causality-enforcing Source-Free Transformer framework (C-SFTrans) to achieve disentanglement via a novel two-stage alignment approach: a) non-causal factor alignment: non-causal factors are aligned using a style classification task which leads to an overall global alignment, b) task-discriminative causal factor alignment: causal factors are aligned via target adaptation. We are the first to investigate the role of vision transformers (ViTs) in a privacy-preserving source-free setting. Our approach achieves state-of-the-art results in several DA benchmarks.

Submitted 27 November, 2023; originally announced November 2023.

Comments: WACV 2024. Project Page: https://val.cds.iisc.ac.in/C-SFTrans/

arXiv:2311.13159 [pdf, other]

Multi-Objective Optimization via Wasserstein-Fisher-Rao Gradient Flow

Authors: Yinuo Ren, Tesi Xiao, Tanmay Gangwani, Anshuka Rangi, Holakou Rahmanian, Lexing Ying, Subhajit Sanyal

Abstract: Multi-objective optimization (MOO) aims to optimize multiple, possibly conflicting objectives and has widespread applications. We introduce a novel interacting particle method for MOO inspired by molecular dynamics simulations. Our approach combines overdamped Langevin and birth-death dynamics, incorporating a "dominance potential" to steer particles toward global Pareto optimality. In contrast to previous methods, our method is able to relocate dominated particles, making it particularly adept at managing Pareto fronts of complicated geometries. Our method is also theoretically grounded as a Wasserstein-Fisher-Rao gradient flow with convergence guarantees. Extensive experiments confirm that our approach outperforms state-of-the-art methods on challenging synthetic and real-world datasets.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.09603 [pdf, other]

Self-Contradictory Reasoning Evaluation and Detection

Authors: Ziyi Liu, Isabelle Lee, Yongkang Du, Soumya Sanyal, Jieyu Zhao

Abstract: In a plethora of recent work, large language models (LLMs) demonstrated impressive reasoning ability, but many proposed downstream reasoning tasks focus on performance-wise evaluation. Two fundamental questions persist: 1) how reliable is the quality of reasoning, and 2) can models detect unreliable reasoning? In this paper, we investigate self-contradictory (Self-Contra) reasoning, where the model reasoning does not support predictions. To address 1), we assess the Self-Contra rate across four datasets and delve into finer-grained categories of Self-Contra reasoning. We find that LLMs often contradict themselves when performing reasoning tasks that involve contextual information understanding or commonsense. Importantly, a higher accuracy does not necessarily correspond to a lower Self-Contra rate. The model may appear to generate correct answers but it may take shortcuts in reasoning or skip over contextual evidence, thereby displaying Self-Contra behaviors with compromised reasoning. As for 2), we task GPT-4 with identifying Self-Contra reasoning and finer-grained fallacies. We observe that GPT-4 struggles to effectively detect Self-Contra reasoning, with significantly low performance compared with human judgment. Our results indicate that the current LLMs lack robustness necessary for reliable reasoning and we emphasize the urgent need for establishing best practices in comprehensive reasoning evaluations beyond accuracy-based metrics.

Submitted 19 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2309.12643 [pdf, other]

Synchrotron radiation from cosmic string wakes

Authors: Dilip Kumar, Soumen Nayak, Soma Sanyal

Abstract: Magnetic fields can be generated in cosmic string wakes due to the Biermann mechanism in the presence of neutrino inhomogeneities. As the cosmic string moves through the plasma, the small magnetic field is amplified by the turbulence in the plasma. Relativistic charged particles which cross the magnetized wake of a cosmic string will therefore emit synchrotron radiation. The opening angle of the cosmic string is very small, and so the wake appears like a relativistic jet. Assuming a homogeneous magnetic field in the wake of the string, we obtain the synchrotron emission from non-thermal relativistic electrons in the wake of the string. The emitted radiation has a broad peak and extends over a wide range of frequencies. We show that the spectrum can be mapped to some of the unknown sources in different ranges of the currently available catalogues.

Submitted 31 January, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: 17 pages, 2 figures. A new section has been added and graphs are changed

arXiv:2308.14023 [pdf, other]

Domain-Specificity Inducing Transformers for Source-Free Domain Adaptation

Authors: Sunandini Sanyal, Ashish Ramayee Asokan, Suvaansh Bhambri, Akshay Kulkarni, Jogendra Nath Kundu, R. Venkatesh Babu

Abstract: Conventional Domain Adaptation (DA) methods aim to learn domain-invariant feature representations to improve the target adaptation performance. However, we motivate that domain-specificity is equally important since in-domain trained models hold crucial domain-specific properties that are beneficial for adaptation. Hence, we propose to build a framework that supports disentanglement and learning of domain-specific factors and task-specific factors in a unified model. Motivated by the success of vision transformers in several multi-modal vision problems, we find that queries could be leveraged to extract the domain-specific factors. Hence, we propose a novel Domain-specificity-inducing Transformer (DSiT) framework for disentangling and learning both domain-specific and task-specific factors. To achieve disentanglement, we propose to construct novel Domain-Representative Inputs (DRI) with domain-specific information to train a domain classifier with a novel domain token. We are the first to utilize vision transformers for domain adaptation in a privacy-oriented source-free setting, and our approach achieves state-of-the-art performance on single-source, multi-source, and multi-target benchmarks.

Submitted 27 August, 2023; originally announced August 2023.

Comments: ICCV 2023. Project page: http://val.cds.iisc.ac.in/DSiT-SFDA

arXiv:2308.10638 [pdf, other]

SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes

Authors: Soubhik Sanyal, Partha Ghosh, Jinlong Yang, Michael J. Black, Justus Thies, Timo Bolkart

Abstract: We present SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans. Specifically, we devise a deep neural network that learns to represent the geometry and appearance distribution of clothed human bodies. Training such a model is challenging, as datasets of textured 3D meshes for humans are limited in size and accessibility. Our key observation is that there exist medium-sized 3D scan datasets like CAPE, as well as large-scale 2D image datasets of clothed humans, and that multiple appearances can be mapped to a single geometry. To effectively learn from the two data modalities, we propose an unpaired learning procedure for pose-dependent clothed and textured human meshes. Specifically, we learn a pose-dependent geometry space from 3D scan data. We represent this as per-vertex displacements w.r.t. the SMPL model. Next, we train a geometry-conditioned texture generator in an unsupervised way using the 2D image data. We use intermediate activations of the learned geometry model to condition our texture generator. To alleviate entanglement between pose and clothing type, and pose and clothing appearance, we condition both the texture and geometry generators with attribute labels such as clothing types for the geometry, and clothing colors for the texture generator. We automatically generated these conditioning labels for the 2D images based on the visual question answering model BLIP and CLIP. We validate our method on the SCULPT dataset, and compare to state-of-the-art 3D generative models for clothed human bodies.
Our code and data can be found at https://sculpt.is.tue.mpg.de.

Submitted 6 May, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: Updated to camera ready version of CVPR 2024

arXiv:2307.11349 [pdf, other]

EV-Planner: Energy-Efficient Robot Navigation via Event-Based Physics-Guided Neuromorphic Planner

Authors: Sourav Sanyal, Rohan Kumar Manna, Kaushik Roy

Abstract: Vision-based object tracking is an essential precursor to performing autonomous aerial navigation in order to avoid obstacles. Biologically inspired neuromorphic event cameras are emerging as a powerful alternative to frame-based cameras, due to their ability to asynchronously detect varying intensities (even in poor lighting conditions), high dynamic range, and robustness to motion blur. Spiking neural networks (SNNs) have gained traction for processing events asynchronously in an energy-efficient manner. On the other hand, physics-based artificial intelligence (AI) has gained prominence recently, as they enable embedding system knowledge via physical modeling inside traditional analog neural networks (ANNs). In this letter, we present an event-based physics-guided neuromorphic planner (EV-Planner) to perform obstacle avoidance using neuromorphic event cameras and physics-based AI. We consider the task of autonomous drone navigation where the mission is to detect moving gates and fly through them while avoiding a collision. We use event cameras to perform object detection using a shallow spiking neural network in an unsupervised fashion. Utilizing the physical equations of the brushless DC motors present in the drone rotors, we train a lightweight energy-aware physics-guided neural network (PgNN) with depth inputs. This predicts the optimal flight time responsible for generating near-minimum energy paths. We spawn the drone in the Gazebo simulator and implement a sensor-fused vision-to-planning neuro-symbolic framework using Robot Operating System (ROS).
Simulation results for safe collision-free flight trajectories are presented with performance analysis, ablation study and potential future research directions.

Submitted 3 January, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

Comments: accepted for publication at IEEE Robotics and Automation Letters

arXiv:2307.03900 [pdf, ps, other]

On the Composition of Randomized Query Complexity and Approximate Degree

Authors: Sourav Chakraborty, Chandrima Kayal, Rajat Mittal, Manaswi Paraashar, Swagato Sanyal, Nitin Saurabh

Abstract: For any Boolean functions $f$ and $g$, the question whether $R(f\circ g) = \tilde{\Theta}(R(f)R(g))$, is known as the composition question for the randomized query complexity. Similarly, the composition question for the approximate degree asks whether $\widetilde{deg}(f\circ g) = \tilde{\Theta}(\widetilde{deg}(f)\cdot\widetilde{deg}(g))$. These questions are two of the most important and well-studied problems, and yet we are far from answering them satisfactorily. It is known that the measures compose if one assumes various properties of the outer function $f$ (or inner function $g$). This paper extends the class of outer functions for which $\text{R}$ and $\widetilde{\text{deg}}$ compose. A recent landmark result (Ben-David and Blais, 2020) showed that $R(f \circ g) = \Omega(noisyR(f)\cdot R(g))$. This implies that composition holds whenever $noisyR(f) = \tilde{\Theta}(R(f))$. We show two results: (1) When $R(f) = \Theta(n)$, then $noisyR(f) = \Theta(R(f))$. (2) If $\text{R}$ composes with respect to an outer function, then $\text{noisyR}$ also composes with respect to the same outer function. On the other hand, no result of the type $\widetilde{deg}(f \circ g) = \Omega(M(f) \cdot \widetilde{deg}(g))$ (for some non-trivial complexity measure $M(\cdot)$) was known to the best of our knowledge. We prove that $\widetilde{deg}(f\circ g) = \widetilde{\Omega}(\sqrt{bs(f)} \cdot \widetilde{deg}(g)),$ where $bs(f)$ is the block sensitivity of $f$. This implies that $\widetilde{\text{deg}}$ composes when $\widetilde{\text{deg}}(f)$ is asymptotically equal to $\sqrt{\text{bs}(f)}$.
It is already known that both $\text{R}$ and $\widetilde{\text{deg}}$ compose when the outer function is symmetric. We also extend these results to weaker notions of symmetry with respect to the outer function.

Submitted 11 July, 2023; v1 submitted 8 July, 2023; originally announced July 2023.

arXiv:2306.03241 [pdf, other]

Early Weight Averaging meets High Learning Rates for LLM Pre-training

Authors: Sunny Sanyal, Atula Neerkaje, Jean Kaddour, Abhishek Kumar, Sujay Sanghavi

Abstract: Training Large Language Models (LLMs) incurs significant cost; hence, any strategy that accelerates model convergence is helpful. In this paper, we investigate the ability of a simple idea, checkpoint averaging along the trajectory of a training run, to improve both convergence and generalization quite early on during training. Here we show that models trained with high learning rates observe higher gains due to checkpoint averaging. Furthermore, these gains are amplified when checkpoints are sampled with considerable spacing in training steps. Our training recipe outperforms conventional training and popular checkpoint averaging baselines such as exponential moving average (EMA) and stochastic weight averaging (SWA). We evaluate our training recipe by pre-training LLMs, where high learning rates are inherently preferred due to extremely large batch sizes. Specifically, we pre-trained nanoGPT-2 models of varying sizes, small (125M), medium (335M), and large (770M), on the OpenWebText dataset, comprised of 9B tokens. Additionally, we present results for publicly available Pythia LLMs, ranging from 1B to 12B, which were trained on the PILE-deduped dataset containing 207B tokens.
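Illustrative note (not part of the arXiv record): the checkpoint averaging described in this abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' recipe: checkpoints are modelled as plain dictionaries of float lists, the average is uniform, and the names `select_spaced` and `average_checkpoints`, as well as the `stride` and `k` parameters, are hypothetical.

```python
from typing import Dict, List

# Assumption: a checkpoint is a mapping from parameter name to a flat list of floats.
Checkpoint = Dict[str, List[float]]


def select_spaced(checkpoints: List[Checkpoint], stride: int, k: int) -> List[Checkpoint]:
    """Pick up to k checkpoints, counting back from the latest, every `stride` saves."""
    if stride < 1 or k < 1:
        raise ValueError("stride and k must be positive")
    picked = checkpoints[::-1][::stride][:k]  # newest first, spaced by stride
    return picked[::-1]  # restore chronological order


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Return the uniform element-wise mean of the given checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name in checkpoints[0]:
        # zip(*...) groups the i-th entry of this parameter across all checkpoints
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


if __name__ == "__main__":
    # Six toy checkpoints saved along a hypothetical training run.
    saved = [{"w": [float(i), 2.0 * i]} for i in range(6)]
    spaced = select_spaced(saved, stride=2, k=3)
    print(average_checkpoints(spaced))
```

The spacing parameter reflects the abstract's remark that gains grow when the averaged checkpoints are sampled far apart in training steps; how far apart, and how many, is left open here.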

Submitted 11 December, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 17 pages, 13 figures, presented at NeurIPS 2023 WANT workshop

arXiv:2306.02680 [pdf, other]

BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion

Authors: Ahana Deb, Sayan Nag, Ayan Mahapatra, Soumitri Chattopadhyay, Aritra Marik, Pijush Kanti Gayen, Shankha Sanyal, Archi Banerjee, Samir Karmakar

Abstract: Spoken languages often utilise intonation, rhythm, intensity, and structure, to communicate intention, which can be interpreted differently depending on the rhythm of speech of their utterance. These speech acts provide the foundation of communication and are unique in expression to the language. Recent advancements in attention-based models, demonstrating their ability to learn powerful representations from multilingual datasets, have performed well in speech tasks and are ideal to model specific tasks in low resource languages. Here, we develop a novel multimodal approach combining two models, wav2vec2.0 for audio and MarianMT for text translation, by using multimodal attention fusion to predict speech acts in our prepared Bengali speech corpus. We also show that our model BeAts ($\underline{\textbf{Be}}$ngali speech acts recognition using Multimodal $\underline{\textbf{At}}$tention Fu$\underline{\textbf{s}}$ion) significantly outperforms both the unimodal baseline using only speech data and a simpler bimodal fusion using both speech and text data. Project page: https://soumitri2001.github.io/BeAts

Submitted 5 June, 2023; originally announced June 2023.

Comments: Accepted at INTERSPEECH 2023

arXiv:2305.19472 [pdf, other]

PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning

Authors: Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, Jena D. Hwang, Xiang Lorraine Li, Hirona J. Arai, Soumya Sanyal, Keisuke Sakaguchi, Xiang Ren, Yejin Choi

Abstract: Procedural planning, which entails decomposing a high-level goal into a sequence of temporally ordered steps, is an important yet intricate task for machines. It involves integrating common-sense knowledge to reason about complex contextualized situations that are often counterfactual, e.g. "scheduling a doctor's appointment without a phone". While current approaches show encouraging results using large language models (LLMs), they are hindered by drawbacks such as costly API calls and reproducibility issues. In this paper, we advocate planning using smaller language models. We present PlaSma, a novel two-pronged approach to endow small language models with procedural knowledge and (counterfactual) planning capabilities. More concretely, we develop symbolic procedural knowledge distillation to enhance the implicit knowledge in small language models and an inference-time algorithm to facilitate more structured and accurate reasoning. In addition, we introduce a novel task, Counterfactual Planning, that requires a revision of a plan to cope with a counterfactual situation. In both the original and counterfactual setting, we show that orders-of-magnitude smaller models (770M-11B parameters) can compete and often surpass their larger teacher models' capabilities.

Submitted 26 July, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: cited new paper, 27 pages

arXiv:2305.18654 [pdf, other]

Faith and Fate: Limits of Transformers on Compositionality

Authors: Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi

Abstract: Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This begs the question: Are these errors incidental, or do they signal more substantial limitations? In an attempt to demystify transformer LLMs, we investigate the limits of these models across three representative compositional tasks -- multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem. These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer. We formulate compositional tasks as computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures. Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching, without necessarily developing systematic problem-solving skills. To round off our empirical study, we provide theoretical arguments on abstract multi-step reasoning problems that highlight how autoregressive generations' performance can rapidly decay with increased task complexity.

Submitted 31 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: 10 pages + appendix (40 pages)

arXiv:2305.02268 [pdf, other]

doi 10.1051/0004-6361/202346829

Cosmoglobe DR1 results. II. Constraints on isotropic cosmic birefringence from reprocessed WMAP and Planck LFI data

Authors: J. R. Eskilt, D. J. Watts, R. Aurlien, A. Basyrov, M. Bersanelli, M. Brilenkov, L. P. L. Colombo, H. K. Eriksen, K. S. F. Fornazier, C. Franceschet, U. Fuskeland, M. Galloway, E. Gjerløw, B. Hensley, L. T. Hergt, D. Herman, H. T. Ihle, K. Lee, J. G. S. Lunde, S. K. Nerval, S. Paradiso, S. K. Patel, F. Rahman, M. Regnier, M. San , et al. (6 additional authors not shown)

Abstract: Cosmic birefringence is a parity-violating effect that might have rotated the plane of linearly polarized light of the cosmic microwave background (CMB) by an angle $\beta$ since its emission. This has recently been measured to be non-zero at a statistical significance of $3.6\sigma$ in the official Planck PR4 and 9-year WMAP data. In this work, we constrain $\beta$ using the reprocessed BeyondPlanck LFI and Cosmoglobe DR1 WMAP polarization maps. These novel maps have both lower systematic residuals and a more complete error description than the corresponding official products. Foreground $EB$ correlations could bias measurements of $\beta$, and while thermal dust $EB$ emission has been argued to be statistically non-zero, no evidence for synchrotron $EB$ power has been reported. Unlike the dust-dominated Planck HFI maps, the majority of the LFI and WMAP polarization maps are instead dominated by synchrotron emission. Simultaneously constraining $\beta$ and the polarization miscalibration angle, $\alpha$, of each channel, we find a best-fit value of $\beta=0.35^{\circ}\pm0.70^{\circ}$ with LFI and WMAP data only. When including the Planck HFI PR4 maps, but fitting $\beta$ separately for dust-dominated, $\beta_{>70\,\mathrm{GHz}}$, and synchrotron-dominated channels, $\beta_{\leq 70\,\mathrm{GHz}}$, we find $\beta_{\leq 70\,\mathrm{GHz}}=0.53^{\circ}\pm0.28^\circ$. This differs from zero with a statistical significance of $1.9\sigma$, and the main contribution to this value comes from the LFI 70 GHz channel.
While the statistical significances of these results are low on their own, the measurement derived from the LFI and WMAP synchrotron-dominated maps agrees with the previously reported HFI-dominated constraints, despite the very different astrophysical and instrumental systematics involved in all these experiments.
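Illustrative note (not part of the arXiv record): the quoted significance can be checked with a one-line calculation, assuming a Gaussian uncertainty so that the significance is simply the best-fit value divided by its $1\sigma$ error. For the synchrotron-dominated channels,

$$\frac{\beta_{\leq 70\,\mathrm{GHz}}}{\sigma_{\beta}} = \frac{0.53^{\circ}}{0.28^{\circ}} \approx 1.9,$$

which matches the stated $1.9\sigma$. Under the same assumption, the LFI and WMAP only value gives $0.35^{\circ}/0.70^{\circ} = 0.5$, so that fit on its own is consistent with zero.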

Submitted 3 May, 2023; originally announced May 2023.

Comments: 10 pages, 7 figures, 2 tables. Submitted to A&A

Journal ref: A&A 679, A144 (2023)

arXiv:2304.07560 [pdf, other]

Continual Domain Adaptation through Pruning-aided Domain-specific Weight Modulation

Authors: Prasanna B, Sunandini Sanyal, R. Venkatesh Babu

Abstract: In this paper, we propose to develop a method to address unsupervised domain adaptation (UDA) in a practical setting of continual learning (CL). The goal is to update the model on continually changing domains while preserving domain-specific knowledge to prevent catastrophic forgetting of past-seen domains. To this end, we build a framework for preserving domain-specific features utilizing the inherent model capacity via pruning. We also perform effective inference using a novel batch-norm based metric to accurately predict the final model parameters to be used. Our approach not only achieves state-of-the-art performance but also significantly reduces catastrophic forgetting of past domains. Our code is made publicly available.

Submitted 15 April, 2023; originally announced April 2023.

Comments: CVPR CLVision Workshop 2023, For code see https://github.com/PrasannaB29/PACDA

arXiv:2304.05593 [pdf]

Crack-free high composition (>35%) thick (>30 nm) barrier AlGaN/AlN/GaN HEMT on sapphire with record low sheet resistance

Authors: Swarnav Mukhopadhyay, Cheng Liu, Jiahao Chen, Surjava Sanyal, Ruixin Bai, Guangying Wang, Chirag Gupta, Shubhra Pasayat

Abstract: In this article, a high composition (>35%) thick (>30 nm) barrier AlGaN/AlN/GaN HEMT structure grown on a sapphire substrate with ultra-low sheet resistivity ($<250\ \Omega/\Box$) is reported. Optimization of growth conditions, such as reduced growth rate, low carbon incorporation, and thickness optimization of different epitaxial layers, allowed the growth of a crack-free high composition and thick AlGaN barrier layer HEMT structure. A significantly high two-dimensional electron gas (2DEG) density of $1.46 \times 10^{13}$ cm$^{-2}$ with a room temperature mobility of 1710 cm$^{2}$/V·s is obtained by Hall measurement using the van der Pauw method. These state-of-the-art results show great potential for high-power Ga-polar HEMT design on the sapphire substrate.

Submitted 11 April, 2023; originally announced April 2023.

Comments: 14 pages, 7 main figures (Total=11 figures), 2 tables

arXiv:2303.08095 [pdf, other]

doi 10.1051/0004-6361/202346414

Cosmoglobe DR1 results. I. Improved Wilkinson Microwave Anisotropy Probe maps through Bayesian end-to-end analysis

Authors: D. J. Watts, A. Basyrov, J. R. Eskilt, M. Galloway, L. T. Hergt, D. Herman, H. T. Ihle, S. Paradiso, F. Rahman, H. Thommesen, R. Aurlien, M. Bersanelli, L. A. Bianchi, M. Brilenkov, L. P. L. Colombo, H. K. Eriksen, C. Franceschet, U. Fuskeland, E. Gjerløw, B. Hensley, G. A. Hoerning, K. Lee, J. G. S. Lunde, A. Marins, S. K. Nerval , et al. (8 additional authors not shown)

Abstract: We present Cosmoglobe Data Release 1, which implements the first joint analysis of WMAP and Planck LFI time-ordered data, processed within a single Bayesian end-to-end framework. This framework builds directly on a similar analysis of the LFI measurements by the BeyondPlanck collaboration, and approaches the CMB analysis challenge through Gibbs sampling of a global posterior distribution, simultaneously accounting for calibration, mapmaking, and component separation. The computational cost of producing one complete WMAP+LFI Gibbs sample is 812 CPU-hr, of which 603 CPU-hrs are spent on WMAP low-level processing; this demonstrates that end-to-end Bayesian analysis of the WMAP data is computationally feasible. We find that our WMAP posterior mean temperature sky maps and CMB temperature power spectrum are largely consistent with the official WMAP9 results. Perhaps the most notable difference is that our CMB dipole amplitude is $3366.2 \pm 1.4\ \mathrm{\mu K}$, which is $11\ \mathrm{\mu K}$ higher than the WMAP9 estimate and $2.5\ \sigma$ higher than BeyondPlanck; however, it is in perfect agreement with the HFI-dominated Planck PR4 result. In contrast, our WMAP polarization maps differ more notably from the WMAP9 results, and in general exhibit significantly lower large-scale residuals. We attribute this to a better constrained gain and transmission imbalance model. It is particularly noteworthy that the W-band polarization sky map, which was excluded from the official WMAP cosmological analysis, for the first time appears visually consistent with the V-band sky map.
Similarly, the long-standing discrepancy between the WMAP K-band and LFI 30 GHz maps is finally resolved, and the difference between the two maps appears consistent with instrumental noise at high Galactic latitudes. All maps and the associated code are made publicly available through the Cosmoglobe web page.

Submitted 14 March, 2023; originally announced March 2023.

Comments: 65 pages, 61 figures. Data available at cosmoglobe.uio.no. Submitted to A&A

Journal ref: A&A 679, A143 (2023)

arXiv:2302.14053 [pdf, other]

Emergent scale and anomalous dynamics in certain quasi-periodic systems

Authors: Parvathy S Nair, Dintomon Joy, Sambuddha Sanyal

Abstract: We study the localisation transition in a class of quasi-periodic systems that have two competing periodic scales. We show that this class of systems exhibits a re-entrant localisation transition where the energy scale of the transition is set by the periodicities of these two scales. Furthermore, we show that the dynamical properties of these systems exhibit various kinds of critical dynamics, including sub-diffusive, super-diffusive and diffusive spread of an initially localised wave-packet. Finally, we show that these characteristics of quasi-periodic systems with two periodic scales can be realised within the regime of current experiments.

Submitted 22 February, 2023; originally announced February 2023.

Comments: 5 pages, 5 figures in the main text, 3 pages, 6 figures in the supplementary material

arXiv:2302.05151 [pdf, ps, other]

doi 10.1134/S1063773722110056

Binomial Line Cox Processes: Statistical Characterization and Applications in Wireless Network Analysis

Authors: Mohammad Taha Shah, Gourab Ghatak, Souradip Sanyal, Martin Haenggi

Abstract: The current analysis of wireless networks whose transceivers are confined to streets is largely based on Poissonian models, such as Poisson line processes and Poisson line Cox processes. We demonstrate important scenarios where a model with a finite and deterministic number of streets, termed the binomial line process (BLP), is more accurate. We characterize the statistical properties of the BLP and the corresponding binomial line Cox process and apply them to analyze the performance of a network whose access points are deployed along the streets of a city. Such a deployment scenario will be typical for 5G and future wireless networks. In order to obtain a fine-grained insight into the network performance, we derive the meta distribution of the signal-to-interference and noise ratio. Accordingly, we investigate the mean local delay in transmission and the density of successful transmission. These metrics, respectively, characterize the latency and coverage performance of the network and are key performance indicators of next-generation wireless systems.

Submitted 10 February, 2023; originally announced February 2023.

Comments: Submitted to IEEE Transactions on Wireless Communications

arXiv:2301.12059 [pdf, ps, other]

Potential energy surface prediction of Alumina polymorphs using graph neural network

Authors: Soumya Sanyal, Arun Kumar Sagotra, Narendra Kumar, Sharad Rathi, Mohana Krishna, Nagesh Somayajula, Duraivelan Palanisamy, Ram R. Ratnakar, Suchismita Sanyal, Partha Talukdar, Umesh Waghmare, Janakiraman Balachandran

Abstract: The process of design and discovery of new materials can be significantly expedited and simplified if we can learn effectively from available data. Deep learning (DL) approaches have recently received a lot of interest for their ability to speed up the design of novel materials by predicting material properties with precision close to experiments and ab-initio calculations. The application of deep learning to predict material properties measured by experiments is valuable yet challenging due to the limited amount of experimental data. Most of the existing approaches to predict properties from computational data have also been directed towards specific material properties. In this work, we extend this approach by proposing the Landscape Crystal Graph Convolution Network (LCGCN), an accurate and transferable deep learning framework based on graph convolutional networks. LCGCN directly learns the potential energy surface (PES) from atomic configurations. This approach can enable transferable models that can predict different material properties. We apply this framework to bulk crystals (i.e. Al2O3), and test it by calculating potential energy surfaces at different temperatures and across different phases of the crystal.

Submitted 27 January, 2023; originally announced January 2023.

arXiv:2212.09282 [pdf, other]

APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning

Authors: Soumya Sanyal, Yichong Xu, Shuohang Wang, Ziyi Yang, Reid Pryzant, Wenhao Yu, Chenguang Zhu, Xiang Ren

Abstract: Logical reasoning of text is an important ability that requires understanding the information present in the text, their interconnections, and then reasoning through them to infer new conclusions. Prior works on improving the logical reasoning ability of language models require complex processing of training data (e.g., aligning symbolic knowledge to text), yielding task-specific data augmentation solutions that restrict the learning of general logical reasoning skills. In this work, we propose APOLLO, an adaptively pretrained language model that has improved logical reasoning abilities. We select a subset of Wikipedia, based on a set of logical inference keywords, for continued pretraining of a language model. We use two self-supervised loss functions: a modified masked language modeling loss where only specific parts-of-speech words, that would likely require more reasoning than basic language understanding, are masked, and a sentence-level classification loss that teaches the model to distinguish between entailment and contradiction types of sentences. The proposed training paradigm is both simple and independent of task formats. We demonstrate the effectiveness of APOLLO by comparing it with prior baselines on two logical reasoning datasets. APOLLO performs comparably on ReClor and outperforms baselines on LogiQA. The code base has been made publicly available.

Submitted 4 June, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: Accepted at ACL 2023, code available at https://github.com/INK-USC/APOLLO

arXiv:2211.17214 [pdf, ps, other]

doi 10.4230/LIPIcs.ITCS.2023.33

Lifting to Parity Decision Trees Via Stifling

Authors: Arkadev Chattopadhyay, Nikhil S. Mande, Swagato Sanyal, Suhail Sherif

Abstract: We show that the deterministic decision tree complexity of a (partial) function or relation $f$ lifts to the deterministic parity decision tree (PDT) size complexity of the composed function/relation $f \circ g$ as long as the gadget $g$ satisfies a property that we call stifling. We observe that several simple gadgets of constant size, like Indexing on 3 input bits, Inner Product on 4 input bits, Majority on 3 input bits and random functions, satisfy this property. It can be shown that existing randomized communication lifting theorems ([Göös, Pitassi, Watson. SICOMP'20], [Chattopadhyay et al. SICOMP'21]) imply PDT-size lifting. However, there are two shortcomings of this approach: first, they lift the randomized decision tree complexity of $f$, which could be exponentially smaller than its deterministic counterpart when either $f$ is a partial function or even a total search problem. Second, the size of the gadgets in such lifting theorems is as large as logarithmic in the size of the input to $f$. Reducing the gadget size to a constant is an important open problem at the frontier of current research. Our result shows that even a random constant-size gadget does enable lifting to PDT size. Further, it also yields the first systematic way of turning lower bounds on the width of tree-like resolution proofs of the unsatisfiability of constant-width CNF formulas to lower bounds on the size of tree-like proofs in the resolution with parity system, i.e., $\textit{Res}$($\oplus$), of the unsatisfiability of closely related constant-width CNF formulas.

Submitted 30 November, 2022; originally announced November 2022.

arXiv:2210.17164 [pdf, other]

doi 10.3847/1538-4357/acb4ef

Magnetic reconnection in the wakes of cosmic strings

Authors: Dilip Kumar, Soma Sanyal

Abstract: The motion of cosmic strings in the universe leads to the generation of wakes behind them. We study magnetized wakes of cosmic strings moving in the post-recombination plasma. We show that magnetic reconnection can occur in the post-shock region. Since the width of the cosmic string wake is very small, the reconnection occurs over a very short length scale. The reconnection leads to a large amount of kinetic energy being released in the post-shock region of the cosmic string wake. This enhances the kinetic energy released during the reconnection. We make a rudimentary estimate of the kinetic energy released by the magnetic reconnection in cosmic string wakes and show that it can account for low-energy Gamma Ray Bursts (GRB) in the post-recombination era.

Submitted 31 October, 2022; originally announced October 2022.

Comments: 11 pages, 1 figure

arXiv:2210.00012 [pdf, other]

doi 10.1103/PhysRevLett.132.016701

Unidirectional subsystem symmetry in a hole-doped honeycomb-lattice Ising magnet

Authors: Sambuddha Sanyal, Alexander Wietek, John Sous

Abstract: We study a model of a hole-doped collinear Ising antiferromagnet on the honeycomb lattice as a route toward the realization of subsystem symmetry. We find nearly exact conservation of dipole symmetry verified both numerically with exact diagonalization (ED) on finite clusters and analytically with perturbation theory. The emergent symmetry forbids the motion of single holes -- or fractons -- but allows hole pairs -- or dipoles -- to move freely along a one-dimensional line, the antiferromagnetic direction, of the system; in the transverse direction, both fractons and dipoles are completely localized. This presents a realization of a 'unidirectional' subsystem symmetry. By studying interactions between dipoles, we argue that the subsystem symmetry is likely to continue to persist up to finite (but probably small) hole concentrations.

Submitted 30 September, 2022; originally announced October 2022.

Comments: 5 pages, 4 figures

Journal ref: Phys. Rev. Lett. 132, 016701 (2024)

arXiv:2209.10063 [pdf, other]

Generate rather than Retrieve: Large Language Models are Strong Context Generators

Authors: Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, Meng Jiang

Abstract: Knowledge-intensive tasks, such as open-domain question answering (QA), require access to a large amount of world or domain knowledge. A common approach for knowledge-intensive tasks is to employ a retrieve-then-read pipeline that first retrieves a handful of relevant contextual documents from an external corpus such as Wikipedia and then predicts an answer conditioned on the retrieved documents. In this paper, we present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators. We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer. Furthermore, we propose a novel clustering-based prompting method that selects distinct prompts, resulting in generated documents that cover different perspectives, leading to better recall over acceptable answers. We conduct extensive experiments on three different knowledge-intensive tasks, including open-domain QA, fact checking, and dialogue systems. Notably, GenRead achieves 71.6 and 54.4 exact match scores on TriviaQA and WebQ, significantly outperforming the state-of-the-art retrieve-then-read pipeline DPR-FiD by +4.0 and +3.9, without retrieving any documents from any external knowledge source. Lastly, we demonstrate the model performance can be further improved by combining retrieval and generation. Our code and generated documents can be found at https://github.com/wyu97/GenRead.

Submitted 25 January, 2023; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: Accepted at ICLR 2023 (v3, add code and implementation details)

arXiv:2209.09025 [pdf, other]

RAMP-Net: A Robust Adaptive MPC for Quadrotors via Physics-informed Neural Network

Authors: Sourav Sanyal, Kaushik Roy

Abstract: Model Predictive Control (MPC) is a state-of-the-art (SOTA) control technique which requires solving hard constrained optimization problems iteratively. For uncertain dynamics, analytical model based robust MPC imposes additional constraints, increasing the hardness of the problem. The problem is exacerbated in performance-critical applications, when more compute is required in less time. Data-driven regression methods such as Neural Networks have been proposed in the past to approximate system dynamics. However, such models rely on high volumes of labeled data, in the absence of symbolic analytical priors. This incurs non-trivial training overheads. Physics-informed Neural Networks (PINNs) have gained traction for approximating non-linear systems of ordinary differential equations (ODEs), with reasonable accuracy. In this work, we propose a Robust Adaptive MPC framework via PINNs (RAMP-Net), which uses a neural network trained partly from simple ODEs and partly from data. A physics loss is used to learn simple ODEs representing ideal dynamics. Having access to analytical functions inside the loss function acts as a regularizer, enforcing robust behavior for parametric uncertainties. On the other hand, a regular data loss is used for adapting to residual disturbances (non-parametric uncertainties), unaccounted for during mathematical modelling. Experiments are performed in a simulated environment for trajectory tracking of a quadrotor. We report 7.8% to 43.2% and 8.04% to 61.5% reduction in tracking errors for speeds ranging from 0.5 to 1.75 m/s compared to two SOTA regression based MPC methods.

Submitted 24 February, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

Comments: This work has been accepted for presentation at the 2023 IEEE International Conference on Robotics and Automation (ICRA), May 29 - June 2, 2023, London, UK. arXiv version will be merged with the conference proceeding once available

arXiv:2209.08042 [pdf, other]

Decision Tree Complexity versus Block Sensitivity and Degree

Authors: Rahul Chugh, Supartha Podder, Swagato Sanyal

Abstract: Relations between the decision tree complexity and various other complexity measures of Boolean functions is a thriving topic of research in computational complexity. It is known that decision tree complexity is bounded above by the cube of block sensitivity, and the cube of polynomial degree. However, the widest separation between decision tree complexity and each of block sensitivity and degree that is witnessed by known Boolean functions is quadratic. In this work, we investigate the tightness of the existing cubic upper bounds. We improve the cubic upper bounds for many interesting classes of Boolean functions. We show that for graph properties and for functions with a constant number of alternations, both of the cubic upper bounds can be improved to quadratic. We define a class of Boolean functions, which we call the zebra functions, that comprises Boolean functions where each monotone path from $0^n$ to $1^n$ has an equal number of alternations. This class contains the symmetric and monotone functions as its subclasses. We show that for any zebra function, decision tree complexity is at most the square of block sensitivity, and certificate complexity is at most the square of degree. Finally, we show using a lifting theorem of communication complexity by Göös, Pitassi and Watson that the task of proving an improved upper bound on the decision tree complexity for all functions is in a sense equivalent to the potentially easier task of proving a similar upper bound on communication complexity for each bi-partition of the input variables, for all functions. In particular, this implies that to bound the decision tree complexity it suffices to bound smaller measures like parity decision tree complexity, subcube decision tree complexity and decision tree rank, that are defined in terms of models that can be efficiently simulated by communication protocols.

Submitted 16 September, 2022; originally announced September 2022.

arXiv:2205.12598 [pdf, other]

RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning

Authors: Soumya Sanyal, Zeyi Liao, Xiang Ren

Abstract: Transformers have been shown to be able to perform deductive reasoning on a logical rulebase containing rules and statements written in English natural language. While the progress is promising, it is currently unclear if these models indeed perform logical reasoning by understanding the underlying logical semantics in the language. To this end, we propose RobustLR, a suite of evaluation datasets that evaluate the robustness of these models to minimal logical edits in rulebases and some standard logical equivalence conditions. In our experiments with RoBERTa and T5, we find that the models trained in prior works do not perform consistently on the different perturbations in RobustLR, thus showing that the models are not robust to the proposed logical perturbations. Further, we find that the models find it especially hard to learn logical negation and disjunction operators. Overall, using our evaluation sets, we demonstrate some shortcomings of the deductive reasoning-based language models, which can eventually help towards designing better models for logical reasoning over natural language. All the datasets and code base have been made publicly available.

Submitted 8 November, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: Accepted at EMNLP 2022, code available at https://github.com/INK-USC/RobustLR

arXiv:2204.13303 [pdf, other]

doi 10.1016/j.astropartphys.2022.102805

Evolution of magnetic fields in cosmic string wakes

Authors: Soumen Nayak, Sovan Sau, Soma Sanyal

Abstract: We study the evolution of magnetic fields in cosmic string wakes in a plasma with a low resistivity. The initial magnetic field in the wake is modelled on the magnetic fields that are generated by the motion of particles around cosmic strings. The plasma is characterized by a high beta value. We find multiple shock-like structures developing in the wake of the string. We study the detailed structure of the shocks formed and the evolution of the magnetic field in the shock using a 2-D magnetohydrodynamic simulation. As expected, the development of the magnetic field does not depend on the $β$ value. Our results show that instead of a single uniform shock forming behind the cosmic string we have multiple shocks forming at short time intervals behind the string. The presence of multiple shocks will definitely affect the observational signatures of cosmic string wakes as these signatures depend upon the temperature fluctuations generated by the shock. We also find that as the shock moves away, the residual magnetic field left behind reconnects and dissipates rapidly. The magnetic field around the string is thus very localized. We find that magnetic field reconnections take place in cosmic string wakes. This leads to the decrease of the magnetic field in the post-shock region.

Submitted 21 October, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: 17 pages, 7 figures. New graphs have been provided

arXiv:2204.11022 [pdf, other]

Towards Data-Free Model Stealing in a Hard Label Setting

Authors: Sunandini Sanyal, Sravanti Addepalli, R. Venkatesh Babu

Abstract: Machine learning models deployed as a service (MLaaS) are susceptible to model stealing attacks, where an adversary attempts to steal the model within a restricted access framework. While existing attacks demonstrate near-perfect clone-model performance using softmax predictions of the classification network, most of the APIs allow access to only the top-1 labels. In this work, we show that it is indeed possible to steal Machine Learning models by accessing only top-1 predictions (Hard Label setting) as well, without access to model gradients (Black-Box setting) or even the training dataset (Data-Free setting) within a low query budget. We propose a novel GAN-based framework that trains the student and generator in tandem to steal the model effectively while overcoming the challenge of the hard label setting by utilizing gradients of the clone network as a proxy to the victim's gradients. We propose to overcome the large query costs associated with a typical Data-Free setting by utilizing publicly available (potentially unrelated) datasets as a weak image prior. We additionally show that even in the absence of such data, it is possible to achieve state-of-the-art results within a low query budget using synthetically crafted samples. We are the first to demonstrate the scalability of Model Stealing in a restricted access setting on a 100 class dataset as well.

Submitted 23 April, 2022; originally announced April 2022.

Comments: CVPR 2022, Project Page: https://sites.google.com/view/dfms-hl

arXiv:2204.01293 [pdf, ps, other]

doi 10.3847/1538-4357/ac62ce

A study of photoionized gas in two HII regions of the N44 complex in the LMC using MUSE observations

Authors: Susmita Barman, Naslim Neelamkodan, Suzanne C. Madden, Marta Sewilo, Francisca Kemper, Kazuki Tokuda, Soma Sanyal, Toshikazu Onishi

Abstract: We use optical integral field observations with the Multi-Unit Spectroscopic Explorer (MUSE) on the Very Large Telescope, together with CLOUDY photoionization models, to study the ionization structure and physical conditions of two luminous HII regions in the N44 star-forming complex of the Large Magellanic Cloud. The spectral maps of various emission lines reveal a stratified ionization geometry in N44 D1. The spatial distribution of [O I] 6300A emission in N44 D1 indicates a partially covered ionization front at the outer boundary of the H II region. These observations reveal that N44 D1 is a Blister HII region. The [O I] 6300A emission in N44 C does not provide a well-defined ionization front at the boundary, while patches of [S II] 6717 A and [O I] 6300A emission bars are found in the interior. The results of spatially resolved MUSE spectra are tested with the photoionization models for the first time in these HII regions. A spherically symmetric ionization-bounded model with a partial covering factor, which is appropriate for a Blister HII region, can well reproduce the observed geometry and most of the diagnostic line ratios in N44 D1. Similarly, in N44 C we apply a low density and optically thin model based on the observational signatures. Our modeling results show that the ionization structure and physical conditions of N44 D1 are mainly determined by the radiation from an O5 V star. However, local X-rays, possibly from supernovae or stellar wind, play a key role. In N44 C, the main contribution is from three ionizing stars.

Submitted 4 April, 2022; originally announced April 2022.

Comments: Accepted for publication in ApJ

arXiv:2203.15433 [pdf, other]

doi 10.1103/PhysRevC.106.014901

Machine Learning model driven prediction of the initial geometry in Heavy-Ion Collision experiments

Authors: Abhisek Saha, Debasis Dan, Soma Sanyal

Abstract: We demonstrate high prediction accuracy of three important properties that determine the initial geometry of the heavy-ion collision (HIC) experiments by using supervised Machine Learning (ML) methods. These properties are the impact parameter, the eccentricity and the participant eccentricity. Though ML techniques have been used previously to determine the impact parameter of these collisions, we study multiple ML algorithms, their error spectrum, and sampling methods using exhaustive parameter scans and ablation studies to determine a combination of efficient algorithm and tuned training set that gives multi-fold improvement in accuracy for all three different heavy-ion collision models. The three models chosen are a transport model, a hydrodynamic model and a hybrid model. The motivation of using three different heavy-ion collision models was to show that even if the model is trained using a transport model, it gives accurate results for a hydrodynamic model as well as a hybrid model. We show that the accuracy of the impact parameter prediction depends on the centrality of the collision. With the standard application of ML training methods, prediction accuracy is considerably low for central collisions. Our method increases this accuracy by multiple folds. We also show that the eccentricity prediction accuracy can be improved by inclusion of the impact parameter as a feature in all these algorithms. We discuss how the errors can be minimized and the accuracy can be improved to a great extent in all the ranges of impact parameter and eccentricity predictions.

Submitted 22 November, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: 31 pages, 13 figures

Journal ref: Phys. Rev. C 106, 014901 (2022)

arXiv:2203.10261 [pdf, other]

FaiRR: Faithful and Robust Deductive Reasoning over Natural Language

Authors: Soumya Sanyal, Harman Singh, Xiang Ren

Abstract: Transformers have been shown to be able to perform deductive reasoning on a logical rulebase containing rules and statements written in natural language. Recent works show that such models can also produce the reasoning steps (i.e., the proof graph) that emulate the model's logical reasoning process. Currently, these black-box models generate both the proof graph and intermediate inferences within the same model and thus may be unfaithful. In this work, we frame the deductive logical reasoning task by defining three modular components: rule selection, fact selection, and knowledge composition. The rule and fact selection steps select the candidate rule and facts to be used and then the knowledge composition combines them to generate new inferences. This ensures model faithfulness by assured causal relation from the proof step to the inference reasoning. To test our framework, we propose FaiRR (Faithful and Robust Reasoner) where the above three components are independently modeled by transformers. We observe that FaiRR is robust to novel language perturbations, and is faster at inference than previous works on existing reasoning datasets. Additionally, in contrast to black-box generative models, the errors made by FaiRR are more interpretable due to the modular approach.

Submitted 19 March, 2022; originally announced March 2022.

Comments: Accepted in ACL 2022

arXiv:2203.00083 [pdf, ps, other]

Sampling-Based Winner Prediction in District-Based Elections

Authors: Palash Dey, Debajyoti Kar, Swagato Sanyal

Abstract: In a district-based election, we apply a voting rule $r$ to decide the winners in each district, and a candidate who wins in a maximum number of districts is the winner of the election. We present efficient sampling-based algorithms to predict the winner of such district-based election systems in this paper. When $r$ is plurality and the margin of victory is known to be at least $\varepsilon$ fraction of the total population, we present an algorithm to predict the winner. The sample complexity of our algorithm is $\mathcal{O}\left(\frac{1}{\varepsilon^4}\log \frac{1}{\varepsilon}\log\frac{1}δ\right)$. We complement this result by proving that any algorithm, from a natural class of algorithms, for predicting the winner in a district-based election when $r$ is plurality, must sample at least $Ω\left(\frac{1}{\varepsilon^4}\log\frac{1}δ\right)$ votes. We then extend this result to any voting rule $r$. Loosely speaking, we show that we can predict the winner of a district-based election with an extra overhead of $\mathcal{O}\left(\frac{1}{\varepsilon^2}\log\frac{1}δ\right)$ over the sample complexity of predicting the single-district winner under $r$. We further extend our algorithm for the case when the margin of victory is unknown, but we have only two candidates. We then consider the median voting rule when the set of preferences in each district is single-peaked. We show that the winner of a district-based election can be predicted with $\mathcal{O}\left(\frac{1}{\varepsilon^4}\log\frac{1}{\varepsilon}\log\frac{1}δ\right)$ samples even when the harmonious order in different districts can be different and even unknown. Finally, we also show some results for estimating the margin of victory of a district-based election within both additive and multiplicative error bounds.

Submitted 28 February, 2022; originally announced March 2022.

Comments: 27 pages

arXiv:2112.09609 [pdf, ps, other]

doi 10.1140/epjp/s13360-021-02232-y

Inflation with F(T) Teleparallel Gravity

Authors: Manas Chakrabortty, Nayem Sk, Susmita Sanyal, Abhik Kumar Sanyal

Abstract: We study the early universe with a particular form of F(T) Teleparallel gravity theory, in which inflation is driven by a scalar field. To ensure slow rollover, two different potentials are chosen in a manner, such that they remain almost flat for large initial value of the scalar field. Inflationary parameters show wonderful fit with the presently available Planck's data set. The energy scale of inflation is sub-Planckian and graceful exit from inflation is also administered. The chosen form of F(T) administers late-time cosmic acceleration too. In the process, unification of the early inflation with late-time acceleration is ensured. Unfortunately, a decelerated radiation dominated era is only possible with a different form of (quartic) potential, which being devoid of a flat section does not admit slow rollover.

Submitted 17 December, 2021; originally announced December 2021.

Comments: 21 pages, zero figures

Journal ref: Euro.Phys.J.Plus, 2021

arXiv:2110.05458 [pdf, other]

Learning Realistic Human Reposing using Cyclic Self-Supervision with 3D Shape, Pose, and Appearance Consistency

Authors: Soubhik Sanyal, Alex Vorobiov, Timo Bolkart, Matthew Loper, Betty Mohler, Larry Davis, Javier Romero, Michael J. Black

Abstract: Synthesizing images of a person in novel poses from a single image is a highly ambiguous task. Most existing approaches require paired training images; i.e. images of the same person with the same clothing in different poses. However, obtaining sufficiently large datasets with paired data is challenging and costly. Previous methods that forego paired supervision lack realism. We propose a self-supervised framework named SPICE (Self-supervised Person Image CrEation) that closes the image quality gap with supervised methods. The key insight enabling self-supervision is to exploit 3D information about the human body in several ways. First, the 3D body shape must remain unchanged when reposing. Second, representing body pose in 3D enables reasoning about self occlusions. Third, 3D body parts that are visible before and after reposing, should have similar appearance features. Once trained, SPICE takes an image of a person and generates a new image of that person in a new target pose. SPICE achieves state-of-the-art performance on the DeepFashion dataset, improving the FID score from 29.9 to 7.8 compared with previous unsupervised methods, and with performance similar to the state-of-the-art supervised method (6.4). SPICE also generates temporally coherent videos given an input image and a sequence of poses, despite being trained on static images only.

Submitted 11 October, 2021; originally announced October 2021.

Comments: International Conference on Computer Vision (ICCV)

arXiv:2109.04554 [pdf, other]

Feature-based Individual Fairness in k-Clustering

Authors: Debajyoti Kar, Mert Kosan, Debmalya Mandal, Sourav Medya, Arlei Silva, Palash Dey, Swagato Sanyal

Abstract: Ensuring fairness in machine learning algorithms is a challenging and essential task. We consider the problem of clustering a set of points while satisfying fairness constraints. While there have been several attempts to capture group fairness in the $k$-clustering problem, fairness at an individual level is relatively less explored. We introduce a new notion of individual fairness in $k$-clustering based on features not necessarily used for clustering. We show that this problem is NP-hard and does not admit a constant factor approximation. Therefore, we design a randomized algorithm that guarantees approximation both in terms of minimizing the clustering distance objective and individual fairness under natural restrictions on the distance metric and fairness constraints. Finally, our experimental results against six competing baselines validate that our algorithm produces individually fairer clusters than the fairest baseline by 12.5% on average while also being less costly in terms of the clustering objective than the best baseline by 34.5% on average.

Submitted 3 February, 2023; v1 submitted 9 September, 2021; originally announced September 2021.

Showing 1–50 of 223 results for author: Sanyal, S