Search | arXiv e-print repository

arXiv:2407.00950

Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals

Authors: Ziyi Liu, Idan Attias, Daniel M. Roy

Abstract: In this work, we investigate the problem of adapting to the presence or absence of causal structure in multi-armed bandit problems. In addition to the usual reward signal, we assume the learner has access to additional variables, observed in each round after acting. When these variables $d$-separate the action from the reward, existing work in causal bandits demonstrates that one can achieve strictly better (minimax) rates of regret (Lu et al., 2020). Our goal is to adapt to this favorable "conditionally benign" structure, if it is present in the environment, while simultaneously recovering worst-case minimax regret, if it is not. Notably, the learner has no prior knowledge of whether the favorable structure holds. In this paper, we establish the Pareto optimal frontier of adaptive rates. We prove upper and matching lower bounds on the possible trade-offs in the performance of learning in conditionally benign and arbitrary environments, resolving an open question raised by Bilodeau et al. (2022). Furthermore, we are the first to obtain instance-dependent bounds for causal bandits, by reducing the problem to the linear bandit setting. Finally, we examine the common assumption that the marginal distributions of the post-action contexts are known and show that a nontrivial estimate is necessary for better-than-worst-case minimax rates.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted to ICML 2024

arXiv:2404.06498

Simultaneous linear connectivity of neural networks modulo permutation

Authors: Ekansh Sharma, Devin Kwok, Tom Denton, Daniel M. Roy, David Rolnick, Gintare Karolina Dziugaite

Abstract: Neural networks typically exhibit permutation symmetries which contribute to the non-convexity of the networks' loss landscapes, since linearly interpolating between two permuted versions of a trained network tends to encounter a high loss barrier. Recent work has argued that permutation symmetries are the only sources of non-convexity, meaning there are essentially no such barriers between trained networks if they are permuted appropriately. In this work, we refine these arguments into three distinct claims of increasing strength. We show that existing evidence only supports "weak linear connectivity"-that for each pair of networks belonging to a set of SGD solutions, there exist (multiple) permutations that linearly connect it with the other networks. In contrast, the claim "strong linear connectivity"-that for each network, there exists one permutation that simultaneously connects it with the other networks-is both intuitively and practically more desirable. This stronger claim would imply that the loss landscape is convex after accounting for permutation, and enable linear interpolation between three or more independently trained models without increased loss. In this work, we introduce an intermediate claim-that for certain sequences of networks, there exists one permutation that simultaneously aligns matching pairs of networks from these sequences. Specifically, we discover that a single permutation aligns sequences of iteratively trained as well as iteratively pruned networks, meaning that two networks exhibit low loss barriers at each step of their optimization and sparsification trajectories respectively. Finally, we provide the first evidence that strong linear connectivity may be possible under certain conditions, by showing that barriers decrease with increasing network width when interpolating among three networks.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 11 pages, 6 figures

arXiv:2404.03104

Quotient-saturated groups

Authors: Jordi Delgado, Mallika Roy, Enric Ventura

Abstract: We introduce the new notion of quotient-saturation as a measure of the immensity of the quotient structure of a group. We present a sufficient condition for a finitely presented group to be quotient-saturated, and use it to deduce that non-elementary finitely presented subgroups of a hyperbolic group (in particular, non-elementary hyperbolic groups themselves) are quotient-saturated. Finally, we elaborate on the previous results to extend the scope of this property to finitely presented acylindrically hyperbolic groups.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 8 pages

MSC Class: 20F05

arXiv:2403.17218

A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection

Authors: Benjamin Steenhoek, Md Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Earl T. Barr, Wei Le

Abstract: Large Language Models (LLMs) have demonstrated great potential for code generation and other software engineering tasks. Vulnerability detection is of crucial importance to maintaining the security, integrity, and trustworthiness of software systems. Precise vulnerability detection requires reasoning about the code, making it a good case study for exploring the limits of LLMs' reasoning capabilities. Although recent work has applied LLMs to vulnerability detection using generic prompting techniques, their full capabilities for this task and the types of errors they make when explaining identified vulnerabilities remain unclear. In this paper, we surveyed eleven LLMs that are state-of-the-art in code generation and commonly used as coding assistants, and evaluated their capabilities for vulnerability detection. We systematically searched for the best-performing prompts, incorporating techniques such as in-context learning and chain-of-thought, and proposed three of our own prompting methods. Our results show that while our prompting methods improved the models' performance, LLMs generally struggled with vulnerability detection. They reported 0.5-0.63 Balanced Accuracy and failed to distinguish between buggy and fixed versions of programs in 76% of cases on average. By comprehensively analyzing and categorizing 287 instances of model reasoning, we found that 57% of LLM responses contained errors, and the models frequently predicted incorrect locations of buggy code and misidentified bug types. LLMs only correctly localized 6 out of 27 bugs in DbgBench, and these 6 bugs were predicted correctly by 70-100% of human participants. These findings suggest that despite their potential for other tasks, LLMs may fail to properly comprehend critical code structures and security-related concepts. Our data and code are available at https://figshare.com/s/78fe02e56e09ec49300b.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.07019

Reasons behind the Water Crisis and its Potential Health Outcomes

Authors: Md. Galib Ishraq Emran, Rhidi Barma, Akram Hussain Khan, Mrinmoy Roy

Abstract: Globally, the water crisis has become a significant problem that affects developing and industrialized nations. Water shortage can harm public health by increasing the chance of contracting water-borne diseases, dehydration, and malnutrition. This study aims to examine the causes of the water problem and its likely effects on human health. The study scrutinizes the reasons behind the water crisis, including population increase, climate change, and inefficient water management techniques. The effects of a lack of water on human health, such as the spread of infectious diseases, a higher risk of starvation and dehydration, and psychological stress, are also covered in the study. The research further suggests several ways to deal with the water situation and lessen its potential outcomes on human health. These remedies include enhanced sanitation and hygiene procedures, water management, and conservation techniques like rainwater gathering and wastewater recycling.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.01287

Towards a classification of $p^2$-discriminant ideal twins over number fields

Authors: Alyson Deines, Asimina S. Hamakiotes, Andreea Iorga, Changningphaabi Namoijam, Manami Roy, Lori D. Watson

Abstract: Isogenous elliptic curves have the same conductor but not necessarily the same minimal discriminant ideal. In this article, we explicitly classify all $p^2$-isogenous elliptic curves defined over a number field with the same minimal discriminant ideal for odd prime $p$ where $X_0(p^2)$ has genus $0$, i.e., $p = 3$ or $5$. As a consequence, we give a list of all $p^2$-isogenous discriminant (ideal) twins over $\mathbb{Q}$ for such $p$.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 17 pages. arXiv admin note: text overlap with arXiv:2402.19183

MSC Class: 11G05; 14K02; 14H10; 14H52

arXiv:2402.19183

Prime isogenous discriminant ideal twins

Authors: Alexander J. Barrios, Maila Brucal-Hallare, Alyson Deines, Piper Harris, Manami Roy

Abstract: Let $E_{1}$ and $E_{2}$ be elliptic curves defined over a number field $K$. We say that $E_{1}$ and $E_{2}$ are discriminant ideal twins if they are not $K$-isomorphic and have the same minimal discriminant ideal and conductor. Such curves are said to be discriminant twins if, for each prime $\mathfrak{p}$ of $K$, there are $\mathfrak{p}$-minimal models for $E_{1}$ and $E_{2}$ whose discriminants are equal. This article explicitly classifies all prime-isogenous discriminant (ideal) twins over $\mathbb{Q}$. We obtain this classification as a consequence of our main results, which constructively give all $p$-isogenous discriminant ideal twins over number fields where $p\in\left\{ 2,3,5,7,13\right\}$, i.e., where $X_0(p)$ has genus $0$. In particular, we find that up to twist, there are finitely many $p$-isogenous discriminant ideal twins if and only if $K$ is $\mathbb{Q}$ or an imaginary quadratic field. In the latter case, we provide instructions for finding the finitely many pairs of $j$-invariants that result in $p$-isogenous discriminant ideal twins. We prove our results by considering the local data of parameterized $p$-isogenous elliptic curves.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 35 pages

MSC Class: 11G05; 11G07; 14K02; 14H10; 14H52

arXiv:2402.09327

Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization

Authors: Idan Attias, Gintare Karolina Dziugaite, Mahdi Haghifam, Roi Livni, Daniel M. Roy

Abstract: In this work, we investigate the interplay between memorization and learning in the context of \emph{stochastic convex optimization} (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the $L^2$ Lipschitz--bounded setting and under strong convexity, every learner with an excess error $\varepsilon$ has CMI bounded below by $\Omega(1/\varepsilon^2)$ and $\Omega(1/\varepsilon)$, respectively. We further demonstrate the essential role of memorization in learning problems in SCO by designing an adversary capable of accurately identifying a significant fraction of the training samples in specific SCO problems. Finally, we enumerate several implications of our results, such as a limitation of generalization bounds based on CMI and the incompressibility of samples in SCO problems.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 44 Pages

arXiv:2401.15277

The quaternionic Maass Spezialschar on split $\mathrm{SO}(8)$

Authors: Jennifer Johnson-Leung, Finn McGlade, Isabella Negrini, Aaron Pollack, Manami Roy

Abstract: The classical Maass Spezialschar is a Hecke-stable subspace of the level one holomorphic Siegel modular forms of genus two, i.e., on $\mathrm{Sp}_4$, cut out by certain linear relations between the Fourier coefficients. It is a theorem of Andrianov, Maass, and Zagier, that the classical Maass Spezialschar is exactly equal to the space of Saito-Kurokawa lifts. We study an analogous space of quaternionic modular forms on split $\mathrm{SO}_8$, and prove the analogue of the Andrianov-Maass-Zagier theorem. Our main tool for proving this theorem is the development of a theory of a Fourier-Jacobi expansion of quaternionic modular forms on orthogonal groups.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 48 pages

MSC Class: 11F55; 11F30; 11F67

arXiv:2401.13474

doi 10.1103/PhysRevLett.132.266505

Intriguing Low-Temperature Phase in the Antiferromagnetic Kagome Metal FeGe

Authors: M. Wenzel, E. Uykur, A. A. Tsirlin, S. Pal, R. Mathew Roy, C. Yi, C. Shekhar, C. Felser, A. V. Pronin, M. Dressel

Abstract: The properties of kagome metals are governed by the interdependence of band topology and electronic correlations resulting in remarkably rich phase diagrams. Here, we study the temperature evolution of the bulk electronic structure of the antiferromagnetic kagome metal FeGe using infrared spectroscopy. We uncover drastic changes in the low-energy interband absorption at the 100 K structural phase transition that has been linked to a charge-density-wave (CDW) instability. We explain this effect by the minuscule Fe displacement in the kagome plane, which results in parallel bands in the vicinity of the Fermi level. In contrast to conventional CDW materials, however, the spectral weight shifts to low energies, ruling out the opening of a CDW gap in FeGe.

Submitted 12 July, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Journal ref: Phys. Rev. Lett. 132, 266505 (2024)

arXiv:2401.06535

Simulating open quantum systems using noise models and NISQ devices with error mitigation

Authors: Mainak Roy, Jessica John Britto, Ryan Hill, Victor Onofre

Abstract: In this work, we present simulations of two Open Quantum System models, Collisional and Markovian Reservoir, with noise simulations, the IBM devices ($\textit{ibm_kyoto}$, $\textit{ibm_osaka}$) and the OQC device Lucy, extending the results of García-Pérez et al. [npj Quantum Information 6.1 (2020): 1]. Using the Mitiq toolkit, we apply Zero-Noise extrapolation (ZNE), an error mitigation technique, and analyze their deviation from the theoretical results for the models under study. For both models, by applying ZNE, we were able to reduce the error and overlap it with the theoretical results. All our simulations and experiments were done in the qBraid environment.

Submitted 12 January, 2024; originally announced January 2024.

arXiv:2401.00538

Electric Charging Effects on Insulating Surfaces in Cryogenic Liquids

Authors: Wolfgang Korsch, Mark Broering, Ashok Timsina, Kent K. H. Leung, Joshua Abney, Dmitry Budker, Bradley W. Filippone, Jiachen He, Suman Kandu, Mark McCrea, Murchhana Roy, Christopher Swank, Weijun Yao

Abstract: This paper presents a new technique to study the adsorption and desorption of ions and electrons on insulating surfaces in the presence of strong electric fields in cryoliquids. The experimental design consists of a compact cryostat coupled with a sensitive electro-optical Kerr device to monitor the stability of the electric fields. The behavior of nitrogen and helium ions on a poly(methyl methacrylate) (PMMA) surface was compared to a PMMA surface coated with a mixture of deuterated polystyrene and deuterated polybutadiene. Ion accumulation and removal on these surfaces were unambiguously observed. Within the precision of the data, both surfaces behave similarly for the physisorbed ions. The setup was also used to measure the (quasi-)static dielectric constant of PMMA at T = 70 K. The impact of the ion adsorption on the search for a neutron permanent electric dipole moment in a cryogenic environment, like the nEDM@SNS experiment, is discussed.

Submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.17127

doi 10.1145/3632903

Probabilistic programming interfaces for random graphs: Markov categories, graphons, and nominal sets

Authors: Nathanael L. Ackerman, Cameron E. Freer, Younesse Kaddar, Jacek Karwowski, Sean K. Moss, Daniel M. Roy, Sam Staton, Hongseok Yang

Abstract: We study semantic models of probabilistic programming languages over graphs, and establish a connection to graphons from graph theory and combinatorics. We show that every well-behaved equational theory for our graph probabilistic programming language corresponds to a graphon, and conversely, every graphon arises in this way. We provide three constructions for showing that every graphon arises from an equational theory. The first is an abstract construction, using Markov categories and monoidal indeterminates. The second and third are more concrete. The second is in terms of traditional measure theoretic probability, which covers 'black-and-white' graphons. The third is in terms of probability monads on the nominal sets of Gabbay and Pitts. Specifically, we use a variation of nominal sets induced by the theory of graphs, which covers Erdős-Rényi graphons. In this way, we build new models of graph probabilistic programming from graphons.

Submitted 28 December, 2023; originally announced December 2023.

Comments: Accepted for POPL 2024

Journal ref: Proc. ACM Program. Lang. 8, POPL, Article 61 (2024), pp 1819-1849

arXiv:2312.16349

de Finetti's theorem and the existence of regular conditional distributions and strong laws on exchangeable algebras

Authors: Peter Potaptchik, Daniel M. Roy, David Schrittesser

Abstract: We show the following generalizations of the de Finetti--Hewitt--Savage theorem: Given an exchangeable sequence of random elements, the sequence is conditionally i.i.d. if and only if each random element admits a regular conditional distribution given the exchangeable $\sigma$-algebra (equivalently, the shift invariant or the tail algebra). We use this result, which holds without any regularity or technical conditions, to demonstrate that any exchangeable sequence of random elements whose common distribution is Radon is conditionally i.i.d.

Submitted 26 December, 2023; originally announced December 2023.

MSC Class: 60G09; 60G05; 28C15

arXiv:2310.16689

Critical-point anomalies in doped CeRhIn5

Authors: Renjith Mathew Roy, Sudip Pal, Run Yang, Seulki Roh, Soohyeon Shin, Tae Beom Park, Tuson Park, Martin Dressel

Abstract: The heavy-fermion compound CeRhIn$_5$ can be tuned through a quantum critical point, when In is partially replaced by Sn. This way additional charge carriers are introduced and the antiferromagnetic order is gradually suppressed to zero temperature. Here we investigate the temperature-dependent optical properties of CeRh(In$_{1-x}$Sn$_x$)$_5$ single crystals for $x = 4.4\%$, $6.9\%$ and $9.8\%$. With increasing Sn concentration the infrared conductivity reveals a clear enhancement of the $c$-$f$ hybridization strength. At low temperatures we observed a non-Fermi-liquid behavior in the frequency dependence of the scattering rate and effective mass in all three compounds. In addition, below a characteristic temperature $T^* \approx 10$ K, the temperature dependent resistivity $\rho(T)$ follows a $\log T$ behavior, typical for a non-Fermi liquid. The temperature-dependent magnetization also exhibits anomalous behavior below $T^*$. Our investigation reveals that below $T^*$ the system shows a pronounced non-Fermi-liquid behavior and $T^*$ monotonically increases as the quantum critical point is approached.

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.04404

Seeding the CGM: How Satellites Populate the Cold Phase of Milky Way Halos

Authors: Manami Roy, Kung-Yi Su, Stephanie Tonnesen, Drummond B. Fielding, Claude-André Faucher-Giguère

Abstract: The origin of the cold phase in the CGM is a highly debated question. We investigate the contribution of satellite galaxies to the cold gas budget in the circumgalactic medium (CGM) of a Milky Way-like host galaxy. We perform controlled experiments with three different satellite mass distributions and identify several mechanisms by which satellites can add cold gas to the CGM, including ram pressure stripping and induced cooling in the mixing layer of the stripped cold gas. These two mechanisms contribute a comparable amount of cold gas to the host CGM. We find that the less massive satellites ($\leq 10^9 M_\odot$) not only lose all of their cold gas in a short period ($\sim$ 0.5-1 Gyr), but their stripped cold clouds also mix with the hot CGM gas and get heated up quickly. However, stellar feedback from these less massive satellites can hugely alter the fate of their stripped gas. Feedback speeds up the destruction of the stripped cold clouds from these satellites by making them more diffuse with more surface area. On the other hand, the more massive satellites (LMC or SMC-like $\sim 10^{10} M_\odot$) can add cold gas to the total gas budget of the host CGM for several Gyrs.

Submitted 6 October, 2023; originally announced October 2023.

Comments: 17 pages, 12 figures, 2 tables; accepted for publication in MNRAS on Oct 6, 2023

arXiv:2310.03717

Beyond radial profiles: Using log-normal distributions to model the multiphase circumgalactic medium

Authors: Alankar Dutta, Mukesh Singh Bisht, Prateek Sharma, Ritali Ghosh, Manami Roy, Biman B. Nath

Abstract: Recent observations and simulations reveal that the circumgalactic medium (CGM) surrounding galaxies is multiphase, with the gas temperatures spanning a wide range at most radii, $\sim 10^4\ {\rm K}$ to the virial temperature ($\sim 10^6$ K for Milky Way). Traditional CGM models using simple density profiles are inadequate at reproducing observations that indicate a broad temperature range. Alternatively, a model based on probability distribution functions (PDFs) with parameters motivated by simulations can better match multi-wavelength observations. In this work, we use log-normal distributions, commonly seen in the simulations of the multiphase interstellar and circumgalactic media, to model the multiphase CGM. We generalize the isothermal background model by Faerman et al. 2017 to include more general CGM profiles. We extend the existing probabilistic models from 1D-PDFs in temperature to 2D-PDFs in density-temperature phase space and constrain its parameters using a Milky Way-like {\tt Illustris TNG50-1} halo. We generate various synthetic observables such as column densities of different ions, UV/X-ray spectra, and dispersion and emission measures. X-ray and radio (Fast Radio Burst) observations mainly constrain the hot gas properties. However, interpreting cold/warm phase diagnostics is not straightforward since these phases are patchy, with inherent variability in intercepting these clouds along arbitrary lines of sight. We provide a tabulated comparison of model predictions with observations and plan to expand this into a comprehensive compilation of models and data. Our modeling provides a simple analytic framework that is useful for describing important aspects of the multiphase CGM.

Submitted 9 April, 2024; v1 submitted 26 September, 2023; originally announced October 2023.

Comments: 23 pages, 15 figures, 4 tables; submitted to MNRAS

arXiv:2310.00028

Fundamental scaling limits and bandwidth shaping of frequency-modulated combs

Authors: Mithun Roy, Zhenyang Xiao, Sadhvikas Addamane, David Burghoff

Abstract: Frequency-modulated (FM) combs based on active cavities like quantum cascade lasers have recently emerged as promising light sources in many spectral regions. Unlike passive modelocking, which uses amplitude modulation to generate amplitude modulation, FM combs use phase modulation to generate phase modulation. They can therefore be regarded as a phase-domain version of passive modelocking. However, while the ultimate scaling laws of passive modelocking have long been known -- Haus showed in 1975 that pulses have a bandwidth proportional to effective gain bandwidth -- the limits of FM combs have been much less clear. Here, we show that FM combs are governed by the same fundamental limits, producing combs whose bandwidths are linear in the effective gain bandwidth. Not only do we show theoretically that the diffusive effect of gain curvature limits comb bandwidth, we also show experimentally how this limit can be increased. By adding carefully designed resonant-loss structures that are evanescently coupled to the cavity of a terahertz laser, we reduce the curvature and increase the effective gain bandwidth of the laser, demonstrating bandwidth enhancement. Our results give a new degree of freedom for the creation of active chip-scale combs and can be applied to a wide array of cavity geometries.

Submitted 18 June, 2024; v1 submitted 28 September, 2023; originally announced October 2023.

Comments: 24 pages, 5 figures

arXiv:2308.02796 [pdf]

OBESEYE: Interpretable Diet Recommender for Obesity Management using Machine Learning and Explainable AI

Authors: Mrinmoy Roy, Srabonti Das, Anica Tasnim Protity

Abstract: Obesity, the leading cause of many non-communicable diseases, occurs mainly from eating more than the body requires and from a lack of proper activity. So, being healthy requires healthy diet plans, especially for patients with comorbidities. But it is difficult to figure out the exact quantity of each nutrient because nutrient requirements vary with physical and disease conditions. In our study we proposed a novel machine learning based system to predict the amount of nutrients one individual requires for being healthy. We applied different machine learning algorithms: linear regression, support vector machine (SVM), decision tree, random forest, XGBoost, LightGBM on fluid and 3 other major macronutrients: carbohydrate, protein, fat consumption prediction. We achieved high accuracy with low root mean square error (RMSE) by using linear regression in fluid prediction, random forest in carbohydrate prediction and LightGBM in protein and fat prediction. We believe our diet recommender system, OBESEYE, is the only one of its kind that recommends diets with consideration of comorbidities and physical conditions and encourages users to overcome obesity.

Submitted 5 August, 2023; originally announced August 2023.

Report number: Roy, M.(2023).OBESEYE: Interpretable Diet Recommender for Obesity Management using Machine Learning and Explainable AI.IJRAMT, 4(6), 1-7. https://journal.ijramt.com/ijramt/article/view/2733

arXiv:2308.02662 [pdf, ps, other]

Linear isomorphism testing of Boolean functions with small approximate spectral norm

Authors: Arijit Ghosh, Chandrima Kayal, Manaswi Paraashar, Manmatha Roy

Abstract: Two Boolean functions f, g : F_2^{n} \to {-1, 1} are called linearly isomorphic if there exists an invertible matrix M \in F_2^{n\times n} such that f\circ M = g. Testing linear isomorphism generalizes isomorphism testing between Boolean functions, which is now classical in the context of property testing. Linear-invariance of Boolean functions has also been extensively studied in other areas like coding theory and cryptography, and mathematics in general. In this paper, we will study the following two variants of this problem: [1] [Communication complexity: ] Assume that Boolean functions f and g on F_2^{n} are given to Alice and Bob respectively, and the goal is to test linear isomorphism between f and g by exchanging a minimum amount of communication, measured in bits, between Alice and Bob. Our main result is an efficient two-party communication protocol for testing linear isomorphism in terms of the approximate spectral norm of the functions. We will crucially exploit the connection between approximate spectral norm and sign-approximating polynomials. [2] [Query complexity: ] Let f: F_2^{n} \to { -1, 1 } be a known function and g : F_2^{n} \to { -1, 1 } be an unknown function with query access. We study the query complexity of testing linear isomorphism between f and g in terms of the approximate spectral norm of f. As in the case of communication complexity, we will use properties of the approximate spectral norm.

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2307.14443 [pdf, ps, other]

Computation of endo-fixed closures in free-abelian times free groups

Authors: Mallika Roy, Enric Ventura

Abstract: In this paper, we explore the behaviour of the fixed subgroups of endomorphisms of free-abelian times free (FATF) groups. We exhibit an algorithm which, given a finitely generated subgroup $\mathcal{H}$ of a FATF group $\mathcal{G}$, decides whether $\mathcal{H}$ is the fixed subgroup of some (finite) family of endomorphisms of $\mathcal{G}$ and, in the affirmative case, it finds such a family. The algorithm combines both combinatorial and algebraic methods.

Submitted 26 July, 2023; originally announced July 2023.

MSC Class: 20E05; 20E36; 20K15

arXiv:2307.14067 [pdf]

Machine Learning Applications In Healthcare: The State Of Knowledge and Future Directions

Authors: Mrinmoy Roy, Sarwar J. Minar, Porarthi Dhar, A T M Omor Faruq

Abstract: Detection of easily missed hidden patterns with fast processing power makes machine learning (ML) indispensable to today's healthcare system. Though many ML applications have already been discovered and many are still under investigation, only a few have been adopted by current healthcare systems. As a result, there exists an enormous opportunity for ML in the healthcare system, but distributed information and a scarcity of properly arranged and easily explainable documentation in the related sector are major impediments that make ML applications difficult for healthcare professionals. This study aimed to gather ML applications in different areas of healthcare concisely and more effectively so that necessary information can be accessed immediately with relevant references. We divided our study into five major groups: community level work, risk management/preventive care, healthcare operation management, remote care, and early detection. Dividing these groups into subgroups, we provided relevant references with description in tabular form for quick access. Our objective is to inform people about ML applicability in the healthcare industry, reduce the knowledge gap of clinicians about ML applications, and motivate healthcare professionals towards a more machine learning based healthcare system.

Submitted 26 July, 2023; originally announced July 2023.

Journal ref: BJMHR, 10(6), 24-54 (2023)

arXiv:2307.10060 [pdf, other]

Accurate deep learning sub-grid scale models for large eddy simulations

Authors: Rikhi Bose, Arunabha M. Roy

Abstract: We present two families of sub-grid scale (SGS) turbulence models developed for large-eddy simulation (LES) purposes. Their development required the formulation of physics-informed robust and efficient Deep Learning (DL) algorithms which, unlike state-of-the-art analytical modeling techniques can produce high-order complex non-linear relations between inputs and outputs. Explicit filtering of data from direct simulations of the canonical channel flow at two friction Reynolds numbers $Re_τ\approx 395$ and 590 provided accurate data for training and testing. The two sets of models use different network architectures. One of the architectures uses tensor basis neural networks (TBNN) and embeds the simplified analytical model form of the general effective-viscosity hypothesis, thus incorporating the Galilean, rotational and reflectional invariances. The other architecture is that of a relatively simple network, that is able to incorporate the Galilean invariance only. However, this simpler architecture has better feature extraction capacity owing to its ability to establish relations between and extract information from cross-components of the integrity basis tensors and the SGS stresses. Both sets of models are used to predict the SGS stresses for feature datasets generated with different filter widths, and at different Reynolds numbers. It is shown that due to the simpler model's better feature learning capabilities, it outperforms the invariance embedded model in statistical performance metrics. In a priori tests, both sets of models provide similar levels of dissipation and backscatter. Based on the test results, both sets of models should be usable in a posteriori actual LESs.

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2306.17759 [pdf, other]

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

Authors: Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy

Abstract: In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio. To achieve a well-defined stochastic limit, the Transformer's attention mechanism is modified by centering the Softmax output at identity, and scaling the Softmax logits by a width-dependent temperature parameter. We examine the stability of the network through the corresponding SDE, showing how the scale of both the drift and diffusion can be elegantly controlled with the aid of residual connections. The existence of a stable SDE implies that the covariance structure is well-behaved, even for very large depth and width, thus preventing the notorious issues of rank degeneracy in deep attention models. Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications.

Submitted 9 December, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

arXiv:2305.08638 [pdf, ps, other]

Algebraic Winding Numbers

Authors: Daniel Perrucci, Marie-Françoise Roy

Abstract: In this paper we study in detail the properties of the algebraic winding number proposed in a paper by M. Eisermann with respect to complex root counting in rectangles. We also propose a new algebraic winding number which computes the number of complex roots of a polynomial in a rectangle under no assumptions, including roots on edges or vertices with appropriate counting. We extend both winding numbers to rational functions, obtaining then an algebraic version of the argument principle for rectangles.

Submitted 15 May, 2023; originally announced May 2023.

MSC Class: 12D10; 13J30; 14Q20

arXiv:2305.05687 [pdf, other]

doi 10.3847/1538-4357/accc89

Coronal Heating as Determined by the Solar Flare Frequency Distribution Obtained by Aggregating Case Studies

Authors: James Paul Mason, Alexandra Werth, Colin G. West, Allison A. Youngblood, Donald L. Woodraska, Courtney Peck, Kevin Lacjak, Florian G. Frick, Moutamen Gabir, Reema A. Alsinan, Thomas Jacobsen, Mohammad Alrubaie, Kayla M. Chizmar, Benjamin P. Lau, Lizbeth Montoya Dominguez, David Price, Dylan R. Butler, Connor J. Biron, Nikita Feoktistov, Kai Dewey, N. E. Loomis, Michal Bodzianowski, Connor Kuybus, Henry Dietrick, Aubrey M. Wolfe, et al. (977 additional authors not shown)

Abstract: Flare frequency distributions represent a key approach to addressing one of the largest problems in solar and stellar physics: determining the mechanism that counter-intuitively heats coronae to temperatures that are orders of magnitude hotter than the corresponding photospheres. It is widely accepted that the magnetic field is responsible for the heating, but there are two competing mechanisms that could explain it: nanoflares or Alfvén waves. To date, neither can be directly observed. Nanoflares are, by definition, extremely small, but their aggregate energy release could represent a substantial heating mechanism, presuming they are sufficiently abundant. One way to test this presumption is via the flare frequency distribution, which describes how often flares of various energies occur. If the slope of the power law fitting the flare frequency distribution is above a critical threshold, $α=2$ as established in prior literature, then there should be a sufficient abundance of nanoflares to explain coronal heating. We performed $>$600 case studies of solar flares, made possible by an unprecedented number of data analysts via three semesters of an undergraduate physics laboratory course. This allowed us to include two crucial, but nontrivial, analysis methods: pre-flare baseline subtraction and computation of the flare energy, which requires determining flare start and stop times. We aggregated the results of these analyses into a statistical study to determine that $α= 1.63 \pm 0.03$. This is below the critical threshold, suggesting that Alfvén waves are an important driver of coronal heating.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 1,002 authors, 14 pages, 4 figures, 3 tables, published by The Astrophysical Journal on 2023-05-09, volume 948, page 71

arXiv:2303.14697 [pdf, ps, other]

The central tree property and algorithmic problems on subgroups of free groups

Authors: Mallika Roy, Enric Ventura, Pascal Weil

Abstract: We study the average case complexity of the uniform membership problem for subgroups of free groups, and we show that it is orders of magnitude smaller than the worst case complexity of the best known algorithms. This applies to subgroups given by a fixed number of generators as well as to subgroups given by an exponential number of generators. The main idea behind this result is to exploit a generic property of tuples of words, called the central tree property. An application is given to the average case complexity of the relative primitivity problem, using Shpilrain's recent algorithm to decide primitivity, whose average case complexity is a constant depending only on the rank of the ambient free group.

Submitted 19 October, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

Comments: 28 pages. Inaccuracies corrected. To appear in Journal of Group Theory

MSC Class: 20E05; 20F10; 68Q17

arXiv:2303.04808 [pdf]

Prevalence and Major Risk Factors of Non-communicable Diseases: A Machine Learning based Cross-Sectional Study

Authors: Mrinmoy Roy, Anica Tasnim Protity, Srabonti Das, Porarthi Dhar

Abstract: Objective: The study aimed to determine the prevalence of several non-communicable diseases (NCD) and analyze risk factors among adult patients seeking nutritional guidance in Dhaka, Bangladesh. Result: Our study observed the relationships between gender, age groups, obesity, and NCDs (DM, CKD, IBS, CVD, CRD, thyroid). The most frequently reported NCD was cardiovascular issues (CVD), which was present in 83.56% of all participants. CVD was more common in male participants. Consequently, male participants had a higher blood pressure distribution than females. Diabetes mellitus (DM), on the other hand, did not have a gender-based inclination. Both CVD and DM had an age-based progression. Our study showed that chronic respiratory illness was more frequent in middle-aged participants than in younger or elderly individuals. Based on the data, every one in five hospitalized patients was obese. We analyzed the co-morbidities and found that 31.5% of the population has only one NCD, 30.1% has two NCDs, and 38.3% has more than two NCDs. Besides, 86.25% of all diabetic patients had cardiovascular issues. All thyroid patients in our study had CVD. Using a t-test, we found a relationship between CKD and thyroid (p-value 0.061). Males under 35 years have a statistically significant relationship between thyroid and chronic respiratory diseases (p-value 0.018). We also found an association between DM and CKD among patients over 65 (p-value 0.038). Moreover, there has been a statistically significant relationship between CKD and Thyroid (P < 0.05) for those below 35 and 35-65. We used a two-way ANOVA test to find the statistically significant interaction of heart issues and chronic respiratory illness, in combination with diabetes. The combination of DM and RTI also affected CKD in male patients over 65 years old.

Submitted 18 May, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: 25 pages, 10 figures, 3 tables

arXiv:2303.04275 [pdf, other]

A Computer Vision Enabled damage detection model with improved YOLOv5 based on Transformer Prediction Head

Authors: Arunabha M. Roy, Jayabrata Bhaduri

Abstract: Objective: Computer vision-based up-to-date accurate damage classification and localization are of decisive importance for infrastructure monitoring, safety, and the serviceability of civil infrastructure. Current state-of-the-art deep learning (DL)-based damage detection models, however, often lack superior feature extraction capability in complex and noisy environments, limiting the development of accurate and reliable object distinction. Method: To this end, we present DenseSPH-YOLOv5, a real-time DL-based high-performance damage detection model where DenseNet blocks have been integrated with the backbone to improve in preserving and reusing critical feature information. Additionally, convolutional block attention modules (CBAM) have been implemented to improve attention performance mechanisms for strong and discriminating deep spatial feature extraction that results in superior detection under various challenging environments. Moreover, additional feature fusion layers and a Swin-Transformer Prediction Head (SPH) have been added leveraging advanced self-attention mechanism for more efficient detection of multiscale object sizes and simultaneously reducing the computational complexity. Results: Evaluating the model performance in large-scale Road Damage Dataset (RDD-2018), at a detection rate of 62.4 FPS, DenseSPH-YOLOv5 obtains a mean average precision (mAP) value of 85.25 %, F1-score of 81.18 %, and precision (P) value of 89.51 % outperforming current state-of-the-art models. Significance: The present research provides an effective and efficient damage localization model addressing the shortcoming of existing DL-based damage detection models by providing highly accurate localized bounding box prediction. Current work constitutes a step towards an accurate and robust automated damage detection system in real-time in-field applications.

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2302.09668 [pdf, other]

Physics-aware deep learning framework for linear elasticity

Authors: Arunabha M. Roy, Rikhi Bose

Abstract: The paper presents an efficient and robust data-driven deep learning (DL) computational framework developed for linear continuum elasticity problems. The methodology is based on the fundamentals of the Physics Informed Neural Networks (PINNs). For an accurate representation of the field variables, a multi-objective loss function is proposed. It consists of terms corresponding to the residual of the governing partial differential equations (PDE), constitutive relations derived from the governing physics, various boundary conditions, and data-driven physical knowledge fitting terms across randomly selected collocation points in the problem domain. To this end, multiple densely connected independent artificial neural networks (ANNs), each approximating a field variable, are trained to obtain accurate solutions. Several benchmark problems including the Airy solution to elasticity and the Kirchhoff-Love plate problem are solved. Performance in terms of accuracy and robustness illustrates the superiority of the current framework showing excellent agreement with analytical solutions. The present work combines the benefits of the classical methods depending on the physical information available in analytical relations with the superior capabilities of the DL techniques in the data-driven construction of lightweight, yet accurate and robust neural networks. The models developed herein can significantly boost computational speed using minimal network parameters with easy adaptability in different computational platforms.

Submitted 19 February, 2023; originally announced February 2023.

arXiv:2301.00948 [pdf, other]

Understanding EEG signals for subject-wise Definition of Armoni Activities

Authors: Kislay Raj, Aditya Singh, Abhishek Mandal, Teerath Kumar, Arunabha M. Roy

Abstract: In a growing world of technology, psychological disorders have become a challenge to be solved. The methods used for cognitive stimulation are very conventional and based on one-way communication, which only relies on the material or method used for training of an individual. It doesn't use any kind of feedback from the individual to analyze the progress of the training process. We have proposed a closed-loop methodology to improve the cognitive state of a person with ID (Intellectual disability). We have used a platform named 'Armoni', for providing training to the intellectually disabled individuals. The learning is performed in a closed-loop by using feedback in the form of change in affective state. For feedback to the Armoni, an EEG (Electroencephalograph) headband is used. All the changes in EEG are observed and classified against the change in the mean and standard deviation value of all frequency bands of signal. This comparison is helpful in defining every activity with respect to change in brain signals. In this paper, we have discussed the process of treatment of EEG signal and its definition against the different activities of Armoni. We have tested it on 6 different systems with different age groups and cognitive levels.

Submitted 26 April, 2023; v1 submitted 3 January, 2023; originally announced January 2023.

Comments: Submitted to SN Computer Science journal

arXiv:2301.00122 [pdf]

doi 10.24018/ejcompute.2023.3.1.85

Hair and Scalp Disease Detection using Machine Learning and Image Processing

Authors: Mrinmoy Roy, Anica Tasnim Protity

Abstract: Almost 80 million Americans suffer from hair loss due to aging, stress, medication, or genetic makeup. Hair and scalp-related diseases often go unnoticed in the beginning. Sometimes, a patient cannot differentiate between hair loss and regular hair fall. Diagnosing hair-related diseases is time-consuming as it requires professional dermatologists to perform visual and medical tests. Because of that, the overall diagnosis gets delayed, which worsens the severity of the illness. Due to the image-processing ability, neural network-based applications are used in various sectors, especially healthcare and health informatics, to predict deadly diseases like cancers and tumors. These applications assist clinicians and patients and provide an initial insight into early-stage symptoms. In this study, we used a deep learning approach that successfully predicts three main types of hair loss and scalp-related diseases: alopecia, psoriasis, and folliculitis. However, limited study in this area, unavailability of a proper dataset, and degree of variety among the images scattered over the internet made the task challenging. 150 images were obtained from various sources and then preprocessed by denoising, image equalization, enhancement, and data balancing, thereby minimizing the error rate. After feeding the processed data into the 2D convolutional neural network (CNN) model, we obtained overall training accuracy of 96.2%, with a validation accuracy of 91.1%. The precision and recall score of alopecia, psoriasis, and folliculitis are 0.895, 0.846, and 1.0, respectively. We also created a dataset of the scalp images for future prospective researchers.

Submitted 30 May, 2023; v1 submitted 30 December, 2022; originally announced January 2023.

Journal ref: EJ-Compute.2023;3(1):7-13

arXiv:2301.00119 [pdf, other]

Bell Inequalities and Maximally Realistic Causal Quantum Mechanics

Authors: S. M. Roy

Abstract: The De Broglie-Bohm (DeBB)\cite{DeBB} Causal Quantum Mechanics played a crucial role in Bell's discovery \cite{Bell1964} that quantum mechanics violates EPR local reality \cite{EPR1935}, and also in Bell's search for an exact quantum mechanics. The experiments of Aspect et al \cite{Aspect1981} confirm quantum correlations between plane polarizations of two photons and violation of Bell's inequalities by a factor $\sqrt 2 $. I prove that similar experiments with elliptic polarizers can also show quantum violations of Bell's inequality by the same factor. I summarize our construction of a maximally realistic causal quantum mechanics in $n-$dimensional configuration space \cite{Roy-Singh1995}. Phase space Bell inequalities and 'Marginal Theorems' \cite{Auberson2002} play a crucial role.

Submitted 31 January, 2023; v1 submitted 30 December, 2022; originally announced January 2023.

Comments: 9 pages, 5 figures

arXiv:2212.13556 [pdf, other]

Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization

Authors: Mahdi Haghifam, Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund, Daniel M. Roy, Gintare Karolina Dziugaite

Abstract: To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.

Submitted 13 July, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

Comments: 49 pages, 2 figures. This version corrects a mistake in the proof of Theorem 17. Proc. International Conference on Algorithmic Learning Theory (ALT), 2023

arXiv:2212.13250 [pdf, ps, other]

Statistical minimax theorems via nonstandard analysis

Authors: Haosui Duanmu, Daniel M. Roy, David Schrittesser

Abstract: For statistical decision problems with finite parameter space, it is well-known that the upper value (minimax value) agrees with the lower value (maximin value). Only under a generalized notion of prior does such an equivalence carry over to the case of infinite parameter spaces, provided nature can play a prior distribution and the statistician can play a randomized strategy. Various such extensions of this classical result have been established, but they are subject to technical conditions such as compactness of the parameter space or continuity of the risk functions. Using nonstandard analysis, we prove a minimax theorem for arbitrary statistical decision problems. Informally, we show that for every statistical decision problem, the standard upper value equals the lower value when the $\sup$ is taken over the collection of all internal priors, which may assign infinitesimal probability to (internal) events. Applying our nonstandard minimax theorem, we derive several standard minimax theorems: a minimax theorem on compact parameter space with continuous risk functions, a finitely additive minimax theorem with bounded risk functions and a minimax theorem on totally bounded metric parameter spaces with Lipschitz risk functions.

Submitted 26 December, 2022; originally announced December 2022.

MSC Class: 62C20; 62A01

arXiv:2210.15894 [pdf, other]

Sublacunary sequences that are strong sweeping out

Authors: Sovanlal Mondal, Madhumita Roy, Máté Wierdl

Abstract: An increasing sequence $(a_n)$ of positive integers which satisfies $\frac{a_{n+1}}{a_n}>1+η$ for some positive $η$ is called a lacunary sequence. It has been known for over twenty years that every lacunary sequence is strong sweeping out, which means that in every aperiodic dynamical system we can find a set $E$ of arbitrarily small measure so that $\limsup_N\frac{1}{N} \sum_{n\le N}\mathbb{1}_E(T^nx)=1$ and $\liminf_N\frac{1}{N} \sum_{n\le N}\mathbb{1}_E(T^nx)=0$ almost everywhere. In this paper we improve this result by showing that if $(a_n)$ satisfies only $\frac{a_{n+1}}{a_n}>1+\frac1{(\log\log n)^{1-η}}$ for some positive $η$ then it is already strong sweeping out.

Submitted 20 March, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: 15 pages, 4 figures

MSC Class: 37A30 (Primary); 37A44; 37A46 (Secondary)

arXiv:2210.13738 [pdf, other]

Pruning's Effect on Generalization Through the Lens of Training and Regularization

Authors: Tian Jin, Michael Carbin, Daniel M. Roy, Jonathan Frankle, Gintare Karolina Dziugaite

Abstract: Practitioners frequently observe that pruning improves model generalization. A long-standing hypothesis based on bias-variance trade-off attributes this generalization improvement to model size reduction. However, recent studies on over-parameterization characterize a new model size regime, in which larger models achieve better generalization. Pruning models in this over-parameterized regime leads to a contradiction -- while theory predicts that reducing model size harms generalization, pruning to a range of sparsities nonetheless improves it. Motivated by this contradiction, we re-examine pruning's effect on generalization empirically. We show that size reduction cannot fully account for the generalization-improving effect of standard pruning algorithms. Instead, we find that pruning leads to better training at specific sparsities, improving the training loss over the dense model. We find that pruning also leads to additional regularization at other sparsities, reducing the accuracy degradation due to noisy examples over the dense model. Pruning extends model training time and reduces model size. These two factors improve training and add regularization respectively. We empirically demonstrate that both factors are essential to fully explaining pruning's impact on generalization.

Submitted 24 October, 2022; originally announced October 2022.

Comments: 49 pages, 20 figures

Journal ref: Advances in Neural Information Processing Systems 2022

arXiv:2210.04252 [pdf, other]

Precise Single-stage Detector

Authors: Aisha Chandio, Gong Gui, Teerath Kumar, Irfan Ullah, Ramin Ranjbarzadeh, Arunabha M Roy, Akhtar Hussain, Yao Shen

Abstract: There are still two problems in SSD causing some inaccurate results: (1) In the process of feature extraction, with the layer-by-layer acquisition of semantic information, local information is gradually lost, resulting in less representative feature maps; (2) During the Non-Maximum Suppression (NMS) algorithm, due to inconsistency in classification and regression tasks, the classification confidence and predicted detection position cannot accurately indicate the position of the prediction boxes. Methods: In order to address these issues, we propose a new architecture, a modified version of Single Shot Multibox Detector (SSD), named Precise Single Stage Detector (PSSD). Firstly, we improve the features by adding extra layers to SSD. Secondly, we construct a simple and effective feature enhancement module to expand the receptive field step by step for each layer and enhance its local and semantic information. Finally, we design a more efficient loss function to predict the IOU between the prediction boxes and ground truth boxes, and the threshold IOU guides classification training and attenuates the scores, which are used by the NMS algorithm. Main Results: Benefiting from the above optimization, the proposed model PSSD achieves exciting performance in real-time. Specifically, with the hardware of Titan Xp and the input size of 320 pix, PSSD achieves 33.8 mAP at 45 FPS speed on MS COCO benchmark and 81.28 mAP at 66 FPS speed on Pascal VOC 2007, outperforming state-of-the-art object detection models. Besides, the proposed model performs significantly well with larger input size. Under 512 pix, PSSD can obtain 37.2 mAP with 27 FPS on MS COCO and 82.82 mAP with 40 FPS on Pascal VOC 2007. The experiment results prove that the proposed model has a better trade-off between speed and accuracy.

Submitted 9 October, 2022; originally announced October 2022.

Comments: We will submit it soon to the IEEE transaction. Due to characters limitation, we can not upload the full abstract. Please read the pdf file for more detail

arXiv:2209.11234 [pdf, other]

doi 10.1002/adem.202300104

Artificial Intelligence in Material Engineering: A review on applications of AI in Material Engineering

Authors: Lipichanda Goswami, Manoj Deka, Mohendra Roy

Abstract: The role of artificial intelligence (AI) in material science and engineering (MSE) is becoming increasingly important as AI technology advances. The development of high-performance computing has made it possible to test deep learning (DL) models with significant parameters, providing an opportunity to overcome the limitation of traditional computational methods, such as density functional theory (DFT), in property prediction. Machine learning (ML)-based methods are faster and more accurate than DFT-based methods. Furthermore, the generative adversarial networks (GANs) have facilitated the generation of chemical compositions of inorganic materials without using crystal structure information. These developments have significantly impacted material engineering (ME) and research. Some of the latest developments in AI in ME herein are reviewed. First, the development of AI in the critical areas of ME, such as in material processing, the study of structure and material property, and measuring the performance of materials in various aspects, is discussed. Then, the significant methods of AI and their uses in MSE, such as graph neural network, generative models, transfer of learning, etc. are discussed. The use of AI to analyze the results from existing analytical instruments is also discussed. Finally, AI's advantages, disadvantages, and future in ME are discussed.

Submitted 27 April, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

Comments: V3

arXiv:2209.06977 [pdf]

SQL and NoSQL Databases Software architectures performance analysis and assessments -- A Systematic Literature review

Authors: Wisal Khan, Teerath Kumar, Zhang Cheng, Kislay Raj, Arunabha M Roy, Bin Luo

Abstract: Context: The efficient processing of Big Data is a challenging task for SQL and NoSQL Databases, where competent software architecture plays a vital role. The SQL Databases are designed for structuring data and supporting vertical scalability. In contrast, horizontal scalability is backed by NoSQL Databases and can process sizeable unstructured Data efficiently. One can choose the right paradigm according to the organisation's needs; however, making the correct choice can often be challenging. The SQL and NoSQL Databases follow different architectures. Also, the mixed model is followed by each category of NoSQL Databases. Hence, data movement becomes difficult for cloud consumers across multiple cloud service providers (CSPs). In addition, each cloud platform IaaS, PaaS, SaaS, and DBaaS also monitors various paradigms. Objective: This systematic literature review (SLR) aims to study the related articles associated with SQL and NoSQL Database software architectures and tackle data portability and Interoperability among various cloud platforms. State of the art presented many performance comparison studies of SQL and NoSQL Databases by observing scaling, performance, availability, consistency and sharding characteristics. According to the research studies, NoSQL Database designed structures can be the right choice for big data analytics, while SQL Databases are suitable for OLTP Databases. The researcher proposes numerous approaches associated with data movement in the cloud. Platform-based APIs are developed, which makes users' data movement difficult. Therefore, data portability and Interoperability issues are noticed during data movement across multiple CSPs. To minimize developer efforts and Interoperability, Unified APIs are demanded to make data movement relatively more accessible among various cloud platforms.

Submitted 14 September, 2022; originally announced September 2022.

Comments: 57 pages systematic literature review, already submitted to Big Data Research; More importantly, we can not add method, result and conclusion section in the abstract here due to characters limitations. Please check pdf file

arXiv:2208.14805 [pdf, other]

doi 10.1063/5.0119963

Model-free prediction of multistability using echo state network

Authors: Mousumi Roy, Swarnendu Mandal, Chittaranjan Hens, Awadhesh Prasad, N. V. Kuznetsov, Manish Dev Shrimali

Abstract: In the field of complex dynamics, multistable attractors have been gaining significant attention due to their unpredictability in occurrence and extreme sensitivity to initial conditions. Co-existing attractors are abundant in diverse systems ranging from climate to finance, ecological to social systems. In this article, we investigate a data-driven approach to infer different dynamics of a multistable system using echo state network (ESN). We start with a parameter-aware reservoir and predict diverse dynamics for different parameter values. Interestingly, the machine is able to reproduce the dynamics almost perfectly even at distant parameters which lie considerably far from the parameter values related to the training dynamics. In continuation, we can predict the whole bifurcation diagram with significant accuracy as well. We extend this study for exploring various dynamics of multistable attractors at unknown parameter value. While we train the machine with the dynamics of only one attractor at parameter $p$, it can capture the dynamics of a co-existing attractor at a new parameter value $p+Δp$. Continuing the simulation for multiple sets of initial conditions, we can identify the basins for different attractors. We generalize the results by applying the scheme on two distinct multistable systems.

Submitted 10 August, 2022; originally announced August 2022.

arXiv:2208.00788 [pdf, other]

A Hybrid CNN-LSTM model for Video Deepfake Detection by Leveraging Optical Flow Features

Authors: Pallabi Saikia, Dhwani Dholaria, Priyanka Yadav, Vaidehi Patel, Mohendra Roy

Abstract: Deepfakes are synthesized digital media created to produce ultra-realistic fake videos that trick the spectator. Deep generative algorithms, such as Generative Adversarial Networks (GANs), are widely used to accomplish such tasks. This approach synthesizes pseudo-realistic content that is very difficult to distinguish by traditional detection methods. In most cases, Convolutional Neural Network (CNN) based discriminators are used for detecting such synthesized media. However, they emphasise primarily the spatial attributes of individual video frames, and thereby fail to learn the temporal information from inter-frame relations. In this paper, we leverage an optical flow based feature extraction approach to extract the temporal features, which are then fed to a hybrid model for classification. This hybrid model is based on the combination of CNN and recurrent neural network (RNN) architectures. The hybrid model provides effective performance on open source datasets such as DFDC, FF++ and Celeb-DF. The proposed method shows an accuracy of 66.26%, 91.21% and 79.49% on DFDC, FF++, and Celeb-DF respectively with a much reduced sample size of approximately 100 samples (frames). This promises early detection of fake content compared to existing modalities.

Submitted 28 July, 2022; originally announced August 2022.

Journal ref: Copyright is with IEEE, Paper No: 832, IJCNN, 2022 IEEE World Congress on Computational Intelligence

arXiv:2207.13500 [pdf, other]

Modelling Social Context for Fake News Detection: A Graph Neural Network Based Approach

Authors: Pallabi Saikia, Kshitij Gundale, Ankit Jain, Dev Jadeja, Harvi Patel, Mohendra Roy

Abstract: Detection of fake news is crucial to ensure the authenticity of information and maintain the news ecosystem's reliability. Recently, there has been an increase in fake news content due to the proliferation of social media and fake content generation techniques such as Deep Fake. The majority of the existing modalities of fake news detection focus on content based approaches. However, most of these techniques fail to deal with ultra realistic synthesized media produced by generative models. Our recent studies find that the propagation characteristics of authentic and fake news are distinguishable, irrespective of their modalities. In this regard, we have investigated the auxiliary information based on social context to detect fake news. This paper analyzes the social context of fake news detection with a hybrid graph neural network based approach. This hybrid model is based on integrating a graph neural network on the propagation of news and a bidirectional encoder representations from transformers (BERT) model on news content to learn the text features. Thus the proposed approach learns the content as well as the context features and hence is able to outperform the baseline models with an F1 score of 0.91 on PolitiFact and 0.93 on the Gossipcop dataset, respectively.

Submitted 27 July, 2022; originally announced July 2022.

Journal ref: copyright with IEEE, Paper No: 834, IJCNN, 2022 IEEE World Congress on Computational Intelligence

arXiv:2207.12395 [pdf, other]

Tuning Stochastic Gradient Algorithms for Statistical Inference via Large-Sample Asymptotics

Authors: Jeffrey Negrea, Jun Yang, Haoyue Feng, Daniel M. Roy, Jonathan H. Huggins

Abstract: The tuning of stochastic gradient algorithms (SGAs) for optimization and sampling is often based on heuristics and trial-and-error rather than generalizable theory. We address this theory--practice gap by characterizing the large-sample statistical asymptotics of SGAs via a joint step-size--sample-size scaling limit. We show that iterate averaging with a large fixed step size is robust to the choice of tuning parameters and asymptotically has covariance proportional to that of the MLE sampling distribution. We also prove a Bernstein--von Mises-like theorem to guide tuning, including for generalized posteriors that are robust to model misspecification. Numerical experiments validate our results and recommendations in realistic finite-sample regimes. Our work lays the foundation for a systematic analysis of other stochastic gradient Markov chain Monte Carlo algorithms for a wide range of models.

Submitted 20 July, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

Comments: 42 pgs

arXiv:2207.02747 [pdf, ps, other]

doi 10.1112/mtk.12207

Dimension formulas for Siegel modular forms of level $4$

Authors: Manami Roy, Ralf Schmidt, Shaoyun Yi

Abstract: We prove several dimension formulas for spaces of scalar-valued Siegel modular forms of degree $2$ with respect to certain congruence subgroups of level $4$. In case of cusp forms, all modular forms considered originate from cuspidal automorphic representations of $\mathrm{GSp}(4,\mathbb{A})$ whose local component at $p=2$ admits non-zero fixed vectors under the principal congruence subgroup of level $2$. Using known dimension formulas combined with dimensions of spaces of fixed vectors in local representations at $p=2$, we obtain formulas for the number of relevant automorphic representations. These in turn lead to new dimension formulas, in particular for Siegel modular forms with respect to the Klingen congruence subgroup of level $4$.

Submitted 19 September, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

Comments: 48 pages. Fixed some typographical errors and improved exposition. Final version which has been published in Mathematika

MSC Class: 11F46; 11F70

Journal ref: Mathematika 69 (2023), no. 3, 795-840

arXiv:2206.14800 [pdf, other]

Understanding Generalization via Leave-One-Out Conditional Mutual Information

Authors: Mahdi Haghifam, Shay Moran, Daniel M. Roy, Gintare Karolina Dziugaite

Abstract: We study the mutual information between (certain summaries of) the output of a learning algorithm and its $n$ training data, conditional on a supersample of $n+1$ i.i.d. data from which the training data is chosen at random without replacement. These leave-one-out variants of the conditional mutual information (CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control the mean generalization error of learning algorithms with bounded loss functions. For learning algorithms achieving zero empirical risk under 0-1 loss (i.e., interpolating algorithms), we provide an explicit connection between leave-one-out CMI and the classical leave-one-out error estimate of the risk. Using this connection, we obtain upper and lower bounds on risk in terms of the (evaluated) leave-one-out CMI. When the limiting risk is constant or decays polynomially, the bounds converge to within a constant factor of two. As an application, we analyze the population risk of the one-inclusion graph algorithm, a general-purpose transductive learning algorithm for VC classes in the realizable setting. Using leave-one-out CMI, we match the optimal bound for learning VC classes in the realizable setting, answering an open challenge raised by Steinke and Zakynthinou (2020). Finally, in order to understand the role of leave-one-out CMI in studying generalization, we place leave-one-out CMI in a hierarchy of measures, with a novel unconditional mutual information at the root. For 0-1 loss and interpolating learning algorithms, this mutual information is observed to be precisely the risk.

Submitted 29 June, 2022; originally announced June 2022.

Comments: 18 pages

arXiv:2206.02768 [pdf, other]

The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization

Authors: Mufan Bill Li, Mihai Nica, Daniel M. Roy

Abstract: The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.

Submitted 14 June, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: 48 pages, 10 figures. Advances in Neural Information Processing Systems (2022)

arXiv:2205.12291 [pdf, other]

doi 10.1093/mnras/stac1465

Gamma-rays from the circumgalactic medium of M31

Authors: Manami Roy, Biman B. Nath

Abstract: We discuss the production of $γ$-rays from cosmic rays (CR) in the circumgalactic medium (CGM) of Andromeda (M31) in light of the recent detection of $γ$-rays from an annular region of $\sim 5.5-120$ kpc away from the M31 disc. We consider the CRs accelerated as a result of the star-formation in the M31 disk, which are lifted to the CGM by advection due to outflow and CR diffusion. The advection time scale due to bulk flow of gas triggered by star formation activity in the M31 disc is comparable ($\sim$ Gyr) to the diffusion time scale with diffusion coefficient $\ge10^{29}$ cm$^2$ s$^{-1}$ for the propagation of CR protons with energy $\sim 412$ GeV that are responsible for the highest energy photons observed. We show that a leptonic origin of the $γ$-rays from cosmic ray (CR) electrons has difficulties, as the inverse Compton time scale ($\sim$Myr) is much lower than advection time scale ($\sim$Gyr) to reach $120$ kpc. Invoking CR electrons accelerated by accretion shocks in the CGM at $\sim100-120$ kpc does not help since it would lead to diffuse X-ray features that are not observed. We, therefore, study the production of $γ$-rays via hadronic interaction between CR protons and CGM gas with the help of numerical two-fluid (thermal + CR) hydrodynamical simulation. We find that a combination of these mechanisms, that are related to the star formation processes in M31 in the last $\sim $ Gyr, along with diffusion and hadronic interaction, can explain the observed flux from the CGM of M31.

Submitted 24 May, 2022; originally announced May 2022.

Comments: 10 pages, 5 figures, Accepted for publication in MNRAS on May 19, 2022

arXiv:2205.09154 [pdf, ps, other]

On the structure of finitely presented Bestvina-Brady groups

Authors: Priyavrat Deshpande, Mallika Roy

Abstract: Right-angled Artin groups and their subgroups are of great interest because of their geometric, combinatorial and algorithmic properties. It is convenient to define these groups using finite simplicial graphs. The isomorphism type of the group is uniquely determined by the graph. Moreover, many structural properties of right-angled Artin groups can be expressed in terms of their defining graph. In this article we address the question of understanding the structure of a class of subgroups of right-angled Artin groups in terms of the graph. Bestvina and Brady, in their seminal work, studied these subgroups (now called Bestvina-Brady groups or Artin kernels) from a finiteness conditions viewpoint. Unlike the right-angled Artin groups the isomorphism type of Bestvina-Brady groups is not uniquely determined by the defining graph. We prove that certain finitely presented Bestvina-Brady groups can be expressed as an iterated amalgamated product. Moreover, we show that this amalgamated product can be read off from the graph defining the ambient right-angled Artin group.

Submitted 14 December, 2023; v1 submitted 18 May, 2022; originally announced May 2022.

MSC Class: 20F36; 20F65; 08B25

arXiv:2205.02479 [pdf, other]

On the Non-flatness Nature of Noncommutative Minkowski Spacetime

Authors: Manali Roy, B. Muthukumar

Abstract: In the framework of twisted-diffeomorphism approach to noncommutative gravity with canonical/Moyal-Weyl type noncommutative (NC) coordinate structure, we show that the NC Minkowski spacetime parametrized either with spherical polar coordinates or with parabolic coordinates has nontrivial NC corrections to Riemann curvature tensor, Ricci tensor and curvature scalar. Apparently, there are no such corrections if we choose rectilinear coordinates or even cylindrical coordinates, for which the metric in the commutative-counterpart is dependent on at most one coordinate only. We present both first order and second order calculations. The emergent curvature corrections might seem to raise the question of whether the statement of curvature is coordinate-dependent, but note, for example, that the transformation from spherical polar system to Cartesian system is not a diffeomorphism as such since its non-injective nature makes it a local diffeomorphism. In other words, if the flat-spacetime metric tensor in the commutative case depends on more than one curvilinear coordinates, the introduction of noncommutativity among these coordinates can possibly make the spacetime curved. It is worth remarking that such a curvature emerges in the context of NC Minkowski spacetime in the absence of any gauge or matter fields. It is purely an NC geometric effect.

Submitted 17 August, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

Comments: 1+14 pages, No figures, v3: minor changes, one reference added

Showing 1–50 of 275 results for author: Roy, M