Search | arXiv e-print repository

A Perspective on Foundation Models for the Electric Power Grid

Authors: Hendrik F. Hamann, Thomas Brunschwiler, Blazhe Gjorgiev, Leonardo S. A. Martins, Alban Puech, Anna Varbella, Jonas Weiss, Juan Bernabe-Moreno, Alexandre Blondin Massé, Seong Choi, Ian Foster, Bri-Mathias Hodge, Rishabh Jain, Kibaek Kim, Vincent Mai, François Mirallès, Martin De Montigny, Octavio Ramos-Leaños, Hussein Suprême, Le Xie, El-Nasser S. Youssef, Arnaud Zinflou, Alexander J. Belvi, Ricardo J. Bessa, Bishnu Prasad Bhattari, et al. (2 additional authors not shown)

Abstract: Foundation models (FMs) currently dominate news headlines. They employ advanced deep learning architectures to extract structural information autonomously from vast datasets through self-supervision. The resulting rich representations of complex systems and dynamics can be applied to many downstream applications. Therefore, FMs can find uses in electric power grids, challenged by the energy transition and climate change. In this paper, we call for the development of, and state why we believe in, the potential of FMs for electric grids. We highlight their strengths and weaknesses amidst the challenges of a changing grid. We argue that an FM learning from diverse grid data and topologies could unlock transformative capabilities, pioneering a new approach in leveraging AI to redefine how we manage complexity and uncertainty in the electric grid. Finally, we discuss a power grid FM concept, namely GridFM, based on graph neural networks and show how different downstream tasks benefit.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Lead contact: H.F.H.; Major equal contributors: H.F.H., T.B., B.G., L.S.A.M., A.P., A.V., J.W.; Significant equal contributors: J.B., A.B.M., S.C., I.F., B.H., R.J., K.K., V.M., F.M., M.D.M., O.R., H.S., L.X., E.S.Y., A.Z.; Other equal contributors: A.J.B., R.J.B., B.P.B., J.S., S.S

arXiv:2407.09381 [pdf, other]

The Effectiveness of Curvature-Based Rewiring and the Role of Hyperparameters in GNNs Revisited

Authors: Floriano Tori, Vincent Holst, Vincent Ginis

Abstract: Message passing is the dominant paradigm in Graph Neural Networks (GNNs). The efficiency of message passing, however, can be limited by the topology of the graph. This happens when information is lost during propagation due to being oversquashed when travelling through bottlenecks. To remedy this, recent efforts have focused on graph rewiring techniques, which disconnect the input graph originating from the data and the computational graph, on which message passing is performed. A prominent approach for this is to use discrete graph curvature measures, of which several variants have been proposed, to identify and rewire around bottlenecks, facilitating information propagation. While oversquashing has been demonstrated in synthetic datasets, in this work we reevaluate the performance gains that curvature-based rewiring brings to real-world datasets. We show that in these datasets, edges selected during the rewiring process are not in line with theoretical criteria identifying bottlenecks. This implies they do not necessarily oversquash information during message passing. Subsequently, we demonstrate that SOTA accuracies on these datasets are outliers originating from sweeps of hyperparameters -- both the ones for training and dedicated ones related to the rewiring algorithm -- instead of consistent performance gains. In conclusion, our analysis nuances the effectiveness of curvature-based rewiring in real-world datasets and brings a new perspective on the methods to evaluate GNN accuracy improvements.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 19 pages, 10 figures

arXiv:2407.08773 [pdf, other]

Dark matter limits from the tip of the red giant branch of globular clusters

Authors: Haozhi Hong, Aaron C. Vincent

Abstract: Capture and annihilation of WIMP-like dark matter in red giant stars can lead to faster-than-expected ignition of the helium core, and thus a lower tip of the red giant branch (TRGB) luminosity. We use Gaia data to place constraints on the dark matter-nucleon cross section using 22 globular clusters with measured TRGB luminosities, and place projections on the sensitivity resulting from 161 clusters with full phase space distributions observed by Gaia. Although limits remain weaker than those from Earth-based direct detection experiments, they represent a constraint that is fully independent of dark matter properties in the Solar neighbourhood, probing its properties across the entire Milky Way galaxy. Based on our findings, it is likely that the use of the TRGB as a standard candle in $H_0$ measurements is very robust against the effects of dark matter.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 10 pages, 6 figures

arXiv:2407.08722 [pdf, other]

Unifying 3D Representation and Control of Diverse Robots with a Single Camera

Authors: Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, Vincent Sitzmann

Abstract: Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have dramatically expanded feasible hardware, yet deploying these systems requires control software to translate desired motions into actuator commands. While conventional robots can easily be modeled as rigid links connected via joints, it remains an open challenge to model and control bio-inspired robots that are often multi-material or soft, lack sensing capabilities, and may change their material properties with use. Here, we introduce Neural Jacobian Fields, an architecture that autonomously learns to model and control robots from vision alone. Our approach makes no assumptions about the robot's materials, actuation, or sensing, requires only a single camera for control, and learns to control the robot without expert intervention by observing the execution of random commands. We demonstrate our method on a diverse set of robot manipulators, varying in actuation, materials, fabrication, and cost. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot. By enabling robot control with a generic camera as the only sensor, we anticipate our work will dramatically broaden the design space of robotic systems and serve as a starting point for lowering the barrier to robotic automation.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Project Page: https://sizhe-li.github.io/publication/neural_jacobian_field

arXiv:2407.08612 [pdf, other]

Reducing Uncertainty Through Mutual Information in Structural and Systems Biology

Authors: Vincent D. Zaballa, Elliot E. Hui

Abstract: Systems biology models are useful models of complex biological systems that may require a large amount of experimental data to fit each model's parameters or to approximate a likelihood function. These models range from a few to thousands of parameters depending on the complexity of the biological system modeled, potentially making the task of fitting parameters to the model difficult - especially when new experimental data cannot be gathered. We demonstrate a method that uses structural biology predictions to augment systems biology models to improve systems biology models' predictions without having to gather more experimental data. Additionally, we show how systems biology models' predictions can help evaluate novel structural biology hypotheses, which may also be expensive or infeasible to validate.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Accepted at the 1st Machine Learning for Life and Material Sciences Workshop at ICML 2024

arXiv:2407.08223 [pdf, other]

Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

Authors: Zilong Wang, Zifeng Wang, Long Le, Huaixiu Steven Zheng, Swaroop Mishra, Vincent Perot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

Abstract: Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Speculative RAG - a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. Each draft is generated from a distinct subset of retrieved documents, offering diverse perspectives on the evidence while reducing input token counts per draft. This approach enhances comprehension of each subset and mitigates potential position bias over long context. Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts. Extensive experiments demonstrate that Speculative RAG achieves state-of-the-art performance with reduced latency on TriviaQA, MuSiQue, PubHealth, and ARC-Challenge benchmarks. It notably enhances accuracy by up to 12.97% while reducing latency by 51% compared to conventional RAG systems on PubHealth.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Preprint

arXiv:2407.08218 [pdf, ps, other]

On Tree Automata, Generating Functions, and Differential Equations

Authors: Rida Ait El Manssour, Vincent Cheval, Mahsa Shirmohammadi, James Worrell

Abstract: In this paper we introduce holonomic tree automata: a common extension of weighted tree automata and holonomic recurrences. We show that the generating function of the tree series represented by such an automaton is differentially algebraic. Conversely, we give an algorithm that inputs a differentially algebraic power series, represented as a solution of a rational dynamical system, and outputs an automaton whose generating function is the given series. Such an automaton yields a recurrence that can be used to compute the terms of the power series. We use the algorithm to obtain automaton representations of exponential generating functions of families of combinatorial objects given as combinatorial species. Using techniques from differential algebra, we show that it is decidable both whether two automata represent the same formal tree series and whether they have the same generating function.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08086 [pdf, other]

The GeometricKernels Package: Heat and Matérn Kernels for Geometric Learning on Manifolds, Meshes, and Graphs

Authors: Peter Mostowsky, Vincent Dutordoir, Iskander Azangulov, Noémie Jaquier, Michael John Hutchinson, Aditya Ravuri, Leonel Rozo, Alexander Terenin, Viacheslav Borovitskiy

Abstract: Kernels are a fundamental technical primitive in machine learning. In recent years, kernel-based methods such as Gaussian processes have become increasingly important in applications where quantifying uncertainty is of key interest. In settings that involve structured data defined on graphs, meshes, manifolds, or other related spaces, defining kernels with good uncertainty-quantification behavior, and computing their value numerically, is less straightforward than in the Euclidean setting. To address this difficulty, we present GeometricKernels, a software package which implements the geometric analogs of classical Euclidean squared exponential - also known as heat - and Matérn kernels, which are widely used in settings where uncertainty is of key interest. As a byproduct, we obtain the ability to compute Fourier-feature-type expansions, which are widely used in their own right, on a wide set of geometric spaces. Our implementation supports automatic differentiation in every major current framework simultaneously via a backend-agnostic design. In this companion paper to the package and its documentation, we outline the capabilities of the package and present an illustrated example of its interface. We also include a brief overview of the theory the package is built upon and provide some historic context in the appendix.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07833 [pdf, other]

Viscoelastic dip-coating beyond lubrication theory

Authors: Minkush Kansal, Charu Datt, Vincent Bertin, Jacco H. Snoeijer

Abstract: The dip-coating geometry, where a solid plate is withdrawn from or plunged into a liquid pool, offers a prototypical example of wetting flows involving contact-line motion. Such flows are commonly studied using the lubrication approximation approach which is intrinsically limited to small interface slopes and thus small contact angles. Flows for arbitrary contact angles, however, can be studied using a generalized lubrication theory that builds upon viscous corner flow solutions. Here we derive this generalized lubrication theory for viscoelastic liquids that exhibit normal stress effects and are modelled using the second-order fluid model. We apply our theory to advancing and receding contact lines in dip-coating, highlighting the influence of viscoelastic normal stresses for contact line motion at arbitrary contact angle.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 27 pages, 9 figures

arXiv:2407.07784 [pdf, other]

First demonstration of Super-X divertor exhaust control for transient heat load management in compact fusion reactors

Authors: B. Kool, K. Verhaegh, G. L. Derks, T. A. Wijkamp, N. Lonigro, R. Doyle, G. McArdle, C. Vincent, J. Lovell, F. Federici, S. S. Henderson, R. T. Osawa, D. Brida, H. Reimerdes, M. van Berkel, The EUROfusion tokamak exploitation team, the MAST-U team

Abstract: Nuclear fusion could offer clean, abundant energy. However, managing the immense power exhausted from the core fusion plasma towards the divertor remains a major challenge. This is compounded in emerging compact reactor designs which promise more cost-effective pathways towards commercial fusion energy. Alternative divertor configurations (ADCs) are a potential solution to this challenge. In this work, we demonstrate exhaust control in ADCs for the first time, on MAST-U. We employ a novel diagnostic strategy for the neutral gas buffer which shields the target. Our work shows that ADCs tackle key risks and uncertainties in realising fusion energy: 1) an enlarged operating window which 2) improves exhaust control through the absorption of transients which can remove the neutral shield and damage the divertor, 3) isolation of each divertor from other reactor regions, enabling combined control. This showcases real-world benefits of alternative divertors for effective heat load management and control in reactors.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07773 [pdf, other]

Finite Blocklength Performance of Capacity-achieving Codes in the Light of Complexity Theory

Authors: Holger Boche, Andrea Grigorescu, Rafael F. Schaefer, H. Vincent Poor

Abstract: Since the work of Polyanskiy, Poor and Verdú on the finite blocklength performance of capacity-achieving codes for discrete memoryless channels, many papers have attempted to find further results for more practically relevant channels. However, it seems that the complexity of computing capacity-achieving codes has not been investigated until now. We study this question for the simplest non-trivial Gaussian channels, i.e., the additive colored Gaussian noise channel. To assess the computational complexity, we consider the classes $\mathrm{FP}_1$ and $\#\mathrm{P}_1$. $\mathrm{FP}_1$ includes functions computable by a deterministic Turing machine in polynomial time, whereas $\#\mathrm{P}_1$ encompasses functions that count the number of solutions verifiable in polynomial time. It is widely assumed that $\mathrm{FP}_1\neq\#\mathrm{P}_1$. It is of interest to determine the conditions under which, for a given $M \in \mathbb{N}$, where $M$ describes the precision of the deviation of $C(P,N)$, for a certain blocklength $n_M$ and a decoding error $ε > 0$ with $ε \in \mathbb{Q}$, the following holds: $R_{n_M}(ε) > C(P,N)-\frac{1}{2^M}$. It is shown that there is a polynomial-time computable $N_*$ such that for sufficiently large $P_*\in\mathbb{Q}$, the sequences $\{R_{n_M}(ε)\}_{n_M\in\mathbb{N}}$, where each $R_{n_M}(ε)$ satisfies the previous condition, cannot be computed in polynomial time if $\mathrm{FP}_1\neq\#\mathrm{P}_1$. Hence, the complexity of computing the sequence $\{R_{n_M}(ε)\}_{n_M\in\mathbb{N}}$ grows faster than any polynomial as $M$ increases. Consequently, it is shown that either the sequence of achievable rates $\{R_{n_M}(ε)\}_{n_M\in\mathbb{N}}$ as a function of the blocklength, or the sequence of blocklengths $\{n_M\}_{M\in\mathbb{N}}$ corresponding to the achievable rates, is not a polynomial-time computable sequence.

Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: The results were presented at ISIT 2024 in the recent result session. The ISIT 2024 poster for the extended abstract is attached to the paper

arXiv:2407.07719 [pdf, other]

Model-based learning for multi-antenna multi-frequency location-to-channel mapping

Authors: Baptiste Chatelier, Vincent Corlay, Matthieu Crussière, Luc Le Magoarou

Abstract: Years of study of the propagation channel showed a close relation between a location and the associated communication channel response. The use of a neural network to learn the location-to-channel mapping can therefore be envisioned. The Implicit Neural Representation (INR) literature showed that classical neural architectures are biased towards learning low-frequency content, making the location-to-channel mapping learning a non-trivial problem. Indeed, it is well known that this mapping is a function rapidly varying with the location, on the order of the wavelength. This paper leverages the model-based machine learning paradigm to derive a problem-specific neural architecture from a propagation channel model. The resulting architecture efficiently overcomes the spectral-bias issue. It only learns low-frequency sparse correction terms activating a dictionary of high-frequency components. The proposed architecture is evaluated against classical INR architectures on realistic synthetic data, showing much better accuracy. Its mapping learning performance is explained based on the approximated channel model, highlighting the explainability of the model-based machine learning paradigm.

Submitted 17 June, 2024; originally announced July 2024.

arXiv:2407.07671 [pdf, ps, other]

Why should we ever automate moral decision making?

Authors: Vincent Conitzer

Abstract: While people generally trust AI to make decisions in various aspects of their lives, concerns arise when AI is involved in decisions with significant moral implications. The absence of a precise mathematical framework for moral reasoning intensifies these concerns, as ethics often defies simplistic mathematical models. Unlike fields such as logical reasoning, reasoning under uncertainty, and strategic decision-making, which have well-defined mathematical frameworks, moral reasoning lacks a broadly accepted framework. This absence raises questions about the confidence we can place in AI's moral decision-making capabilities. The environments in which AI systems are typically trained today seem insufficiently rich for such a system to learn ethics from scratch, and even if we had an appropriate environment, it is unclear how we might bring about such learning. An alternative approach involves AI learning from human moral decisions. This learning process can involve aggregating curated human judgments or demonstrations in specific domains, or leveraging a foundation model fed with a wide range of data. Still, concerns persist, given the imperfections in human moral decision making. Given this, why should we ever automate moral decision making -- is it not better to leave all moral decision making to humans? This paper lays out a number of reasons why we should expect AI systems to engage in decisions with a moral component, with brief discussions of the associated risks.

Submitted 10 July, 2024; originally announced July 2024.

Journal ref: In: Ethics and Trust in Human-AI Collaboration: Socio-Technical Approaches, 2023

arXiv:2407.07625 [pdf, ps, other]

doi 10.1609/aaai.v38i9.28817

The Complexity of Computing Robust Mediated Equilibria in Ordinal Games

Authors: Vincent Conitzer

Abstract: Usually, to apply game-theoretic methods, we must specify utilities precisely, and we run the risk that the solutions we compute are not robust to errors in this specification. Ordinal games provide an attractive alternative: they require specifying only which outcomes are preferred to which other ones. Unfortunately, they provide little guidance for how to play unless there are pure Nash equilibria; evaluating mixed strategies appears to fundamentally require cardinal utilities. In this paper, we observe that we can in fact make good use of mixed strategies in ordinal games if we consider settings that allow for folk theorems. These allow us to find equilibria that are robust, in the sense that they remain equilibria no matter which cardinal utilities are the correct ones -- as long as they are consistent with the specified ordinal preferences. We analyze this concept and study the computational complexity of finding such equilibria in a range of settings.

Submitted 10 July, 2024; originally announced July 2024.

Journal ref: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

arXiv:2407.07616 [pdf, other]

Satellite Image Time Series Semantic Change Detection: Novel Architecture and Analysis of Domain Shift

Authors: Elliot Vincent, Jean Ponce, Mathieu Aubry

Abstract: Satellite imagery plays a crucial role in monitoring changes happening on Earth's surface and aiding in climate analysis, ecosystem assessment, and disaster response. In this paper, we tackle semantic change detection with satellite image time series (SITS-SCD) which encompasses both change detection and semantic segmentation tasks. We propose a new architecture that improves over the state of the art, scales better with the number of parameters, and leverages long-term temporal information. However, for practical use cases, models need to adapt to spatial and temporal shifts, which remains a challenge. We investigate the impact of temporal and spatial shifts separately on global, multi-year SITS datasets using DynamicEarthNet and MUDS. We show that the spatial domain shift represents the most complex setting and that the impact of temporal shift on performance is more pronounced on change detection than on semantic segmentation, highlighting that it is a specific issue deserving further attention.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07408 [pdf, other]

STONE: Self-supervised Tonality Estimator

Authors: Yuexuan Kong, Vincent Lostanlen, Gabriel Meseguer-Brocal, Stella Wong, Mathieu Lagrange, Romain Hennequin

Abstract: Although deep neural networks can estimate the key of a musical piece, their supervision incurs a massive annotation effort. Against this shortcoming, we present STONE, the first self-supervised tonality estimator. The architecture behind STONE, named ChromaNet, is a convnet with octave equivalence which outputs a key signature profile (KSP) of 12 structured logits. First, we train ChromaNet to regress artificial pitch transpositions between any two unlabeled musical excerpts from the same audio track, as measured as cross-power spectral density (CPSD) within the circle of fifths (CoF). We observe that this self-supervised pretext task leads KSP to correlate with tonal key signature. Based on this observation, we extend STONE to output a structured KSP of 24 logits, and introduce supervision so as to disambiguate major versus minor keys sharing the same key signature. Applying different amounts of supervision yields semi-supervised and fully supervised tonality estimators: i.e., Semi-TONEs and Sup-TONEs. We evaluate these estimators on FMAK, a new dataset of 5489 real-world musical recordings with expert annotation of 24 major and minor keys. We find that Semi-TONE matches the classification accuracy of Sup-TONE with reduced supervision and outperforms it with equal supervision.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07385 [pdf, other]

doi 10.1038/s41550-023-02016-7

Evidence for auroral influence on Jupiter's nitrogen and oxygen chemistry revealed by ALMA

Authors: Thibault Cavalié, Ladislav Rezac, Raphael Moreno, Emmanuel Lellouch, Thierry Fouchet, Bilal Benmahi, Thomas K. Greathouse, James A. Sinclair, Vincent Hue, Paul Hartogh, Michel Dobrijevic, Nathalie Carrasco, Zoé Perrin

Abstract: The localized delivery of new long-lived species to Jupiter's stratosphere by comet Shoemaker-Levy 9 in 1994 opened a window to constrain Jovian chemistry and dynamics by monitoring the evolution of their vertical and horizontal distributions. However, the spatial distributions of CO and HCN, two of these long-lived species, had never been jointly observed at high latitudinal resolution. Atacama Large Millimeter/submillimeter Array (ALMA) observations of HCN and CO in March 2017 show that CO was meridionally uniform and restricted to pressures lower than 3 $\pm$ 1 mbar. HCN shared a similar vertical distribution in the low- to mid-latitudes, but was depleted at pressures between 2$^{+2}_{-1}$ and 0.04$^{+0.07}_{-0.03}$ mbar in the aurora and surrounding regions, resulting in a drop by two orders of magnitude in column density. We propose that heterogeneous chemistry bonds HCN on large aurora-produced aerosols at these pressures in the Jovian auroral regions causing the observed depletion.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 13 pages, 8 figures

Journal ref: Nature Astronomy 7 (2023) 1048-1055

arXiv:2407.07295 [pdf, other]

Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis

Authors: Jian-Qing Zheng, Yuanhan Mo, Yang Sun, Jiahua Li, Fuping Wu, Ziyang Wang, Tonia Vincent, Bartłomiej W. Papież

Abstract: In medical imaging, the diffusion models have shown great potential in synthetic image generation tasks. However, these models often struggle with the interpretable connections between the generated and existing images and could create illusions. To address these challenges, our research proposes a novel diffusion-based generative model based on deformation diffusion and recovery. This model, named Deformation-Recovery Diffusion Model (DRDM), diverges from traditional score/intensity and latent feature-based approaches, emphasizing morphological changes through deformation fields rather than direct image synthesis. This is achieved by introducing a topological-preserving deformation field generation method, which randomly samples and integrates a set of multi-scale Deformation Vector Fields (DVF). DRDM is trained to learn to recover unreasonable deformation components, thereby restoring each randomly deformed image to a realistic distribution. These innovations facilitate the generation of diverse and anatomically plausible deformations, enhancing data augmentation and synthesis for further analysis in downstream tasks, such as few-shot learning and image registration. Experimental results in cardiac MRI and pulmonary CT show DRDM is capable of creating diverse, large (over 10% image size deformation scale), and high-quality (negative ratio of folding rate is lower than 1%) deformation fields. The further experimental results in downstream tasks, 2D image segmentation and 3D image registration, indicate significant improvements resulting from DRDM, showcasing the potential of our model to advance image manipulation and synthesis in medical imaging and beyond. Our implementation will be available at https://github.com/jianqingzheng/def_diff_rec.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.07206 [pdf]

doi 10.1103/PhysRevResearch.6.013163

Plasmonic Vortices Host Magnetoelectric Interactions

Authors: Atreyie Ghosh, Sena Yang, Yanan Dai, W. Vincent Liu, Hrvoje Petek

Abstract: The vector cross product and pseudoscalar dot products of electric (E) and magnetic (H) fields are separately finite in vacuum transverse electric and magnetic (TEM) plane waves, and angular momentum structured light. Current theories of interactions beyond the standard model of particle physics invoke non-zero dot(E,H) as the source term in the axion law that describes interactions with the cosmological dark matter axion particles outside of the quartet of Maxwell's equations. The non-zero dot(E,H) also drives relativistic spin-charge magnetoelectric excitations of axion quasiparticles at a distinctively higher condensed matter scale in magnetic and topological materials. Yet, how to drive coherent dot(E,H) responses is unknown, and provides motivation to examine the field polarizations in structured light on a deep sub-diffraction limited spatial scale and sub-optical cycle temporal scale by ultrafast nonlinear photoemission electron microscopy. By analytical theory and ultrafast coherent photoemission electron microscopy, we image dot(E,H) fields in surface plasmon polariton vortex cores at subwavelength scales, where we find that the magnetoelectric relative to the dipole density is intensified on a ~10 nm diameter scale as a universal property of plasmonic vortex fields. The generation and nanoscale localization of dot(E,H) fields introduces the magnetoelectric symmetry class, having the parity and time reversal broken, but the joint parity-time reversal symmetry preserved. The ability to image the optical fields of plasmonic vortex cores opens the research of ultrafast microscopy of magnetoelectric responses and interactions with axion quasiparticles in solid state materials.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 5 Figures

Journal ref: Physical Review RESEARCH 6, 013163 (2024)

arXiv:2407.07183 [pdf, other]

Atom interferometry in an Einstein Elevator

Authors: Celia Pelluet, Romain Arguel, Martin Rabault, Vincent Jarlaud, Clement Metayer, Brynle Barrett, Philippe Bouyer, Baptiste Battelier

Abstract: Recent advances in atom interferometry have led to the development of quantum inertial sensors with outstanding performance in terms of sensitivity, accuracy, and long-term stability. For ground-based implementations, these sensors are ultimately limited by the free-fall height of atomic fountains required to interrogate the atoms over extended timescales. This limitation can be overcome in Space and in unique "microgravity" facilities such as drop towers or free-falling aircraft. These facilities require large investments, long development times, and place stringent constraints on instruments that further limit their widespread use. The available "up time" for experiments is also quite low, making extended studies challenging. In this work, we present a new approach in which atom interferometry is performed in a laboratory-scale Einstein Elevator. Our experiment is mounted to a moving platform that mimics the vertical free-fall trajectory every 13.5 seconds. With a total interrogation time of $2T = 200$ ms, we demonstrate an acceleration sensitivity of $6 \times 10^{-7}$ m/s$^{2}$ per shot, limited primarily by the temperature of our atomic samples. We further demonstrate the capability to perform long-term statistical studies by operating the Einstein Elevator over several days with high reproducibility. These represent state-of-the-art results achieved in microgravity and further demonstrate the potential of quantum inertial sensors in Space. Our microgravity platform is both an alternative to large atomic fountains and a versatile facility to prepare future Space missions.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06274 [pdf, other]

Probing populations of dark stellar remnants in the globular clusters 47 Tuc and Terzan 5 using pulsar timing

Authors: Peter J. Smith, Vincent Hénault-Brunet, Nolan Dickson, Mark Gieles, Holger Baumgardt

Abstract: We present a new method to combine multimass equilibrium dynamical models and pulsar timing data to constrain the mass distribution and remnant populations of Milky Way globular clusters (GCs). We first apply this method to 47 Tuc, a cluster for which there exists an abundance of stellar kinematic data and which is also host to a large population of millisecond pulsars. We demonstrate that the pulsar timing data allow us to place strong constraints on the overall mass distribution and remnant populations even without fitting on stellar kinematics. Our models favor a small population of stellar-mass BHs in this cluster (with a total mass of $412^{+77}_{-146} \mathrm{M_\odot}$), arguing against the need for a large ($ > 2000 \ \mathrm{M_\odot}$) central intermediate-mass black hole. We then apply the method to Terzan 5, a heavily obscured bulge cluster which hosts the largest population of millisecond pulsars of any Milky Way GC and for which the collection of conventional stellar kinematic data is very limited. We improve existing constraints on the mass distribution and structural parameters of this cluster and place stringent constraints on its black hole content, finding an upper limit on the mass in BHs of $\sim 4500 \ \mathrm{M_\odot}$. This method allows us to probe the central dynamics of GCs even in the absence of stellar kinematic data and can be easily applied to other GCs with pulsar timing data, for which datasets will continue to grow with the next generation of radio telescopes.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 32 pages, 19 figures. Submitted to ApJ, comments welcome

arXiv:2407.06206 [pdf, other]

The Impact of an XAI-Augmented Approach on Binary Classification with Scarce Data

Authors: Ximing Wen, Rosina O. Weber, Anik Sen, Darryl Hannan, Steven C. Nesbit, Vincent Chan, Alberto Goffi, Michael Morris, John C. Hunninghake, Nicholas E. Villalobos, Edward Kim, Christopher J. MacLellan

Abstract: Point-of-Care Ultrasound (POCUS) is the practice of clinicians conducting and interpreting ultrasound scans right at the patient's bedside. However, the expertise needed to interpret these images is considerable and may not always be present in emergency situations. This reality makes algorithms such as machine learning classifiers extremely valuable to augment human decisions. POCUS devices are becoming available at a reasonable cost in the size of a mobile phone. The challenge of turning POCUS devices into life-saving tools is that interpretation of ultrasound images requires specialist training and experience. Unfortunately, the difficulty to obtain positive training images represents an important obstacle to building efficient and accurate classifiers. Hence, the problem we try to investigate is how to explore strategies to increase accuracy of classifiers trained with scarce data. We hypothesize that training with a few data instances may not suffice for classifiers to generalize causing them to overfit. Our approach uses an Explainable AI-Augmented approach to help the algorithm learn more from less and potentially help the classifier better generalize.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 7 pages, 3 figures, accepted by XAI 2024 workshop @ IJCAI

arXiv:2407.06202 [pdf]

Ax, 3 polyominoes for tiling the plane non-periodically

Authors: Vincent Van Dongen, Pierre Gradit

Abstract: How do people come up with new sets of tiles including new tile shapes that would only tile non-periodically? This paper presents our graphical journey in tilings and provides a new set of three polyominoes named Ax for its relationship with Ammann A4.

Submitted 24 June, 2024; originally announced July 2024.

arXiv:2407.06183 [pdf, other]

Stepping on the Edge: Curvature Aware Learning Rate Tuners

Authors: Vincent Roulet, Atish Agarwala, Jean-Bastien Grill, Grzegorz Swirszcz, Mathieu Blondel, Fabian Pedregosa

Abstract: Curvature information -- particularly, the largest eigenvalue of the loss Hessian, known as the sharpness -- often forms the basis for learning rate tuners. However, recent work has shown that the curvature information undergoes complex dynamics during training, going from a phase of increasing sharpness to eventual stabilization. We analyze the closed-loop feedback effect between learning rate tuning and curvature. We find that classical learning rate tuners may yield greater one-step loss reduction, yet they ultimately underperform in the long term when compared to constant learning rates in the full batch regime. These models break the stabilization of the sharpness, which we explain using a simplified model of the joint dynamics of the learning rate and the curvature. To further investigate these effects, we introduce a new learning rate tuning method, Curvature Dynamics Aware Tuning (CDAT), which prioritizes long term curvature stabilization over instantaneous progress on the objective. In the full batch regime, CDAT shows behavior akin to prefixed warm-up schedules on deep learning objectives, outperforming tuned constant learning rates. In the mini batch regime, we observe that stochasticity introduces confounding effects that explain the previous success of some learning rate tuners at appropriate batch sizes. Our findings highlight the critical role of understanding the joint dynamics of the learning rate and curvature, beyond greedy minimization, to diagnose failures and design effective adaptive learning rate tuners.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.06148 [pdf, other]

doi 10.1016/j.scriptamat.2023.115960

In-situ localization of damage in a Zn-Al-Mg coating deposited on steel by continuous hot-dip galvanizing

Authors: Houssem Eddine Chaieb, Vincent Maurel, Kais Ammar, Samuel Forest, Alexandre Tanguy, Eva Héripré, Franck Nozahic, Jean-Michel Mataigne, Joost De Strycker

Abstract: Zn-Al-Mg coatings are characterized by a complex microstructure with dendritic and eutectic phases. This heterogeneous phase distribution contributes to multiple deformation and damage mechanisms. The presence of brittle phases promotes crack initiation and propagation. This study reveals a new deformation and damage mechanism of a Zn-Al-Mg coating, where twinning can induce crack initiation in the eutectic region. The chronology of different events leading to crack initiation and propagation is clearly established by in-situ tensile testing in a scanning electron microscope, which helps to establish a detailed characterization of the mechanical behavior of the coating.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05453 [pdf, other]

Active Collaborative Visual SLAM exploiting ORB Features

Authors: Muhammad Farhan Ahmed, Vincent Frémont, Isabelle Fantoni

Abstract: In autonomous robotics, a significant challenge involves devising robust solutions for Active Collaborative SLAM (AC-SLAM). This process requires multiple robots to cooperatively explore and map an unknown environment by intelligently coordinating their movements and sensor data acquisition. In this article, we present an efficient visual AC-SLAM method using aerial and ground robots for environment exploration and mapping. We propose an efficient frontiers filtering method that takes into account the common IoU map frontiers and reduces the frontiers for each robot. Additionally, we also present an approach to guide robots to previously visited goal positions to promote loop closure to reduce SLAM uncertainty. The proposed method is implemented in ROS and evaluated through simulations on publicly available datasets and similar methods, achieving an accumulative average of 59% of increase in area coverage.

Submitted 7 July, 2024; originally announced July 2024.

Comments: 6 Pages, 7 Figures, 2 Tables

arXiv:2407.04683 [pdf, other]

Efficient Betti Matching Enables Topology-Aware 3D Segmentation via Persistent Homology

Authors: Nico Stucki, Vincent Bürgin, Johannes C. Paetzold, Ulrich Bauer

Abstract: In this work, we propose an efficient algorithm for the calculation of the Betti matching, which can be used as a loss function to train topology aware segmentation networks. Betti matching loss builds on techniques from topological data analysis, specifically persistent homology. A major challenge is the computational cost of computing persistence barcodes. In response to this challenge, we propose a new, highly optimized implementation of Betti matching, implemented in C++ together with a python interface, which achieves significant speedups compared to the state-of-the-art implementation Cubical Ripser. We use Betti matching 3D to train segmentation networks with the Betti matching loss and demonstrate improved topological correctness of predicted segmentations across several datasets. The source code is available at https://github.com/nstucki/Betti-Matching-3D.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04592 [pdf, other]

Smell and Emotion: Recognising emotions in smell-related artworks

Authors: Vishal Patoliya, Mathias Zinnen, Andreas Maier, Vincent Christlein

Abstract: Emotions and smell are underrepresented in digital art history. In this exploratory work, we show that recognising emotions from smell-related artworks is technically feasible but has room for improvement. Using style transfer and hyperparameter optimization we achieve a minor performance boost and open up the field for future extensions.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 5 pages, 3 figures

arXiv:2407.03973 [pdf, ps, other]

Logical Operators and Fold-Transversal Gates of Bivariate Bicycle Codes

Authors: Jens Niklas Eberhardt, Vincent Steffan

Abstract: Quantum low-density parity-check (qLDPC) codes offer a promising route to scalable fault-tolerant quantum computation with constant overhead. Recent advancements have shown that qLDPC codes can outperform the quantum memory capability of surface codes even with near-term hardware. The question of how to implement logical gates fault-tolerantly for these codes is still open. We present new examples of high-rate bivariate bicycle (BB) codes with enhanced symmetry properties. These codes feature explicit nice bases of logical operators (similar to toric codes) and support fold-transversal Clifford gates without overhead. As examples, we construct $[[98,6,12]]$ and $[[162, 8, 12]]$ BB codes which admit interesting fault-tolerant Clifford gates. Our work also lays the mathematical foundations for explicit bases of logical operators and fold-transversal gates in quantum two-block and group algebra codes, which might be of independent interest.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 21 pages, comments welcome

arXiv:2407.03899 [pdf, ps, other]

Hybrid NOMA Assisted OFDMA Uplink Transmission

Authors: Zhiguo Ding, H. Vincent Poor

Abstract: Hybrid non-orthogonal multiple access (NOMA) has recently received significant research interest due to its ability to efficiently use resources from different domains and also its compatibility with various orthogonal multiple access (OMA) based legacy networks. Unlike existing studies on hybrid NOMA that focus on combining NOMA with time-division multiple access (TDMA), this work considers hybrid NOMA assisted orthogonal frequency-division multiple access (OFDMA). In particular, the impact of a unique feature of hybrid NOMA assisted OFDMA, i.e., the availability of users' dynamic channel state information, on the system performance is analyzed from the following two perspectives. From the optimization perspective, analytical results are developed which show that with hybrid NOMA assisted OFDMA, the pure OMA mode is rarely adopted by the users, and the pure NOMA mode could be optimal for minimizing the users' energy consumption, which differs from the hybrid TDMA case. From the statistical perspective, two new performance metrics, namely the power outage probability and the power diversity gain, are developed to quantitatively measure the performance gain of hybrid NOMA over OMA. The developed analytical results also demonstrate the ability of hybrid NOMA to meet the users' diverse energy profiles.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02424 [pdf, other]

A Pattern Language for Machine Learning Tasks

Authors: Benjamin Rodatz, Ian Fan, Tuomas Laakkonen, Neil John Ortega, Thomas Hoffman, Vincent Wang-Mascianica

Abstract: Idealised as universal approximators, learners such as neural networks can be viewed as "variable functions" that may become one of a range of concrete functions after training. In the same way that equations constrain the possible values of variables in algebra, we may view objective functions as constraints on the behaviour of learners. We extract the equivalences perfectly optimised objective functions impose, calling them "tasks". For these tasks, we develop a formal graphical language that allows us to: (1) separate the core tasks of a behaviour from its implementation details; (2) reason about and design behaviours model-agnostically; and (3) simply describe and unify approaches in machine learning across domains. As proof-of-concept, we design a novel task that enables converting classifiers into generative models we call "manipulators", which we implement by directly translating task specifications into code. The resulting models exhibit capabilities such as style transfer and interpretable latent-space editing, without the need for custom architectures, adversarial training or random sampling. We formally relate the behaviour of manipulators to GANs, and empirically demonstrate their competitive performance with VAEs. We report on experiments across vision and language domains aiming to characterise manipulators as approximate Bayesian inversions of discriminative classifiers.

Submitted 2 July, 2024; originally announced July 2024.

MSC Class: 18M30; 68T01 ACM Class: I.2.6

arXiv:2407.02423 [pdf, other]

On the Anatomy of Attention

Authors: Nikhil Khatri, Tuomas Laakkonen, Jonathon Liu, Vincent Wang-Maścianica

Abstract: We introduce a category-theoretic diagrammatic formalism in order to systematically relate and reason about machine learning models. Our diagrams present architectures intuitively but without loss of essential detail, where natural relationships between models are captured by graphical transformations, and important differences and similarities can be identified at a glance. In this paper, we focus on attention mechanisms: translating folklore into mathematical derivations, and constructing a taxonomy of attention variants in the literature. As a first example of an empirical investigation underpinned by our formalism, we identify recurring anatomical components of attention, which we exhaustively recombine to explore a space of variations on the attention mechanism.

Submitted 7 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: Replaced to fix typos

MSC Class: 68T01; 18M30 ACM Class: I.2.6

arXiv:2407.02285 [pdf, other]

Off-Grid Ultrasound Imaging by Stochastic Optimization

Authors: Vincent van de Schaft, Oisín Nolan, Ruud J. G. van Sloun

Abstract: Ultrasound images formed by delay-and-sum beamforming are plagued by artifacts that only clear up after compounding many transmissions. Some prior works pose imaging as an inverse problem. This approach can yield high image quality with few transmits, but requires a very fine image grid and is not robust to changes in measurement model parameters. We present INverse grid-Free Estimation of Reflectivities (INFER), an off-grid and stochastic algorithm that solves the inverse scattering problem in ultrasound imaging. Our method jointly optimizes for the locations of the gridpoints, their reflectivities, and the measurement model parameters such as the speed of sound. This approach allows us to use significantly fewer gridpoints, while obtaining better contrast and resolution and being more robust to changes in the imaging target and the hardware. The use of stochastic optimization enables solving for multiple transmissions simultaneously without increasing the required memory or computational load per iteration. We show that our method works across different imaging targets and across different transmit schemes and compares favorably against other beamforming and inverse solvers. The source code and the dataset to reproduce the results in this paper are available at www.github.com/vincentvdschaft/off-grid-ultrasound.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02284 [pdf, other]

doi 10.21105/joss.06574

Renard: A Modular Pipeline for Extracting Character Networks from Narrative Texts

Authors: Arthur Amalvy, Vincent Labatut, Richard Dufour

Abstract: Renard (Relationships Extraction from NARrative Documents) is a Python library that allows users to define custom natural language processing (NLP) pipelines to extract character networks from narrative texts. Contrary to the few existing tools, Renard can extract dynamic networks, as well as the more common static networks. Renard pipelines are modular: users can choose the implementation of each NLP subtask needed to extract a character network. This allows users to specialize pipelines to particular types of texts and to study the impact of each subtask on the extracted network.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted at JOSS

Journal ref: Journal of Open Source Software, 9(98), 6574 (2024)

arXiv:2407.01926 [pdf]

Chemical Shift Encoding based Double Bonds Quantification in Triglycerides using Deep Image Prior

Authors: Chaoxing Huang, Ziqiang Yu, Zijian Gao, Qiuyi Shen, Queenie Chan, Vincent Wai-Sun Wong, Winnie Chiu-Wing Chu, Weitian Chen

Abstract: This study evaluated a deep learning-based method using Deep Image Prior (DIP) to quantify triglyceride double bonds from chemical-shift encoded multi-echo gradient echo images without network training. We employed a cost function based on signal constraints to iteratively update the neural network on a single dataset. The method was validated using phantom experiments and in vivo scans. Results showed close alignment between measured and reference double bond values, with phantom experiments yielding a Pearson correlation coefficient of 0.96 (p = .0005). In vivo results demonstrated good agreement in subcutaneous fat. We conclude that Deep Image Prior shows feasibility for quantifying double bonds and fatty acid content from chemical-shift encoded multi-echo MRI.

Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01794 [pdf, other]

Conditionally valid Probabilistic Conformal Prediction

Authors: Vincent Plassier, Alexander Fishkov, Maxim Panov, Eric Moulines

Abstract: We develop a new method for creating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution $P_{Y \mid X}$. Most existing methods, such as conformalized quantile regression and probabilistic conformal prediction, only offer marginal coverage guarantees. Our approach extends these methods to achieve conditional coverage, which is essential for many practical applications. While exact conditional guarantees are impossible without assumptions on the data distribution, we provide non-asymptotic bounds that explicitly depend on the quality of the available estimate of the conditional distribution. Our confidence sets are highly adaptive to the local structure of the data, making them particularly useful in high heteroskedasticity situations. We demonstrate the effectiveness of our approach through extensive simulations, showing that it outperforms existing methods in terms of conditional coverage and improves the reliability of statistical inference in a wide range of applications.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 23 pages
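
For orientation, the marginal coverage guarantee that the abstract above contrasts with conditional coverage can be illustrated with plain split conformal prediction. The sketch below is not the authors' method: the point predictor `predict`, the synthetic data, and the helper names are all hypothetical, and only the Python standard library is used.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[k - 1]


def predict(x):
    """Hypothetical pre-fitted point predictor (an assumption for this sketch)."""
    return 2.0 * x


def sample(n, rng):
    """Synthetic (x, y) pairs with homoskedastic Gaussian noise (toy data)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]


rng = random.Random(0)
calibration = sample(500, rng)
test_points = sample(2000, rng)

# Nonconformity score: absolute residual on held-out calibration data.
scores = [abs(y - predict(x)) for x, y in calibration]
q = conformal_quantile(scores, alpha=0.1)

# The prediction set for x is [predict(x) - q, predict(x) + q].
covered = sum(abs(y - predict(x)) <= q for x, y in test_points)
coverage = covered / len(test_points)
print(f"half-width q = {q:.3f}, empirical coverage = {coverage:.3f}")
```

The half-width `q` is the same for every `x`, which is why this guarantee holds only on average over inputs; adapting the set to each input is what conditional approaches aim for.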

arXiv:2407.01758 [pdf, other]

Quantifying cascading power outages during climate extremes considering renewable energy integration

Authors: Luo Xu, Ning Lin, H. Vincent Poor, Dazhi Xi, A. T. D. Perera

Abstract: Climate extremes, such as hurricanes, combined with large-scale integration of environment-sensitive renewables, could exacerbate the risk of widespread power outages. We introduce a coupled climate-energy model for cascading power outages, which comprehensively captures the impacts of evolving climate extremes on renewable generation, and transmission and distribution networks. The model is validated by the 2022 Puerto Rico catastrophic blackout during Hurricane Fiona, the first-ever system-wide blackout event with complete weather-induced outage records. The model presents a novel resilience pattern that was not captured by the present state-of-the-art models and reveals that early failure of certain critical components surprisingly enhances overall system resilience. Sensitivity analysis of various behind-the-meter solar integration scenarios demonstrates that lower integration levels (below 45%, including the current level) exhibit minimal impact on system resilience in this event. However, surpassing this critical level without additional flexibility resources can exacerbate the failure probability due to substantially enlarged energy imbalances.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 16 pages, 5 figures

arXiv:2407.01392 [pdf, other]

Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Authors: Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

Abstract: This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens without fully diffusing past ones. Our approach is shown to combine the strengths of next-token prediction models, such as variable-length generation, with the strengths of full-sequence diffusion models, such as the ability to guide sampling to desirable trajectories. Our method offers a range of additional capabilities, such as (1) rolling-out sequences of continuous tokens, such as video, with lengths past the training horizon, where baselines diverge and (2) new sampling and guiding schemes that uniquely profit from Diffusion Forcing's variable-horizon and causal architecture, and which lead to marked performance gains in decision-making and planning tasks. In addition to its empirical success, our method is proven to optimize a variational lower bound on the likelihoods of all subsequences of tokens drawn from the true joint distribution. Project website: https://boyuan.space/diffusion-forcing

Submitted 4 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

Comments: Project website: https://boyuan.space/diffusion-forcing Code: https://github.com/buoyancy99/diffusion-forcing

arXiv:2407.00783 [pdf, other]

Diffusion Models and Representation Learning: A Survey

Authors: Michael Fuest, **chuan Ma, Ming Gui, Johannes S. Fischer, Vincent Tao Hu, Bjorn Ommer

Abstract: Diffusion Models are popular generative modeling methods in various vision tasks, attracting significant attention. They can be considered a unique instance of self-supervised learning methods due to their independence from label annotation. This survey explores the interplay between diffusion models and representation learning. It provides an overview of diffusion models' essential aspects, including mathematical foundations, popular denoising network architectures, and guidance methods. Various approaches related to diffusion models and representation learning are detailed. These include frameworks that leverage representations learned from pre-trained diffusion models for subsequent recognition tasks and methods that utilize advancements in representation and self-supervised learning to enhance diffusion models. This survey aims to offer a comprehensive overview of the taxonomy between diffusion models and representation learning, identifying key areas of existing concerns and potential exploration. Github link: https://github.com/dongzhuoyao/Diffusion-Representation-Learning-Survey-Taxonomy

Submitted 30 June, 2024; originally announced July 2024.

Comments: Github Repo: https://github.com/dongzhuoyao/Diffusion-Representation-Learning-Survey-Taxonomy

arXiv:2407.00594 [pdf, other]

doi 10.1093/pasj/psae061

Major merger fraction along the massive galaxy quenching channel at 0.2$<z<$0.7

Authors: Shin Inoue, Kouji Ohta, Yoshihisa Asada, Marcin Sawicki, Guillaume Desprez, Stephen Gwyn, Vincent Picouet

Abstract: We study the major merger fraction along the massive galaxy quenching channel (traced with rest-frame $\mathrm{NUV}-r$ color) at $z=$ 0.2-0.7, aiming to examine the Cosmic Web Detachment (CWD) scenario of galaxy quenching. In this scenario, the major merger fraction is expected to be high in green valley galaxies as compared with those in star-forming and quiescent galaxies of similar stellar mass. We used photometry in the E-COSMOS field to select 1491 (2334) massive ($M_\ast>10^{9.5}$ $M_\odot$) galaxies with $m_i<22$ mag ($m_z<22$ mag) at $z=$ 0.2-0.4 ($z=$ 0.4-0.7) in the rest-frame color range of $0.8<r-K_s<1.3$. We define a major galaxy-galaxy merger as a galaxy pair of comparable angular size and luminosity with tidal tails or bridges, and we identified such major mergers through visual inspection of Subaru-HSC-SSP PDR 2 $i$- and $z$-band images. We classify 92 (123) galaxies as major merger galaxies at $z=$ 0.2-0.4 ($z=$ 0.4-0.7). The resulting major merger fraction is 5%-6% and this fraction does not change with galaxy color along the massive galaxy quenching channel. The result is not consistent with the expectation based on the CWD scenario as the dominant mechanism of massive galaxy quenching. However, there are some caveats such as (i) the mergers that cause quenching may lose their visible merger signatures rapidly before they enter the Green Valley, (ii) our method may not trace the cosmic web sufficiently well, and (iii) because of our mass limit, most of the galaxies in our sample may have already experienced CWD events at higher redshifts than those studied here. Further studies with deeper data are desirable in the future.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 13 pages, 12 figures

arXiv:2407.00108 [pdf, other]

A Case Study on Contextual Machine Translation in a Professional Scenario of Subtitling

Authors: Sebastian Vincent, Charlotte Prescott, Chris Bayliss, Chris Oakley, Carolina Scarton

Abstract: Incorporating extra-textual context such as film metadata into the machine translation (MT) pipeline can enhance translation quality, as indicated by automatic evaluation in recent work. However, the positive impact of such systems in industry remains unproven. We report on an industrial case study carried out to investigate the benefit of MT in a professional scenario of translating TV subtitles with a focus on how leveraging extra-textual context impacts post-editing. We found that post-editors marked significantly fewer context-related errors when correcting the outputs of MTCue, the context-aware model, as opposed to non-contextual models. We also present the results of a survey of the employed post-editors, which highlights contextual inadequacy as a significant gap consistently observed in MT. Our findings strengthen the motivation for further work within fully contextual MT.

Submitted 27 June, 2024; originally announced July 2024.

Comments: Accepted to EAMT 2024

arXiv:2406.19975 [pdf, other]

On the Response Entropy of APUFs

Authors: Vincent Dumoulin, Wen**g Rao, Natasha Devroye

Abstract: A Physically Unclonable Function (PUF) is a hardware security primitive used for authentication and key generation. It takes an input bit-vector challenge and produces a single-bit response, resulting in a challenge-response pair (CRP). The truth table of all challenge-response pairs of each manufactured PUF should look different due to inherent manufacturing randomness, forming a digital fingerprint. A PUF's entropy (the entropy of all the responses, taken over the manufacturing randomness and uniformly selected challenges) has been studied before and is a challenging problem. Here we explore a related notion -- the response entropy, which is the entropy of an arbitrary response given knowledge of one (and two) other responses. This allows us to explore how knowledge of some CRP(s) impacts the ability to guess another response. The Arbiter PUF (APUF) is a well-known PUF architecture based on accumulated delay differences between two paths. In this paper, we obtain in closed form the probability mass function of any arbitrary response given knowledge of one or two other arbitrary CRPs for the APUF architecture. This allows us to obtain the conditional response entropy and then to define and obtain the size of the entropy bins (challenge sets with the same conditional response entropy) given knowledge of one or two CRPs. All of these results depend on the probability that two different challenge vectors yield the same response, termed the response similarity of those challenges. We obtain an explicit closed form expression for this. This probability depends on the statistical correlations induced by the PUF architecture together with the specific known and to-be-guessed challenges. As a by-product, we also obtain the optimal (minimizing probability of error) predictor of an unknown challenge given access to one (or two) challenges and the associated predictability.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19943 [pdf, other]

doi 10.59275/j.melba.2024-151b

Impact of Initialization on Intra-subject Pediatric Brain MR Image Registration: A Comparative Analysis between SyN ANTs and Deep Learning-Based Approaches

Authors: Andjela Dimitrijevic, Vincent Noblet, Benjamin De Leener

Abstract: This study evaluates the performance of conventional SyN ANTs and learning-based registration methods in the context of pediatric neuroimaging, specifically focusing on intrasubject deformable registration. The comparison involves three approaches: without (NR), with rigid (RR), and with rigid and affine (RAR) initializations. In addition to initialization, performances are evaluated in terms of accuracy, speed, and the impact of age intervals and sex per pair. Data consists of the publicly available MRI scans from the Calgary Preschool dataset, which includes 63 children aged 2-7 years, allowing for 431 registration pairs. We implemented the unsupervised DL framework with a U-Net architecture using DeepReg and it was 5-fold cross-validated. Evaluation includes Dice scores for tissue segmentation from 18 smaller regions obtained by SynthSeg, analysis of log Jacobian determinants, and registration pro-rated training and inference times. Learning-based approaches, with or without linear initializations, exhibit slight superiority over SyN ANTs in terms of Dice scores. Indeed, DL-based implementations with RR and RAR initializations significantly outperform SyN ANTs. Both SyN ANTs and DL-based registration involve parameter optimization, but the choice between these methods depends on the scale of registration: network-based for broader coverage or SyN ANTs for specific structures. Both methods face challenges with larger age intervals due to greater growth changes. The main takeaway is that while DL-based methods show promise with faster and more accurate registrations, SyN ANTs remains robust and generalizable without the need for extensive training, highlighting the importance of method selection based on specific registration needs in the pediatric context. Our code is available at https://github.com/neuropoly/pediatric-DL-registration

Submitted 28 June, 2024; originally announced June 2024.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:013

Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)

arXiv:2406.19751 [pdf, other]

Mixing of counterpropagating signals in a traveling-wave Josephson device

Authors: Matthieu Praquin, Vincent Lienhard, Anthony Giraudo, Aron Vanselow, Zaki Leghtas, Philippe Campagne-Ibarcq

Abstract: Light waves do not interact in vacuum, but may mix through various parametric processes when traveling in a nonlinear medium. In particular, a high-amplitude wave can be leveraged to frequency convert a low-amplitude signal, as long as the overall energy and momentum of interacting photons are conserved. These conditions are typically met when all waves propagate in the medium with identical phase velocity along a particular axis. In this work, we investigate an alternative scheme by which an input microwave signal propagating along a 1-dimensional Josephson metamaterial is converted to an output wave propagating in the opposite direction. The interaction is mediated by a pump wave propagating at low phase velocity. In this novel regime, the input signal is exponentially attenuated as it travels down the device. We exploit this process to implement a robust on-chip microwave isolator that can be reconfigured into a reciprocal and tunable coupler. The device mode of operation is selected in situ, along with its working frequency over a wide microwave range. In the 5.5-8.5 GHz range, we measure an isolation over 15 dB on a typical bandwidth of 100 MHz, on par with the best existing on-chip isolators. Substantial margin for improvement exists through design optimization and by reducing fabrication disorder, opening new avenues for microwave routing and processing in superconducting circuits.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19428 [pdf, other]

Systems design, assembly, integration and lab testing of WALOP-South Polarimeter

Authors: Siddharth Maharana, A. N. Ramaprakash, Chaitanya Rajarshi, Pravin Khodade, Bhushan Joshi, Pravin Chordia, Abhay Kohok, Ramya M. Anche, Deepa Modi, John A. Kypriotakis, Amit Deokar, Aditya Kinjawadekar, Stephen B. Potter, Dmitry Blinov, Hans Kristian Eriksen, Myrto Falalaki, Hitesh Gajjar, Tuhin Ghosh, Eirik Gjerløw, Sebastain Kiehlmann, Ioannis Liodakis, Nikolaos Mandarakas, Georgia V. Panopoulou, Vasiliki Pavlidou, Timothy J. Pearson , et al. (6 additional authors not shown)

Abstract: Wide-Area Linear Optical Polarimeter (WALOP)-South is the first wide-field and survey-capacity polarimeter in the optical wavelengths. On schedule for commissioning in 2024, it will be mounted on the 1 m SAAO telescope in Sutherland Observatory, South Africa to undertake the PASIPHAE sky survey. PASIPHAE program will create the first polarimetric sky map in the optical wavelengths, spanning more than 2000 square degrees of the southern Galactic region. The innovative design of WALOP-South will enable it to measure the linear polarization (Stokes parameters $q$ and $u$), in a single exposure, of all sources in a field of view (FoV) of $35\times35$ arcminutes-squared in the SDSS-r broadband and narrowband filters between 500-750 nm with 0.1 % polarization accuracy. The unique goals of the instrument place very stringent systems engineering goals, including on the performance of the optical, polarimetric, optomechanical, and electronic subsystems. All the subsystems have been designed carefully to meet the overall instrument performance goals. As of May 2024, all the instrument optical and mechanical subsystems have been assembled and are currently getting tested and integrated. The complete testing and characterization of the instrument in the lab is expected to be completed by August 2024. In this paper, we will present (a) the design and development of the entire instrument and its major subsystems, focusing on the opto-mechanical design which has not been reported before, and (b) assembly and integration of the instrument in the lab and early results from lab characterization of the instrument.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 20 pages; presented at 2024 SPIE Astronomical Telescopes + Instrumentation conference

arXiv:2406.19320 [pdf, other]

Efficient World Models with Context-Aware Tokenization

Authors: Vincent Micheli, Eloi Alonso, François Fleuret

Abstract: Scaling up deep Reinforcement Learning (RL) methods presents a significant challenge. Following developments in generative modelling, model-based RL positions itself as a strong contender. Recent advances in sequence modelling have led to effective transformer-based world models, albeit at the price of heavy computations due to the long sequences of tokens required to accurately simulate environments. In this work, we propose $Δ$-IRIS, a new agent with a world model architecture composed of a discrete autoencoder that encodes stochastic deltas between time steps and an autoregressive transformer that predicts future deltas by summarizing the current state of the world with continuous tokens. In the Crafter benchmark, $Δ$-IRIS sets a new state of the art at multiple frame budgets, while being an order of magnitude faster to train than previous attention-based approaches. We release our code and models at https://github.com/vmicheli/delta-iris.

Submitted 27 June, 2024; originally announced June 2024.

Comments: ICML 2024

arXiv:2406.18991 [pdf, other]

doi 10.1051/0004-6361/202450587

Resonant sub-Neptunes are puffier

Authors: Adrien Leleu, Jean-Baptiste Delisle, Remo Burn, André Izidoro, Stéphane Udry, Xavier Dumusque, Christophe Lovis, Sarah Millholland, Léna Parc, François Bouchy, Vincent Bourrier, Yann Alibert, João Faria, Christoph Mordasini, Damien Ségransan

Abstract: A systematic, population-level discrepancy exists between the densities of exoplanets whose masses have been measured with transit timing variations (TTVs) versus those measured with radial velocities (RVs). Since the TTV planets are predominantly nearly resonant, it is still unclear whether the discrepancy is attributed to detection biases or to astrophysical differences between the nearly resonant and non resonant planet populations. We defined a controlled, unbiased sample of 36 sub-Neptunes characterised by Kepler, TESS, HARPS, and ESPRESSO. We found that their density depends mostly on the resonant state of the system, with a low probability (of $0.002_{-0.001}^{+0.010}$) that the mass of (nearly) resonant planets is drawn from the same underlying population as the bulk of sub-Neptunes. Increasing the sample to 133 sub-Neptunes reveals finer details: the densities of resonant planets are similar and lower than non-resonant planets, and both the mean and spread in density increase for planets that are away from resonance. This trend is also present in RV-characterised planets alone. In addition, TTVs and RVs have consistent density distributions for a given distance to resonance. We also show that systems closer to resonances tend to be more co-planar than their spread-out counterparts. These observational trends are also found in synthetic populations, where planets that survived in their original resonant configuration retain a lower density; whereas less compact systems have undergone post-disc giant collisions that increased the planet's density, while expanding their orbits. Our findings reinforce the claim that resonant systems are archetypes of planetary systems at their birth.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18671 [pdf, other]

A Zero Auxiliary Knowledge Membership Inference Attack on Aggregate Location Data

Authors: Vincent Guan, Florent Guépin, Ana-Maria Cretu, Yves-Alexandre de Montjoye

Abstract: Location data is frequently collected from populations and shared in aggregate form to guide policy and decision making. However, the prevalence of aggregated data also raises the privacy concern of membership inference attacks (MIAs). MIAs infer whether an individual's data contributed to the aggregate release. Although effective MIAs have been developed for aggregate location data, these require access to an extensive auxiliary dataset of individual traces over the same locations, which are collected from a similar population. This assumption is often impractical given common privacy practices surrounding location data. To measure the risk of an MIA performed by a realistic adversary, we develop the first Zero Auxiliary Knowledge (ZK) MIA on aggregate location data, which eliminates the need for an auxiliary dataset of real individual traces. Instead, we develop a novel synthetic approach, such that suitable synthetic traces are generated from the released aggregate. We also develop methods to correct for bias and noise, to show that our synthetic-based attack is still applicable when privacy mechanisms are applied prior to release. Using two large-scale location datasets, we demonstrate that our ZK MIA matches the state-of-the-art Knock-Knock (KK) MIA across a wide range of settings, including popular implementations of differential privacy (DP) and suppression of small counts. Furthermore, we show that ZK MIA remains highly effective even when the adversary only knows a small fraction (10%) of their target's location history. This demonstrates that effective MIAs can be performed by realistic adversaries, highlighting the need for strong DP protection.

Submitted 26 June, 2024; originally announced June 2024.

Comments: To be published in PETS 2024

arXiv:2406.18665 [pdf, other]

RouteLLM: Learning to Route LLMs with Preference Data

Authors: Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, Ion Stoica

Abstract: Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost. More powerful models, though effective, come with higher expenses, while less capable models are more cost-effective. To address this dilemma, we propose several efficient router models that dynamically select between a stronger and a weaker LLM during inference, aiming to optimize the balance between cost and response quality. We develop a training framework for these routers leveraging human preference data and data augmentation techniques to enhance performance. Our evaluation on widely-recognized benchmarks shows that our approach significantly reduces costs, by over 2 times in certain cases, without compromising the quality of responses. Interestingly, our router models also demonstrate significant transfer learning capabilities, maintaining their performance even when the strong and weak models are changed at test time. This highlights the potential of these routers to provide a cost-effective yet high-performance solution for deploying LLMs.

Submitted 1 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.
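
The routing idea in the abstract above can be pictured as a thresholded decision between two models. The following is a rough sketch under stated assumptions, not the RouteLLM implementation: the scoring function `toy_strong_win_prob`, the threshold, and the per-call costs are hypothetical placeholders invented for illustration.

```python
def toy_strong_win_prob(query):
    """Hypothetical stand-in for a learned router score in [0, 1].

    Assumption: longer queries are treated as harder. A real router would be
    trained, for example on preference data.
    """
    return min(1.0, len(query.split()) / 20.0)


def route(query, threshold, score=toy_strong_win_prob):
    """Send the query to the strong model only when the score clears the threshold."""
    return "strong" if score(query) >= threshold else "weak"


# Hypothetical relative costs per call.
COST = {"strong": 10.0, "weak": 1.0}

queries = [
    "What is 2 + 2?",
    "Summarize the trade-offs between cost and quality when serving several "
    "language models behind one endpoint, and propose an evaluation plan.",
]
choices = [route(q, threshold=0.5) for q in queries]
total_cost = sum(COST[c] for c in choices)
print(choices, total_cost)
```

Raising `threshold` sends fewer queries to the strong model and lowers total cost, at the risk of weaker answers on hard queries; that is the cost and quality trade-off the abstract describes.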

arXiv:2406.18631 [pdf, other]

HATS-38 b and WASP-139 b join a growing group of eccentric hot Neptunes on polar orbits

Authors: Juan I. Espinoza-Retamal, Guðmundur Stefánsson, Cristobal Petrovich, Rafael Brahm, Andrés Jordán, Elyar Sedaghati, Jennifer P. Lucero, Marcelo Tala Pinto, Diego J. Muñoz, Gavin Boyle, Rodrigo Leiva, Vincent Suc

Abstract: We constrain the sky-projected obliquities of two low-density hot Neptune planets HATS-38 b and WASP-139 b orbiting nearby G and K stars using Rossiter-McLaughlin (RM) observations with VLT/ESPRESSO, yielding $λ= -108_{-16}^{+11}$ deg and $-85.6_{-4.2}^{+7.7}$ deg, respectively. To model the RM effect, we use a new publicly available code, \texttt{ironman}, which is capable of jointly fitting transit photometry, Keplerian radial velocities, and RM effects. The two planets have residual eccentricities ($e= 0.112_{-0.070}^{+0.072}$, and $0.103_{-0.041}^{+0.050}$, respectively), and together with the obliquity constraints, we show that they join a growing group of eccentric hot and low-density Neptunes on polar orbits. We use long-term radial velocities to rule out companions with masses $\sim 0.3-50$ $M_J$ within $\sim$10 au. We show that the orbital architectures of the two Neptunes disfavor an origin from primordial disk misalignment and/or in-situ secular interactions with distant companions and instead favor high-eccentricity migration from $\gtrsim 2$ au driven by a distant companion. Finally, we performed a hierarchical Bayesian modeling of the true obliquity distribution of Neptunes and found suggestive evidence for a higher preponderance of polar orbits of hot Neptunes compared to Jupiters. However, we note that the exact distribution is sensitive to the choice of priors, highlighting the need for additional obliquity measurements of Neptunes to robustly compare the hot Neptune obliquity distribution to Jupiters.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 16 pages, 5 figures. Submitted to AAS Journals

Showing 1–50 of 10,265 results for author: Vincent