-
Learning Structure-Aware Representations of Dependent Types
Authors:
Konstantinos Kogkalidis,
Orestis Melkonian,
Jean-Philippe Bernardy
Abstract:
Agda is a dependently-typed programming language and a proof assistant, pivotal in proof formalization and programming language theory. This paper extends the Agda ecosystem into machine learning territory, and, vice versa, makes Agda-related resources available to machine learning practitioners. We introduce and release a novel dataset of Agda program-proofs that is elaborate and extensive enough…
▽ More
Agda is a dependently-typed programming language and a proof assistant, pivotal in proof formalization and programming language theory. This paper extends the Agda ecosystem into machine learning territory, and, vice versa, makes Agda-related resources available to machine learning practitioners. We introduce and release a novel dataset of Agda program-proofs that is elaborate and extensive enough to support various machine learning applications -- the first of its kind. Leveraging the dataset's ultra-high resolution, detailing proof states at the sub-type level, we propose a novel neural architecture targeted at faithfully representing dependently-typed programs on the basis of structural rather than nominal principles. We instantiate and evaluate our architecture in a premise selection setup, where it achieves strong initial results.
△ Less
Submitted 3 February, 2024;
originally announced February 2024.
-
Algebraic Positional Encodings
Authors:
Konstantinos Kogkalidis,
Jean-Philippe Bernardy,
Vikas Garg
Abstract:
We introduce a novel positional encoding strategy for Transformer-style models, addressing the shortcomings of existing, often ad hoc, approaches. Our framework provides a flexible map** from the algebraic specification of a domain to an interpretation as orthogonal operators. This design preserves the algebraic characteristics of the source domain, ensuring that the model upholds the desired st…
▽ More
We introduce a novel positional encoding strategy for Transformer-style models, addressing the shortcomings of existing, often ad hoc, approaches. Our framework provides a flexible map** from the algebraic specification of a domain to an interpretation as orthogonal operators. This design preserves the algebraic characteristics of the source domain, ensuring that the model upholds the desired structural properties. Our scheme can accommodate various structures, including sequences, grids and trees, as well as their compositions. We conduct a series of experiments to demonstrate the practical applicability of our approach. Results suggest performance on par with or surpassing the current state-of-the-art, without hyperparameter optimizations or ``task search'' of any kind. Code will be made available at \url{github.com/konstantinosKokos/UnitaryPE}.
△ Less
Submitted 26 December, 2023;
originally announced December 2023.
-
Domain-Specific Tensor Languages
Authors:
Jean-Philippe Bernardy,
Patrik Jansson
Abstract:
The tensor notation used in several areas of mathematics is a useful one, but it is not widely available to the functional programming community. In a practical sense, the (embedded) domain-specific languages (DSLs) that are currently in use for tensor algebra are either 1. array-oriented languages that do not enforce or take advantage of tensor properties and algebraic structure or 2. follow the…
▽ More
The tensor notation used in several areas of mathematics is a useful one, but it is not widely available to the functional programming community. In a practical sense, the (embedded) domain-specific languages (DSLs) that are currently in use for tensor algebra are either 1. array-oriented languages that do not enforce or take advantage of tensor properties and algebraic structure or 2. follow the categorical structure of tensors but require the programmer to manipulate tensors in an unwieldy point-free notation. A deeper issue is that for tensor calculus, the dominant pedagogical paradigm assumes an audience which is either comfortable with notational liberties which programmers cannot afford, or focus on the applied mathematics of tensors, largely leaving their linguistic aspects (behaviour of variable binding, syntax and semantics, etc.) for the reader to figure out by themselves. This state of affairs is hardly surprising, because, as we highlight, several properties of standard tensor notation are somewhat exotic from the perspective of lambda calculi. We bridge the gap by defining a DSL, embedded in Haskell, whose syntax closely captures the index notation for tensors in wide use in the literature. The semantics of this EDSL is defined in terms of the algebraic structures which define tensors in their full generality. This way, we believe that our EDSL can be used both as a tool for scientific computing, but also as a vehicle to express and present the theory and applications of tensors.
△ Less
Submitted 5 December, 2023;
originally announced December 2023.
-
AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring
Authors:
Juan Sebastián Cañas,
Maria Paula Toro-Gómez,
Larissa Sayuri Moreira Sugai,
Hernán Darío Benítez Restrepo,
Jorge Rudas,
Breyner Posso Bautista,
Luís Felipe Toledo,
Simone Dena,
Adão Henrique Rosa Domingos,
Franco Leandro de Souza,
Selvino Neckel-Oliveira,
Anderson da Rosa,
Vítor Carvalho-Rocha,
José Vinícius Bernardy,
José Luiz Massao Moreira Sugai,
Carolina Emília dos Santos,
Rogério Pereira Bastos,
Diego Llusia,
Juan Sebastián Ulloa
Abstract:
Global change is predicted to induce shifts in anuran acoustic behavior, which can be studied through passive acoustic monitoring (PAM). Understanding changes in calling behavior requires the identification of anuran species, which is challenging due to the particular characteristics of neotropical soundscapes. In this paper, we introduce a large-scale multi-species dataset of anuran amphibians ca…
▽ More
Global change is predicted to induce shifts in anuran acoustic behavior, which can be studied through passive acoustic monitoring (PAM). Understanding changes in calling behavior requires the identification of anuran species, which is challenging due to the particular characteristics of neotropical soundscapes. In this paper, we introduce a large-scale multi-species dataset of anuran amphibians calls recorded by PAM, that comprises 27 hours of expert annotations for 42 different species from two Brazilian biomes. We provide open access to the dataset, including the raw recordings, experimental setup code, and a benchmark with a baseline model of the fine-grained categorization problem. Additionally, we highlight the challenges of the dataset to encourage machine learning researchers to solve the problem of anuran call identification towards conservation policy. All our experiments and resources can be found on our GitHub repository https://github.com/soundclim/anuraset.
△ Less
Submitted 11 July, 2023;
originally announced July 2023.
-
Assessing the Unitary RNN as an End-to-End Compositional Model of Syntax
Authors:
Jean-Philippe Bernardy,
Shalom Lappin
Abstract:
We show that both an LSTM and a unitary-evolution recurrent neural network (URN) can achieve encouraging accuracy on two types of syntactic patterns: context-free long distance agreement, and mildly context-sensitive cross serial dependencies. This work extends recent experiments on deeply nested context-free long distance dependencies, with similar results. URNs differ from LSTMs in that they avo…
▽ More
We show that both an LSTM and a unitary-evolution recurrent neural network (URN) can achieve encouraging accuracy on two types of syntactic patterns: context-free long distance agreement, and mildly context-sensitive cross serial dependencies. This work extends recent experiments on deeply nested context-free long distance dependencies, with similar results. URNs differ from LSTMs in that they avoid non-linear activation functions, and they apply matrix multiplication to word embeddings encoded as unitary matrices. This permits them to retain all information in the processing of an input string over arbitrary distances. It also causes them to satisfy strict compositionality. URNs constitute a significant advance in the search for explainable models in deep learning applied to NLP.
△ Less
Submitted 11 August, 2022;
originally announced August 2022.
-
UniMorph 4.0: Universal Morphology
Authors:
Khuyagbaatar Batsuren,
Omer Goldman,
Salam Khalifa,
Nizar Habash,
Witold Kieraś,
Gábor Bella,
Brian Leonard,
Garrett Nicolai,
Kyle Gorman,
Yustinus Ghanggo Ate,
Maria Ryskina,
Sabrina J. Mielke,
Elena Budianskaya,
Charbel El-Khaissi,
Tiago Pimentel,
Michael Gasser,
William Lane,
Mohit Raj,
Matt Coler,
Jaime Rafael Montoya Samame,
Delio Siticonatzi Camaiteri,
Benoît Sagot,
Esaú Zumaeta Rojas,
Didier López Francis,
Arturo Oncevay
, et al. (71 additional authors not shown)
Abstract:
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This pa…
▽ More
The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g. missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.
△ Less
Submitted 19 June, 2022; v1 submitted 7 May, 2022;
originally announced May 2022.
-
Modeling VI and VDRL feedback functions: searching normative rules through computational simulation
Authors:
Paulo Sergio Panse Silveira,
Jose de Oliveira Siqueira,
Joao Lucas Bernardy,
Jessica Santiago,
Thiago Cersosimo Meneses,
Bianca Sanches Portela,
Marcelo Frota Benvenuti
Abstract:
In this paper, we present a R script named Beak, built to simulate rates of behavior interacting with schedules of reinforcement. Using Beak, we've simulated data that allows an assessment of different reinforcement feedback functions (RFF). This was made with unparalleled precision, since simulations provide huge samples of data and, more importantly, simulated behavior isn't changed by the reinf…
▽ More
In this paper, we present a R script named Beak, built to simulate rates of behavior interacting with schedules of reinforcement. Using Beak, we've simulated data that allows an assessment of different reinforcement feedback functions (RFF). This was made with unparalleled precision, since simulations provide huge samples of data and, more importantly, simulated behavior isn't changed by the reinforcement it produces. Therefore, we can vary it systematically. We've compared different RFF for RI schedules, using as criteria: meaning, precision, parsimony and generality. Our results indicate that the best feedback function for the RI schedule was published by Baum (1981). We also propose that the model used by Killeen (1975) is a viable feedback function for the RDRL schedule. We argue that Beak paves the way for greater understanding of schedules of reinforcement, addressing still open questions about quantitative features of schedules. Also, they could guide future experiments that use schedules as theoretical and methodological tools.
△ Less
Submitted 7 July, 2022; v1 submitted 27 November, 2021;
originally announced November 2021.
-
Evaluating Linear Functions to Symmetric Monoidal Categories
Authors:
Jean-Philippe Bernardy,
Arnaud Spiwack
Abstract:
A number of domain specific languages, such as circuits or data-science workflows, are best expressed as diagrams of boxes connected by wires. Unfortunately, functional languages have traditionally been ill-equipped to embed this sort of languages. The Arrow abstraction is an approximation, but we argue that it does not capture the right properties.
A faithful abstraction is Symmetric Monoidal C…
▽ More
A number of domain specific languages, such as circuits or data-science workflows, are best expressed as diagrams of boxes connected by wires. Unfortunately, functional languages have traditionally been ill-equipped to embed this sort of languages. The Arrow abstraction is an approximation, but we argue that it does not capture the right properties.
A faithful abstraction is Symmetric Monoidal Categories (SMCs), but,so far,it hasn't been convenient to use. We show how the advent of linear ty** in Haskell lets us bridge this gap. We provide a library which lets us program in SMCs with linear functions instead of SMC combinators. This considerably lowers the syntactic overhead of the EDSL to be on par with that of monadic DSLs. A remarkable feature of our library is that, contrary to previously known methods for categories, it does not use any metaprogramming.
△ Less
Submitted 22 July, 2021; v1 submitted 10 March, 2021;
originally announced March 2021.
-
Linearly Qualified Types: Generic inference for capabilities and uniqueness
Authors:
Arnaud Spiwack,
Csongor Kiss,
Jean-Philippe Bernardy,
Nicolas Wu,
Richard Eisenberg
Abstract:
A linear parameter must be consumed exactly once in the body of its function. When declaring resources such as file handles and manually managed memory as linear arguments, a linear type system can verify that these resources are used safely. However, writing code with explicit linear arguments requires bureaucracy. This paper presents linear constraints, a front-end feature for linear ty** that…
▽ More
A linear parameter must be consumed exactly once in the body of its function. When declaring resources such as file handles and manually managed memory as linear arguments, a linear type system can verify that these resources are used safely. However, writing code with explicit linear arguments requires bureaucracy. This paper presents linear constraints, a front-end feature for linear ty** that decreases the bureaucracy of working with linear types. Linear constraints are implicit linear arguments that are filled in automatically by the compiler. We present linear constraints as a qualified type system,together with an inference algorithm which extends GHC's existing constraint solver algorithm. Soundness of linear constraints is ensured by the fact that they desugar into Linear Haskell.
△ Less
Submitted 22 July, 2022; v1 submitted 10 March, 2021;
originally announced March 2021.
-
FraCaS: Temporal Analysis
Authors:
Jean-Philippe Bernardy,
Stergios Chatzikyriakidis
Abstract:
In this paper, we propose an implementation of temporal semantics which is suitable for inference problems. This implementation translates syntax trees to logical formulas, suitable for consumption by the Coq proof assistant. We support several phenomena including: temporal references, temporal adverbs, aspectual classes and progressives. We apply these semantics to the complete FraCaS testsuite.…
▽ More
In this paper, we propose an implementation of temporal semantics which is suitable for inference problems. This implementation translates syntax trees to logical formulas, suitable for consumption by the Coq proof assistant. We support several phenomena including: temporal references, temporal adverbs, aspectual classes and progressives. We apply these semantics to the complete FraCaS testsuite. We obtain an accuracy of 81 percent overall and 73 percent for problems explicitly marked as related to temporal reference.
△ Less
Submitted 19 December, 2020;
originally announced December 2020.
-
Dynamic IFC Theorems for Free!
Authors:
Maximilian Algehed,
Jean-Philippe Bernardy,
Catalin Hritcu
Abstract:
We show that noninterference and transparency, the key soundness theorems for dynamic IFC libraries, can be obtained "for free", as direct consequences of the more general parametricity theorem of type abstraction. This allows us to give very short soundness proofs for dynamic IFC libraries such as faceted values and LIO. Our proofs stay short even when fully mechanized for Agda implementations of…
▽ More
We show that noninterference and transparency, the key soundness theorems for dynamic IFC libraries, can be obtained "for free", as direct consequences of the more general parametricity theorem of type abstraction. This allows us to give very short soundness proofs for dynamic IFC libraries such as faceted values and LIO. Our proofs stay short even when fully mechanized for Agda implementations of the libraries in terms of type abstraction.
△ Less
Submitted 22 February, 2024; v1 submitted 10 May, 2020;
originally announced May 2020.
-
A corpus of precise natural textual entailment problems
Authors:
Jean-Philippe Bernardy,
Stergios Chatzikyriakidis
Abstract:
In this paper, we present a new corpus of entailment problems. This corpus combines the following characteristics: 1. it is precise (does not leave out implicit hypotheses) 2. it is based on "real-world" texts (i.e. most of the premises were written for purposes other than testing textual entailment). 3. its size is 150. The corpus was constructed by taking problems from the Real Text Entailment a…
▽ More
In this paper, we present a new corpus of entailment problems. This corpus combines the following characteristics: 1. it is precise (does not leave out implicit hypotheses) 2. it is based on "real-world" texts (i.e. most of the premises were written for purposes other than testing textual entailment). 3. its size is 150. The corpus was constructed by taking problems from the Real Text Entailment and discovering missing hypotheses using a crowd of experts. We believe that this corpus constitutes a first step towards wide-coverage testing of precise natural-language inference systems.
△ Less
Submitted 14 December, 2018;
originally announced December 2018.
-
Linear Haskell: practical linearity in a higher-order polymorphic language
Authors:
Jean-Philippe Bernardy,
Mathieu Boespflug,
Ryan R. Newton,
Simon Peyton Jones,
Arnaud Spiwack
Abstract:
Linear type systems have a long and storied history, but not a clear path forward to integrate with existing languages such as OCaml or Haskell. In this paper, we study a linear type system designed with two crucial properties in mind: backwards-compatibility and code reuse across linear and non-linear users of a library. Only then can the benefits of linear types permeate conventional functional…
▽ More
Linear type systems have a long and storied history, but not a clear path forward to integrate with existing languages such as OCaml or Haskell. In this paper, we study a linear type system designed with two crucial properties in mind: backwards-compatibility and code reuse across linear and non-linear users of a library. Only then can the benefits of linear types permeate conventional functional programming. Rather than bifurcate types into linear and non-linear counterparts, we instead attach linearity to function arrows. Linear functions can receive inputs from linearly-bound values, but can also operate over unrestricted, regular values.
To demonstrate the efficacy of our linear type system - both how easy it can be integrated in an existing language implementation and how streamlined it makes it to write programs with linear types - we implemented our type system in GHC, the leading Haskell compiler, and demonstrate two kinds of applications of linear types: mutable data with pure interfaces; and enforcing protocols in I/O-performing functions.
△ Less
Submitted 8 November, 2017; v1 submitted 26 October, 2017;
originally announced October 2017.
-
Modelling prosodic structure using Artificial Neural Networks
Authors:
Jean-Philippe Bernardy,
Charalambos Themistocleous
Abstract:
The ability to accurately perceive whether a speaker is asking a question or is making a statement is crucial for any successful interaction. However, learning and classifying tonal patterns has been a challenging task for automatic speech recognition and for models of tonal representation, as tonal contours are characterized by significant variation. This paper provides a classification model of…
▽ More
The ability to accurately perceive whether a speaker is asking a question or is making a statement is crucial for any successful interaction. However, learning and classifying tonal patterns has been a challenging task for automatic speech recognition and for models of tonal representation, as tonal contours are characterized by significant variation. This paper provides a classification model of Cypriot Greek questions and statements. We evaluate two state-of-the-art network architectures: a Long Short-Term Memory (LSTM) network and a convolutional network (ConvNet). The ConvNet outperforms the LSTM in the classification task and exhibited an excellent performance with 95% classification accuracy.
△ Less
Submitted 15 June, 2017; v1 submitted 13 June, 2017;
originally announced June 2017.
-
Certified Context-Free Parsing: A formalisation of Valiant's Algorithm in Agda
Authors:
Jean-Philippe Bernardy,
Patrik Jansson
Abstract:
Valiant (1975) has developed an algorithm for recognition of context free languages. As of today, it remains the algorithm with the best asymptotic complexity for this purpose. In this paper, we present an algebraic specification, implementation, and proof of correctness of a generalisation of Valiant's algorithm. The generalisation can be used for recognition, parsing or generic calculation of th…
▽ More
Valiant (1975) has developed an algorithm for recognition of context free languages. As of today, it remains the algorithm with the best asymptotic complexity for this purpose. In this paper, we present an algebraic specification, implementation, and proof of correctness of a generalisation of Valiant's algorithm. The generalisation can be used for recognition, parsing or generic calculation of the transitive closure of upper triangular matrices. The proof is certified by the Agda proof assistant. The certification is representative of state-of-the-art methods for specification and proofs in proof assistants based on type-theory. As such, this paper can be read as a tutorial for the Agda system.
△ Less
Submitted 12 June, 2016; v1 submitted 28 January, 2016;
originally announced January 2016.