-
Tackling the Abstraction and Reasoning Corpus (ARC) with Object-centric Models and the MDL Principle
Authors:
Sébastien Ferré
Abstract:
The Abstraction and Reasoning Corpus (ARC) is a challenging benchmark, introduced to foster AI research towards human-level intelligence. It is a collection of unique tasks about generating colored grids, specified by a few examples only. In contrast to the transformation-based programs of existing work, we introduce object-centric models that are in line with the natural programs produced by humans. Our models can not only perform predictions, but also provide joint descriptions for input/output pairs. The Minimum Description Length (MDL) principle is used to efficiently search the large model space. A diverse range of tasks are solved, and the learned models are similar to the natural programs. We demonstrate the generality of our approach by applying it to a different domain.
Submitted 1 November, 2023;
originally announced November 2023.
-
KG-MDL: Mining Graph Patterns in Knowledge Graphs with the MDL Principle
Authors:
Francesco Bariatti,
Peggy Cellier,
Sébastien Ferré
Abstract:
Nowadays, increasingly more data are available as knowledge graphs (KGs). While this data model supports advanced reasoning and querying, KGs remain difficult to mine due to their size and complexity. Graph mining approaches can be used to extract patterns from KGs. However, this presents two main issues. First, graph mining approaches tend to extract too many patterns for a human analyst to interpret (pattern explosion). Second, real-life KGs tend to differ from the graphs usually treated in graph mining: they are multigraphs, their vertex degrees tend to follow a power-law, and the way in which they model knowledge can produce spurious patterns. Recently, a graph mining approach named GraphMDL+ has been proposed to tackle the problem of pattern explosion, using the Minimum Description Length (MDL) principle. However, GraphMDL+, like other graph mining approaches, is not suited for KGs without adaptations. In this paper, we propose KG-MDL, a graph pattern mining approach based on the MDL principle that, given a KG, generates a human-sized and descriptive set of graph patterns, and does so in a parameter-less and anytime way. We report on experiments on medium-sized KGs showing that our approach generates sets of patterns that are both small enough to be interpreted by humans and descriptive of the KG. We show that the extracted patterns highlight relevant characteristics of the data: both of the schema used to create the data, and of the concrete facts it contains. We also discuss the issues related to mining graph patterns on knowledge graphs, as opposed to other types of graph data.
Submitted 22 September, 2023;
originally announced September 2023.
-
Home-to-school pedestrian mobility GPS data from a citizen science experiment in the Barcelona area
Authors:
Ferran Larroya,
Ofelia Díaz,
Oleguer Segarra,
Pol Colomer Simón,
Salva Ferré,
Esteban Moro,
Josep Perelló
Abstract:
The analysis of pedestrian GPS datasets is fundamental to advancing the study and design of walkable cities. The highest-resolution GPS data can characterize micro-mobility patterns and pedestrians' micro-motives in relation to a small-scale urban context. Purpose-based recurrent mobility data inside people's neighborhoods is an important source in these sorts of studies. However, micro-mobility data around people's homes is generally unavailable, and if data exists, it is often not shareable because of privacy issues. Citizen science and its public involvement practices in scientific research are valid options to circumvent these challenges and provide meaningful datasets for walkable cities. The study presents GPS records from single-day home-to-school pedestrian mobility of 10 schools in the Barcelona Metropolitan area (Spain). The research provides pedestrian mobility from an age-homogeneous group of people. The study shares processed records with specific filtering, cleaning, and interpolation procedures that can facilitate and accelerate data usage. Citizen science practices during the whole research process are reported to offer a complete perspective of the data collected.
Submitted 23 May, 2023; v1 submitted 15 February, 2023;
originally announced February 2023.
-
First Steps of an Approach to the ARC Challenge based on Descriptive Grid Models and the Minimum Description Length Principle
Authors:
Sébastien Ferré
Abstract:
The Abstraction and Reasoning Corpus (ARC) was recently introduced by François Chollet as a tool to measure broad intelligence in both humans and machines. It is very challenging, and the best approach in a Kaggle competition could only solve 20% of the tasks, relying on brute-force search for chains of hand-crafted transformations. In this paper, we present the first steps exploring an approach based on descriptive grid models and the Minimum Description Length (MDL) principle. The grid models describe the contents of a grid, and support both parsing grids and generating grids. The MDL principle is used to guide the search for good models, i.e. models that compress the grids the most. We report on our progress over a year, improving on the general approach and the models. Out of the 400 training tasks, our performance increased from 5 to 29 solved tasks, using only 30 s of computation time per task. Our approach not only predicts the output grids, but also outputs an intelligible model and explanations for how the model was incrementally built.
Submitted 1 December, 2021;
originally announced December 2021.
-
Infrared narrow band gap nanocrystals: recent progresses relative to imaging and active detection
Authors:
Charlie Greboval,
Simon Ferre,
Vincent Noguier,
Audrey Chu,
Junling Qu,
Sang-Soo Chee,
Gregory Vincent,
Emmanuel Lhuillier
Abstract:
Current technologies for infrared detection have been based on epitaxially grown semiconductors. Here we review some of the recent developments relative to colloidal nanocrystals and their use as building blocks for the design of low-cost infrared sensors. We focus on HgTe nanocrystals, which appear as the only material leading to infrared photoconductivity and ultra-broad spectral tunability: from the visible to the long-wave infrared. We review some of the important results which demonstrated that colloidal nanocrystals can be compatible with air-stable operation, fast detection, and strong absorption. We discuss the recent progress relative to multipixel devices and show results obtained by coupling short-wave infrared nanoparticles with CMOS circuits to achieve video-rate VGA-format imaging. In particular, we show that nanocrystals are a promising material for long-range (>150 m) active detection in both continuous-wave and pulsed mode, with a time resolution down to 10 ns.
Submitted 30 January, 2020;
originally announced January 2020.