-
From Specific to Generic Learned Sorted Set Dictionaries: A Theoretically Sound Paradigm Yelding Competitive Data Structural Boosters in Practice
Authors:
Domenico Amato,
Giosué Lo Bosco,
Raffaele Giancarlo
Abstract:
This research concerns Learned Data Structures, a recent area that has emerged at the crossroads of Machine Learning and Classic Data Structures. It is methodologically important and has a high practical impact. We focus on Learned Indexes, i.e., Learned Sorted Set Dictionaries. The proposals available so far are specific in the sense that they can boost, indeed impressively, the time performance of Table Search Procedures with a sorted layout only, e.g., Binary Search. We propose a novel paradigm that, complementing known specialized ones, can produce Learned versions of any Sorted Set Dictionary, for instance, Balanced Binary Search Trees or Binary Search on layouts other than sorted, i.e., Eytzinger. Theoretically, based on it, we obtain several results of interest, such as (a) the first Learned Optimum Binary Search Forest, with mean access time bounded by the Entropy of the probability distribution of the accesses to the Dictionary; (b) the first Learned Sorted Set Dictionary that, in the Dynamic Case and in an amortized analysis setting, matches the same time bounds known for Classic Dictionaries. The latter holds under widely accepted assumptions regarding the size of the Universe. The experimental part, somewhat complex in terms of software development, clearly indicates the nonobvious finding that the generalization we propose can yield effective and competitive Learned Data Structural Boosters, even with respect to specific benchmark models.
Submitted 2 September, 2023;
originally announced September 2023.
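The abstract above mentions Binary Search on the Eytzinger layout as one of the non-sorted layouts the paradigm covers. As a rough illustration only, the following Python sketch shows the classic (non-learned) Eytzinger layout and a search over it. The function names `eytzinger` and `eytzinger_lower_bound` are hypothetical and are not taken from the paper or its software.

```python
def eytzinger(sorted_keys):
    """Place sorted keys in breadth-first (Eytzinger) order.

    The result is 1-indexed: slot 0 is unused, and the children of
    slot k are slots 2k and 2k+1, as in an implicit binary heap.
    """
    n = len(sorted_keys)
    layout = [None] * (n + 1)
    it = iter(sorted_keys)

    def fill(k):
        # In-order traversal of the implicit tree assigns keys in sorted order.
        if k <= n:
            fill(2 * k)
            layout[k] = next(it)
            fill(2 * k + 1)

    fill(1)
    return layout


def eytzinger_lower_bound(layout, q):
    """Return the smallest stored key >= q, or None if there is none."""
    n = len(layout) - 1
    k, best = 1, None
    while k <= n:
        if layout[k] >= q:
            best = layout[k]   # candidate answer; try to find a smaller one
            k = 2 * k          # go left
        else:
            k = 2 * k + 1      # go right
    return best
```

For the keys 1, 3, 5, 7, 9, 11, 13 the layout stores 7 at the root, then 3 and 11, then 1, 5, 9, 13, so each step of the search moves to a predictable child slot rather than jumping across the sorted table.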
-
Automatic and effective discovery of quantum kernels
Authors:
Massimiliano Incudini,
Daniele Lizzio Bosco,
Francesco Martini,
Michele Grossi,
Giuseppe Serra,
Alessandra Di Pierro
Abstract:
Quantum computing can empower machine learning models by enabling kernel machines to leverage quantum kernels for representing similarity measures between data. Quantum kernels are able to capture relationships in the data that are not efficiently computable on classical devices. However, there is no straightforward method to engineer the optimal quantum kernel for each specific use case. While recent literature has focused on exploiting the potential offered by the presence of symmetries in the data to guide the construction of quantum kernels, we adopt here a different approach, which employs optimization techniques, similar to those used in neural architecture search and AutoML, to automatically find an optimal kernel in a heuristic manner. The algorithm we present constructs a quantum circuit implementing the similarity measure as a combinatorial object, which is evaluated based on a cost function and is then iteratively modified using a meta-heuristic optimization technique. The cost function can encode many criteria ensuring favorable statistical properties of the candidate solution, such as the rank of the Dynamical Lie Algebra. Importantly, our approach is independent of the optimization technique employed. The results obtained by testing our approach on a high-energy physics problem demonstrate that, in the best-case scenario, we can either match or improve testing accuracy with respect to the manual design approach, showing the potential of our technique to deliver superior results with reduced effort.
Submitted 20 December, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
On the Suitability of Neural Networks as Building Blocks for The Design of Efficient Learned Indexes
Authors:
Domenico Amato,
Giosuè Lo Bosco,
Raffaele Giancarlo
Abstract:
With the aim of obtaining time/space improvements in classic Data Structures, an emerging trend is to combine Machine Learning techniques with the ones proper of Data Structures. This new area goes under the name of Learned Data Structures. The motivation for its study is a perceived change of paradigm in Computer Architectures that would favour the use of Graphics Processing Units and Tensor Processing Units over conventional Central Processing Units. In turn, that would favour the use of Neural Networks as building blocks of Classic Data Structures. Indeed, Learned Bloom Filters, which are one of the main pillars of Learned Data Structures, make extensive use of Neural Networks to improve the performance of classic Filters. However, no use of Neural Networks is reported in the realm of Learned Indexes, which is another main pillar of that new area. In this contribution, we provide the first, and much needed, comparative experimental analysis regarding the use of Neural Networks as building blocks of Learned Indexes. The results reported here highlight the need for the design of very specialized Neural Networks tailored to Learned Indexes, and they establish a solid ground for those developments. Our findings, methodologically important, are of interest to both Scientists and Engineers working in Neural Networks Design and Implementation, in view also of the importance of the application areas involved, e.g., Computer Networks and Databases.
Submitted 21 February, 2022;
originally announced March 2022.
-
Standard Vs Uniform Binary Search and Their Variants in Learned Static Indexing: The Case of the Searching on Sorted Data Benchmarking Software Platform
Authors:
Domenico Amato,
Giosuè Lo Bosco,
Raffaele Giancarlo
Abstract:
Learned Indexes are a novel approach to searching in a sorted table. A model is used to predict an interval in which to search, and a Binary Search routine is used to finalize the search. They are quite effective. For the final stage, usually, the lower_bound routine of the Standard C++ library is used, although this is more of a natural choice than a requirement. However, recent studies that do not use Machine Learning predictions indicate that other implementations of Binary Search or variants, namely k-ary Search, are better suited to take advantage of the features offered by modern computer architectures. With the use of the Searching on Sorted Data (SOSD) Learned Indexing benchmarking software, we investigate how to choose a Search routine for the final stage of searching in a Learned Index. Our results provide indications that better choices than the lower_bound routine can be made. We also highlight how such a choice may be dependent on the computer architecture that is to be used. Overall, our findings provide new and much-needed guidelines for the selection of the Search routine within the Learned Indexing framework.
Submitted 8 July, 2022; v1 submitted 5 January, 2022;
originally announced January 2022.
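The two-stage scheme described in this abstract (a model predicts an interval, then a Binary Search routine finishes inside it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SOSD implementation: the model is a single least-squares line, the names `build_model` and `learned_lower_bound` are hypothetical, and Python's `bisect.bisect_left` stands in for the C++ `lower_bound` routine.

```python
import bisect


def build_model(keys):
    """Fit position ~ slope * key + intercept on a sorted list of numbers.

    Returns (slope, intercept, eps), where eps bounds the prediction
    error on the stored keys. Assumes keys is non-empty.
    """
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    if var == 0:
        slope = 0.0
    else:
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
        slope = cov / var
    intercept = mean_p - slope * mean_k
    err = max(abs(i - (slope * k + intercept)) for i, k in enumerate(keys))
    return slope, intercept, int(err) + 1


def learned_lower_bound(keys, model, q):
    """Index of the first element >= q, searching only the predicted window."""
    slope, intercept, eps = model
    n = len(keys)
    guess = int(slope * q + intercept)
    lo = max(0, min(n, guess - eps))
    hi = max(lo, min(n, guess + eps + 1))
    # Safety net (an assumption of this sketch): if the window does not
    # bracket the answer, fall back to the full range on that side.
    if lo > 0 and keys[lo - 1] >= q:
        lo = 0
    if hi < n and keys[hi] < q:
        hi = n
    return bisect.bisect_left(keys, q, lo, hi)
```

The final call is the only place where the choice of search routine matters, which is the stage the abstract investigates; swapping `bisect.bisect_left` for another routine over the same `[lo, hi)` window would leave the rest of the sketch unchanged.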
-
Learned Sorted Table Search and Static Indexes in Small Model Space
Authors:
Domenico Amato,
Giosuè Lo Bosco,
Raffaele Giancarlo
Abstract:
Machine Learning Techniques, properly combined with Data Structures, have resulted in Learned Static Indexes, innovative and powerful tools that speed up Binary Search, with the use of additional space with respect to the table being searched. Such space is devoted to the Machine Learning Model. Although in their infancy, they are methodologically and practically important, due to the pervasiveness of Sorted Table Search procedures. In modern applications, model space is a key factor and, in fact, a major open question concerning this area is to assess to what extent one can enjoy the speed-up of Binary Search achieved by Learned Indexes while using constant or nearly constant space models. In this paper, we investigate the mentioned question by (a) introducing two new models, i.e., the Learned k-ary Search Model and the Synoptic Recursive Model Index, respectively; (b) systematically exploring the time-space trade-offs of a hierarchy of existing models, i.e., the ones in the reference software platform Searching on Sorted Data, together with the new ones proposed here. By adhering to and extending the current benchmarking methodology, we experimentally show that the Learned k-ary Search Model can speed up Binary Search in constant additional space. Our second model, together with the bi-criteria Piece-wise Geometric Model index, can achieve a speed-up of Binary Search with a model space of 0.05% more than the one taken by the table, being competitive in terms of time-space trade-off with existing proposals. The Synoptic Recursive Model Index and the bi-criteria Piece-wise Geometric Model complement each other quite well across the various levels of the internal memory hierarchy. Finally, our findings stimulate research in this area, since they highlight the need for further studies regarding the time-space relation in Learned Indexes.
Submitted 17 September, 2022; v1 submitted 19 July, 2021;
originally announced July 2021.
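The abstract does not spell out how the Learned k-ary Search Model works, so the sketch below shows only the classic, non-learned k-ary search that the model's name refers to: each round probes several evenly spaced positions and keeps the one sub-interval that can contain the answer. The function name `kary_lower_bound` and the default `k=4` are assumptions made for illustration.

```python
def kary_lower_bound(keys, q, k=4):
    """Index of the first element >= q in sorted keys, via k-ary search.

    Invariant: every key before lo is < q, every key from hi on is >= q.
    """
    lo, hi = 0, len(keys)
    while lo < hi:
        step = max(1, (hi - lo) // k)
        new_lo, new_hi = lo, hi
        # Probe separators at roughly equal spacing across [lo, hi).
        for p in range(lo + step - 1, hi, step):
            if keys[p] < q:
                new_lo = p + 1      # answer lies to the right of p
            else:
                new_hi = p          # answer is at p or to its left
                break
        lo, hi = new_lo, new_hi
    return lo
```

With `k=2` this behaves much like ordinary Binary Search; larger `k` trades more comparisons per round for fewer rounds, which is the property that makes the variant attractive on architectures that can issue the probes in parallel.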
-
Learning from Data to Speed-up Sorted Table Search Procedures: Methodology and Practical Guidelines
Authors:
Domenico Amato,
Giosué Lo Bosco,
Raffaele Giancarlo
Abstract:
Sorted Table Search Procedures are the quintessential query-answering tool, with widespread usage that now also includes Web Applications, e.g., Search Engines (Google Chrome) and ad Bidding Systems (AppNexus). Speeding them up, at very little cost in space, is still a quite significant achievement. Here we study to what extent Machine Learning Techniques can contribute to obtaining such a speed-up via a systematic experimental comparison of known efficient implementations of Sorted Table Search procedures, with different Data Layouts, and their Learned counterparts developed here. We characterize the scenarios in which the latter can be profitably used with respect to the former, accounting for both CPU and GPU computing. Our approach also contributes to the study of Learned Data Structures, a recent proposal to improve the time/space performance of fundamental Data Structures, e.g., B-trees, Hash Tables, Bloom Filters. Indeed, we also formalize an Algorithmic Paradigm of Learned Dichotomic Sorted Table Search procedures that naturally complements the Learned one proposed here and that characterizes most of the known Sorted Table Search Procedures as having a "learning phase" that approximates Simple Linear Regression.
Submitted 30 July, 2020; v1 submitted 20 July, 2020;
originally announced July 2020.
-
Thermoelectrically cooled THz quantum cascade laser operating up to 210 K
Authors:
Lorenzo Bosco,
Martin Franckié,
Giacomo Scalari,
Mattias Beck,
Andreas Wacker,
Jérôme Faist
Abstract:
We present a terahertz quantum cascade laser operating on a thermoelectric cooler up to a record-high temperature of 210.5 K. The active region design is based on only two quantum wells and achieves high temperature operation thanks to a systematic optimization by means of a nonequilibrium Green's function model. Laser spectra were measured with a room temperature detector, making the whole setup cryogenic free. At low temperatures ($\sim 40$ K), a maximum output power of 200 mW was measured.
Submitted 15 November, 2019;
originally announced November 2019.
-
Effectiveness of Data-Driven Induction of Semantic Spaces and Traditional Classifiers for Sarcasm Detection
Authors:
Mattia Antonino Di Gangi,
Giosué Lo Bosco,
Giovanni Pilato
Abstract:
Irony and sarcasm are two complex linguistic phenomena that are widely used in everyday language and especially on social media, but they represent two serious issues for automated text understanding. Many labeled corpora have been extracted from several sources to accomplish this task, and it seems that sarcasm is conveyed in different ways for different domains. Nonetheless, very little work has been done to compare different methods across the available corpora. Furthermore, each author usually collects and uses their own datasets to evaluate their own method. In this paper, we show that sarcasm detection can be tackled by applying classical machine learning algorithms to input texts sub-symbolically represented in a Latent Semantic space. The main consequence is that our studies establish both reference datasets and baselines for the sarcasm detection problem that could serve the scientific community to test newly proposed methods.
Submitted 6 December, 2019; v1 submitted 2 April, 2019;
originally announced April 2019.
-
An electrically pumped phonon-polariton laser
Authors:
Keita Ohtani,
Bo Meng,
Martin Franckié,
Lorenzo Bosco,
Camille Ndebeka-Bandou,
Mattias Beck,
Jérôme Faist
Abstract:
We report a device that provides coherent emission of phonon polaritons, a mixed state between photons and optical phonons in an ionic crystal. An electrically pumped GaInAs/AlInAs quantum cascade structure provides intersubband gain into the polariton mode at λ = 26.3 μm, allowing self-oscillations close to the longitudinal optical phonon energy of AlAs. Because of the large computed phonon fraction of the polariton of 65%, the emission appears directly on a Raman spectrum measurement exhibiting a Stokes and anti-Stokes component with the expected shift of 48 meV.
Submitted 29 August, 2018;
originally announced August 2018.
-
Two-well quantum cascade laser optimization by non-equilibrium Green's function modelling
Authors:
M. Franckié,
L. Bosco,
M. Beck,
C. Bonzon,
E. Mavrona,
G. Scalari,
A. Wacker,
J. Faist
Abstract:
We present a two-quantum well THz intersubband laser operating up to 192 K. The structure has been optimized with a non-equilibrium Green's function model. The result of this optimization was confirmed experimentally by growing, processing and measuring a number of proposed designs. At high temperature (T>200 K), the simulations indicate that lasing fails due to a combination of electron-electron scattering, thermal backfilling, and, most importantly, re-absorption coming from broadened states.
Submitted 11 April, 2018; v1 submitted 27 September, 2017;
originally announced September 2017.
-
InP/InAsP Nanowire-based Spatially Separate Absorption and Multiplication Avalanche Photodetectors
Authors:
Vishal Jain,
Magnus Heurlin,
Enrique Barrigon,
Lorenzo Bosco,
Ali Nowzari,
Shishir Shroff,
Virginia Boix,
Mohammad Karimi,
Reza J. Jam,
Alexander Berg,
Lars Samuelson,
Magnus T. Borgström,
Federico Capasso,
Håkan Pettersson
Abstract:
Avalanche photodetectors (APDs) are key components in optical communication systems due to their increased photocurrent gain and short response time as compared to conventional photodetectors. A detector design where the multiplication region is implemented in a large bandgap material is desired to avoid detrimental Zener tunneling leakage currents, a concern otherwise in smaller bandgap materials required for absorption at 1.3/1.55 μm. Self-assembled III-V semiconductor nanowires offer key advantages such as enhanced absorption due to optical resonance effects, strain-relaxed heterostructures and compatibility with main-stream silicon technology. Here, we present electrical and optical characteristics of single InP and InP/InAsP nanowire APD structures. Temperature-dependent breakdown characteristics of p+-n-n+ InP nanowire devices were investigated first. A clear trap-induced shift in breakdown voltage was inferred from I-V measurements. An improved contact formation to the p+-InP segment was observed upon annealing, and its effect on breakdown characteristics was investigated. The bandgap in the absorption region was subsequently varied from pure InP to InAsP to realize spatially separate absorption and multiplication APDs in heterostructure nanowires. In contrast to the homojunction APDs, no trap-induced shifts were observed for the heterostructure APDs. A gain of 12 was demonstrated for selective optical excitation of the InAsP segment. Additional electron beam-induced current measurements were carried out to investigate the effect of local excitation along the nanowire on the I-V characteristics. Our results provide important insight for optimization of avalanche photodetector devices based on III-V nanowires.
Submitted 3 June, 2017;
originally announced June 2017.
-
A quantum cascade phonon-polariton laser
Authors:
Keita Ohtani,
Camille Ndebeka-Bandou,
Lorenzo Bosco,
Mattias Beck,
Jérôme Faist
Abstract:
We report a laser that coherently emits phonon-polaritons, quasi-particles arising from the coupling between photons and transverse optical phonons. The gain is provided by an intersubband transition in a quantum cascade structure. The polaritons at $hν$ = 45.4 meV (corresponding to an emission frequency of 10.4 THz) are formed by the transverse AlAs phonon mode of monolayer-thin AlInAs layers coupled to the optical modes of a Fabry-Perot cavity. The frequency location of the laser mode is in good agreement with the computed polaritonic dispersion and makes it possible to quantify the constituent fractions of the emitted polaritons, which reach a maximum of 50% for the phonon fraction. A fraction of the gain (between 2% and 5%) originates directly from the coupling between the intersubband and the phonon polarizations. The device exhibits a very low temperature dependence of its threshold current, as well as the capability to operate in very thin ($λ/20$) optical cavities.
Submitted 4 October, 2016;
originally announced October 2016.
-
On-chip, self-detected THz dual-comb spectrometer
Authors:
Markus Rösch,
Giacomo Scalari,
Gustavo Villares,
Lorenzo Bosco,
Mattias Beck,
Jérôme Faist
Abstract:
We present a directly generated on-chip dual-comb source at THz frequencies. The multi-heterodyne beating signal of two free-running THz quantum cascade laser frequency combs is measured electrically using one of the combs as a detector, fully exploiting the unique characteristics of quantum cascade active regions. Up to 30 modes can be detected corresponding to a spectral bandwidth of 630 GHz, being the available bandwidth of the dual comb configuration. The multi-heterodyne signal is used to investigate the equidistance of the comb modes showing an accuracy of $10^{-12}$ at the carrier frequency of 2.5 THz.
Submitted 1 February, 2016;
originally announced February 2016.
-
Obtaining Traffic Information by Urban Air Quality Inspection
Authors:
P. Ferrante,
D. Lo Bosco,
S. Nicolosi,
G. Scaccianoce,
M. Traverso,
G. Rizzo
Abstract:
The level of air quality in urban centres is affected by the emission of several pollutants, mainly coming from the vehicles flowing in their road networks. This is a well-known phenomenon that influences people's quality of life. Despite the deep concern of researchers and technicians, we are far from a total understanding of this phenomenon. Yet the availability of reliable forecasting models would constitute an important tool for administrators in assessing suitable actions concerning transportation policies, public as well as private. Referring to the situation of the running fleet and the measured pollutant concentrations in the Italian town of Palermo, a data-deduced traffic model is derived here, its truthfulness being justified by a fuzzification of the phenomenon. A first validation of the model is supplied by utilising the emission characteristics and the pollutant concentrations over a two-year period. This work could represent a first attempt at defining a new approach to the problem of pollution in urban contexts, in order to provide administrators with a reliable and easier tool.
Submitted 1 February, 2011;
originally announced February 2011.