-
QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation
Authors:
Gonçalo R. A. Faria,
Sweta Agrawal,
António Farinhas,
Ricardo Rei,
José G. C. de Souza,
André F. T. Martins
Abstract:
An important challenge in machine translation (MT) is to generate high-quality and diverse translations. Prior work has shown that the estimated likelihood from the MT model correlates poorly with translation quality. In contrast, quality evaluation metrics (such as COMET or BLEURT) exhibit high correlations with human judgments, which has motivated their use as rerankers (such as quality-aware and minimum Bayes risk decoding). However, relying on a single translation with high estimated quality increases the chances of "gaming the metric". In this paper, we address the problem of sampling a set of high-quality and diverse translations. We provide a simple and effective way to avoid over-reliance on noisy quality estimates by using them as the energy function of a Gibbs distribution. Instead of looking for a mode in the distribution, we generate multiple samples from high-density areas through the Metropolis-Hastings algorithm, a simple Markov chain Monte Carlo approach. The results show that our proposed method leads to high-quality and diverse outputs across multiple language pairs (English$\leftrightarrow${German, Russian}) with two strong decoder-only LLMs (Alma-7b, Tower-7b).
Submitted 28 May, 2024;
originally announced June 2024.
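The sampling idea in the QUEST abstract can be illustrated with a toy sketch. This is not the authors' implementation: the candidate set, the `quality` scores, the `temperature` value, and the uniform proposal are all hypothetical stand-ins chosen to keep the example small. It only shows how a quality score can act as the (negative) energy of a Gibbs distribution that Metropolis-Hastings then samples from.

```python
import math
import random

def metropolis_hastings(candidates, quality, temperature=0.1, steps=5000, seed=0):
    """Sample from p(y) proportional to exp(quality[y] / temperature).

    Equivalent to a Gibbs distribution with energy E(y) = -quality[y].
    The proposal is uniform over candidates (an assumption), hence symmetric,
    so the Hastings correction cancels in the acceptance ratio.
    """
    rng = random.Random(seed)
    current = rng.choice(candidates)
    samples = []
    for _ in range(steps):
        proposal = rng.choice(candidates)
        # Log acceptance ratio for a symmetric proposal.
        log_alpha = (quality[proposal] - quality[current]) / temperature
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            current = proposal
        samples.append(current)
    return samples

# Hypothetical quality scores in [0, 1] for four candidate translations.
quality = {"t1": 0.90, "t2": 0.85, "t3": 0.40, "t4": 0.10}
samples = metropolis_hastings(list(quality), quality)
counts = {y: samples.count(y) for y in quality}
print(counts)
```

With these illustrative scores the chain spends most of its time on the two high-scoring candidates instead of collapsing onto a single mode, which is the behaviour the abstract describes as sampling from high-density areas.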
-
Structural investigation of the quasi-one-dimensional topological insulator Bi$_4$I$_4$
Authors:
C. David Hinostroza,
Leandro Rodrigues de Faria,
Gustavo H. Cassemiro,
J. Larrea Jiménez,
Antonio Jefferson da Silva Machado,
Walber H. Brito,
Valentina Martelli
Abstract:
The bismuth-halide Bi$_4$I$_4$ undergoes a structural transition around $T_P\sim 300$K, which separates a high-temperature $β$ phase ($T>T_P$) from a low-temperature $α$ phase ($T<T_P$). $α$ and $β$ phases are suggested to host electronic band structures with distinct topological classifications. Rapid quenching was reported to stabilize a metastable $β$-Bi$_4$I$_4$ at $T<T_P$, making possible a comparative study of the physical properties of the two phases in the same low-temperature range. In this work, we present a structural investigation of the Bi$_4$I$_4$ before and after quenching together with electrical resistivity measurements. We found that rapid cooling does not consistently lead to a metastable $β$-Bi$_4$I$_4$, and a quick transition to $α$-Bi$_4$I$_4$ is observed. As a result, the comparison of putative signatures of different topologies attributed to a specific structural phase should be carefully considered. The observed phase instability is accompanied by an increase in iodine vacancies and by a change in the temperature dependence of electrical resistivity, pointing to native defects as a possible origin of our finding. Density functional theory (DFT) calculations support the scenario that iodine vacancies, together with bismuth antisites and interstitials, are among the defects that are more likely to occur in Bi$_4$I$_4$ during the growth.
Submitted 24 April, 2024;
originally announced April 2024.
-
Automatic Information Extraction From Employment Tribunal Judgements Using Large Language Models
Authors:
Joana Ribeiro de Faria,
Huiyuan Xie,
Felix Steffek
Abstract:
Court transcripts and judgments are rich repositories of legal knowledge, detailing the intricacies of cases and the rationale behind judicial decisions. The extraction of key information from these documents provides a concise overview of a case, crucial for both legal experts and the public. With the advent of large language models (LLMs), automatic information extraction has become increasingly feasible and efficient. This paper presents a comprehensive study on the application of GPT-4, a large language model, for automatic information extraction from UK Employment Tribunal (UKET) cases. We meticulously evaluated GPT-4's performance in extracting critical information with a manual verification process to ensure the accuracy and relevance of the extracted data. Our research is structured around two primary extraction tasks: the first involves a general extraction of eight key aspects that hold significance for both legal specialists and the general public, including the facts of the case, the claims made, references to legal statutes, references to precedents, general case outcomes and corresponding labels, detailed order and remedies and reasons for the decision. The second task is more focused, aimed at analysing three of those extracted features, namely facts, claims and outcomes, in order to facilitate the development of a tool capable of predicting the outcome of employment law disputes. Through our analysis, we demonstrate that LLMs like GPT-4 can obtain high accuracy in legal information extraction, highlighting the potential of LLMs in revolutionising the way legal information is processed and utilised, offering significant implications for legal research and practice.
Submitted 19 March, 2024;
originally announced March 2024.
-
Causality from Bottom to Top: A Survey
Authors:
Abraham Itzhak Weinberg,
Cristiano Premebida,
Diego Resende Faria
Abstract:
Causality has become a fundamental approach for explaining the relationships between events, phenomena, and outcomes in various fields of study. It has invaded various fields and applications, such as medicine, healthcare, economics, finance, fraud detection, cybersecurity, education, public policy, recommender systems, anomaly detection, robotics, control, sociology, marketing, and advertising. In this paper, we survey its development over the past five decades, shedding light on the differences between causality and other approaches, as well as the preconditions for using it. Furthermore, the paper illustrates how causality interacts with new approaches such as Artificial Intelligence (AI), Generative AI (GAI), Machine and Deep Learning, Reinforcement Learning (RL), and Fuzzy Logic. We study the impact of causality on various fields, its contribution, and its interaction with state-of-the-art approaches. Additionally, the paper exemplifies the trustworthiness and explainability of causality models. We offer several ways to evaluate causality models and discuss future directions.
Submitted 17 March, 2024;
originally announced March 2024.
-
Government Investments and Entrepreneurship
Authors:
Joao Ricardo Faria,
Laudo Ogura,
Mauricio Prado,
Christopher J. Boudreaux
Abstract:
How can governments attract entrepreneurs and their businesses? The view that new business creation grows with the optimal level of government investments remains appealing to policymakers. In contrast with this active approach, we build a model where governments may adopt a passive approach to stimulating business creation. The insights from this model suggest new business creation depends positively on factors beyond government investments--attracting high-skilled migrants to the region and lower property prices, taxes, and fines on firms in the informal sector. These findings suggest whether entrepreneurs generate business creation in the region does not only depend on government investments. It also depends on location and skilled migration. Our model also provides methodological implications--the relationship between government investments and new business creation is endogenously determined, so unless adjustments are made, econometric estimates will be biased and inconsistent. We conclude with policy and managerial implications.
Submitted 13 September, 2023;
originally announced September 2023.
-
Hematoxylin and eosin stained oral squamous cell carcinoma histological images dataset
Authors:
Dalí F. D. dos Santos,
Paulo R. de Faria,
Adriano M. Loyola,
Sérgio V. Cardoso,
Bruno A. N. Travençolo,
Marcelo Z. do Nascimento
Abstract:
Computer-aided diagnosis (CAD) can be used as an important tool to aid and enhance pathologists' diagnostic decision-making. Deep learning techniques, such as convolutional neural networks (CNN) and fully convolutional networks (FCN), have been successfully applied in medical and biological research. Unfortunately, histological image segmentation is often constrained by the availability of labeled training data, since labeling histological images for segmentation purposes is a highly skilled, complex, and time-consuming task. This paper presents the hematoxylin and eosin (H&E) stained oral cavity-derived cancer (OCDC) dataset, a labeled dataset containing H&E-stained histological images of oral squamous cell carcinoma (OSCC) cases. The tumor regions in our dataset are labeled manually by a specialist and validated by a pathologist. The OCDC dataset presents 1,020 histological images of size 640x640 pixels containing tumor regions fully annotated for segmentation purposes. All the histological images are digitized at 20x magnification.
Submitted 13 January, 2023;
originally announced March 2023.
-
Superconductivity in Te-deficient ZrTe$_2$
Authors:
L. E. Correa,
P. P. Ferreira,
L. R. de Faria,
V. M. Fim,
M. S. da Luz,
M. S. Torikachvili,
C. Heil,
L. T. F. Eleno,
A. J. S. Machado
Abstract:
We present structural, electrical, and thermoelectric potential measurements on high-quality single crystals of ZrTe$_{1.8}$ grown from isothermal chemical vapor transport. These measurements show that the Te-deficient ZrTe$_{1.8}$, which forms the same structure as the non-superconducting ZrTe$_2$, is superconducting below 3.2\,K. The temperature dependence of the upper critical field (H$_{c2}$) deviates from the behavior expected in conventional single-band superconductors, being best described by an electron-phonon two-gap superconducting model with strong intraband coupling. For the ZrTe$_{1.8}$ single crystals, the Seebeck potential measurements suggest that the charge carriers are predominantly negative, in agreement with the ab initio calculations. Through first-principles calculations within DFT, we show that the slight reduction of Te occupancy in ZrTe$_2$ unexpectedly gives origin to density of states peaks at the Fermi level due to the formation of localized Zr-$d$ bands, possibly promoting electronic instabilities at the Fermi level and an increase at the critical temperature according to the standard BCS theory. These findings highlight that the Te deficiency promotes the electronic conditions for the stability of the superconducting ground state, suggesting that defects can fine-tune the electronic structure to support superconductivity.
Submitted 10 April, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
Growth of Pure and Intercalated ZrTe2, TiTe2 and HfTe2 Dichalcogenide Single Crystals by Isothermal Chemical Vapor Transport
Authors:
Lucas E. Correa,
Leandro R. de Faria,
Rennan S. Cardoso,
Nabil Chaia,
Mario S. da Luz,
Milton S. Torikachvili,
Antonio J. S. Machado
Abstract:
We report on a modified chemical vapor transport (CVT) methodology for the growth of pure and intercalated Zr, Ti, and Hf dichalcogenide single crystals, e.g. ZrTe2, Gd0.05ZrTe2, HfTe2, and Cu0.05TiTe2. While the most common method for CVT growth is carried out in quartz tubes subjected to a temperature gradient between the charge and the growth location, the growth using this isothermal-CVT (ICVT) method takes place isothermally in sealed quartz tubes placed horizontally in box furnaces, using iodine (I2) as the transport agent. The structure and composition of crystals were determined by means of X-ray diffraction (XRD), scanning electron microscopy (SEM), and inductively coupled plasma (ICP). The crystals grown with this method can be large, and show excellent crystallinity and homogeneity. Their morphology is plate-like, and the larger dimensions can be as long as 15 mm.
Submitted 23 March, 2022;
originally announced March 2022.
-
Differentiable Causal Discovery Under Latent Interventions
Authors:
Gonçalo R. A. Faria,
André F. T. Martins,
Mário A. T. Figueiredo
Abstract:
Recent work has shown promising results in causal discovery by leveraging interventional data with gradient-based methods, even when the intervened variables are unknown. However, previous work assumes that the correspondence between samples and interventions is known, which is often unrealistic. We envision a scenario with an extensive dataset sampled from multiple intervention distributions and one observation distribution, but where we do not know which distribution originated each sample and how the intervention affected the system, i.e., interventions are entirely latent. We propose a method based on neural networks and variational inference that addresses this scenario by framing it as learning a shared causal graph among an infinite mixture (under a Dirichlet process prior) of intervention structural causal models. Experiments with synthetic and real data show that our approach and its semi-supervised variant are able to discover causal relations in this challenging scenario.
Submitted 4 March, 2022;
originally announced March 2022.
-
Reducing Overconfidence Predictions for Autonomous Driving Perception
Authors:
Gledson Melotti,
Cristiano Premebida,
Jordan J. Bird,
Diego R. Faria,
Nuno Gonçalves
Abstract:
In state-of-the-art deep learning for object recognition, SoftMax and Sigmoid functions are most commonly employed as the predictor outputs. Such layers often produce overconfident predictions rather than proper probabilistic scores, which can thus harm the decision-making of 'critical' perception systems applied in autonomous driving and robotics. Given this, the experiments in this work propose a probabilistic approach based on distributions calculated out of the Logit layer scores of pre-trained networks. We demonstrate that Maximum Likelihood (ML) and Maximum a-Posteriori (MAP) functions are more suitable for probabilistic interpretations than SoftMax and Sigmoid-based predictions for object recognition. We explore distinct sensor modalities via RGB images and LiDARs (RV: range-view) data from the KITTI and Lyft Level-5 datasets, where our approach shows promising performance compared to the usual SoftMax and Sigmoid layers, with the benefit of enabling interpretable probabilistic predictions. Another advantage of the approach introduced in this paper is that the ML and MAP functions can be implemented in existing trained networks, that is, the approach benefits from the output of the Logit layer of pre-trained networks. Thus, there is no need to carry out a new training phase since the ML and MAP functions are used in the test/prediction phase.
Submitted 11 May, 2022; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Possible multiband superconductivity in the quaternary carbide YRe2SiC
Authors:
Leandro R. de Faria,
Pedro P. Ferreira,
Lucas E. Correa,
Luiz T. F. Eleno,
Milton S. Torikachvili,
Antonio J. S. Machado
Abstract:
We report for the first time the occurrence of superconductivity in the quaternary silicide carbide YRe2SiC with Tc = 5.9 K. The emergence of superconductivity was confirmed by means of magnetic susceptibility, electrical resistivity, and heat capacity measurements. The presence of a well-developed heat capacity feature at Tc confirms that superconductivity is a bulk phenomenon, while a second feature in the heat capacity near 0.5 Tc, combined with the unusual temperature dependence of the upper critical field Hc2(T), indicates the presence of a multiband superconducting state. Additionally, the linear dependence of the lower critical field Hc1 on temperature resembles the behavior found in compounds with unconventional pairing symmetry. Band structure calculations reveal that YRe2SiC could harbor a non-trivial topological state and that the low-energy states occupy multiple disconnected sheets at the Fermi surface, with different degrees of hybridization, nesting, and screening effects, therefore making unconventional multiband superconductivity plausible.
Submitted 16 May, 2021;
originally announced May 2021.
-
Fruit Quality and Defect Image Classification with Conditional GAN Data Augmentation
Authors:
Jordan J. Bird,
Chloe M. Barnes,
Luis J. Manso,
Anikó Ekárt,
Diego R. Faria
Abstract:
Contemporary Artificial Intelligence technologies allow for the employment of Computer Vision to discern good crops from bad, providing a step in the pipeline of selecting healthy fruit from undesirable fruit, such as those which are mouldy or gangrenous. State-of-the-art works in the field report high accuracy results on small datasets (<1000 images), which are not representative of the population regarding real-world usage. The goals of this study are to further enable real-world usage by improving generalisation with data augmentation as well as to reduce overfitting and energy usage through model pruning. In this work, we suggest a machine learning pipeline that combines the ideas of fine-tuning, transfer learning, and generative model-based training data augmentation towards improving fruit quality image classification. A linear network topology search is performed to tune a VGG16 lemon quality classification model using a publicly-available dataset of 2690 images. We find that appending a 4096 neuron fully connected layer to the convolutional layers leads to an image classification accuracy of 83.77%. We then train a Conditional Generative Adversarial Network on the training data for 2000 epochs, and it learns to generate relatively realistic images. Grad-CAM analysis of the model trained on real photographs shows that the synthetic images can exhibit classifiable characteristics such as shape, mould, and gangrene. A higher image classification accuracy of 88.75% is then attained by augmenting the training with synthetic images, arguing that Conditional Generative Adversarial Networks have the ability to produce new data to alleviate issues of data scarcity. Finally, model pruning is performed via polynomial decay, where we find that the Conditional GAN-augmented classification network can retain 81.16% classification accuracy when compressed to 50% of its original size.
Submitted 12 April, 2021;
originally announced April 2021.
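The abstract above mentions pruning via polynomial decay without giving the schedule. A common form of such a schedule ramps the target sparsity from an initial to a final value over a range of training steps; the sketch below assumes that form. The function name and all parameter values (including the 0.5 final sparsity and the cubic exponent) are illustrative assumptions, not settings reported by the authors.

```python
def polynomial_decay_sparsity(step, initial=0.0, final=0.5,
                              begin_step=0, end_step=1000, power=3):
    """Target fraction of pruned weights at a given training step.

    Sparsity moves from `initial` to `final` between `begin_step` and
    `end_step`, changing quickly at first and flattening near the end.
    """
    if step <= begin_step:
        return initial
    if step >= end_step:
        return final
    progress = (step - begin_step) / (end_step - begin_step)
    return final + (initial - final) * (1.0 - progress) ** power

# Print the schedule at a few hypothetical steps.
for step in (0, 250, 500, 1000):
    print(step, round(polynomial_decay_sparsity(step), 4))
```

The design point is that most weights are removed early, while the network can still recover through further training, and the schedule then settles gently at the final sparsity.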
-
A Graph Neural Network to Model Disruption in Human-Aware Robot Navigation
Authors:
Pilar Bachiller,
Daniel Rodriguez-Criado,
Ronit R. Jorvekar,
Pablo Bustos,
Diego R. Faria,
Luis J. Manso
Abstract:
Autonomous navigation is a key skill for assistive and service robots. To be successful, robots have to minimise the disruption caused to humans while moving. This implies predicting how people will move and complying with social conventions. Avoiding disrupting personal spaces, people's paths and interactions are examples of these social conventions. This paper leverages Graph Neural Networks to model robot disruption considering the movement of the humans and the robot so that the model built can be used by path planning algorithms. Along with the model, this paper presents an evolution of the dataset SocNav1 [25] which considers the movement of the robot and the humans, and an updated scenario-to-graph transformation which is tested using different Graph Neural Network blocks. The model trained achieves close-to-human performance in the dataset. In addition to its accuracy, the main advantage of the approach is its scalability in terms of the number of social factors that can be considered in comparison with handcrafted models. The dataset and the model are available in a public repository (https://github.com/gnns4hri/sngnnv2).
Submitted 25 May, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Evidence for multiband superconductivity and charge density waves in Ni-doped ZrTe$_2$
Authors:
Lucas E. Correa,
Pedro P. Ferreira,
Leandro R. de Faria,
Thiago T. Dorini,
Mário S. da Luz,
Zachary Fisk,
Milton S. Torikachvili,
Luiz T. F. Eleno,
Antonio J. S. Machado
Abstract:
We carried out a comprehensive study of the electronic, magnetic, and thermodynamic properties of Ni-doped ZrTe$_2$. High quality Ni$_{0.04}$ZrTe$_{1.89}$ single crystals show a possible coexistence of charge density waves (CDW, T$_{CDW}\approx287$\,K) with superconductivity (T$_c\approx 4.1$\,K), which we report here for the first time. The temperature dependence of the lower (H$_{c_1}$) and upper (H$_{c_2}$) critical magnetic fields both deviate significantly from the behaviors expected in conventional single-gap s-wave superconductors. However, the behaviors of the normalized superfluid density $ρ_s(T)$ and H$_{c_2}(T)$ can be described well using a two-gap model for the Fermi surface, in a manner consistent with conventional multiband superconductivity. Electrical resistivity and specific heat measurements show clear anomalies centered near 287\,K consistent with a CDW phase transition. Additionally, electronic-structure calculations support the coexistence of electron-phonon multiband superconductivity and CDW order due to the compensated disconnected nature of the electron- and hole-pockets at the Fermi surface. Our electronic structure calculations also suggest that ZrTe$_2$ could reach a non-trivial topological type-II Dirac semimetallic state. These findings highlight that Ni-doped ZrTe$_2$ can be uniquely important for probing the coexistence of superconducting and CDW ground states in an electronic system with non-trivial topology.
Submitted 8 March, 2022; v1 submitted 9 February, 2021;
originally announced February 2021.
-
Evaluating the Performance of Twitter-based Exploit Detectors
Authors:
Daniel Alves de Sousa,
Elaine Ribeiro de Faria,
Rodrigo Sanches Miani
Abstract:
Patch prioritization is a crucial aspect of information systems security, and knowledge of which vulnerabilities were exploited in the wild is a powerful tool to help systems administrators accomplish this task. The analysis of social media for this specific application can enhance the results and bring more agility by collecting data from online discussions and applying machine learning techniques to detect real-world exploits. In this paper, we use a technique that combines Twitter data with public database information to classify vulnerabilities as exploited or not-exploited. We analyze the behavior of different classifying algorithms, investigate the influence of different antivirus data as ground truth, and experiment with various time window sizes. Our findings suggest that using a Light Gradient Boosting Machine (LightGBM) can benefit the results, and for most cases, the statistics related to a tweet and the users who tweeted are more meaningful than the text tweeted. We also demonstrate the importance of using ground-truth data from security companies not mentioned in previous works.
Submitted 5 November, 2020;
originally announced November 2020.
-
Chatbot Interaction with Artificial Intelligence: Human Data Augmentation with T5 and Language Transformer Ensemble for Text Classification
Authors:
Jordan J. Bird,
Anikó Ekárt,
Diego R. Faria
Abstract:
In this work, we present the Chatbot Interaction with Artificial Intelligence (CI-AI) framework as an approach to the training of deep learning chatbots for task classification. The intelligent system augments human-sourced data via artificial paraphrasing in order to generate a large set of training data for further classical, attention, and language transformation-based learning approaches for Natural Language Processing. Human beings are asked to paraphrase commands and questions for task identification for further execution of a machine. The commands and questions are split into training and validation sets. A total of 483 responses were recorded. Secondly, the training set is paraphrased by the T5 model in order to augment it with further data. Seven state-of-the-art transformer-based text classification algorithms (BERT, DistilBERT, RoBERTa, DistilRoBERTa, XLM, XLM-RoBERTa, and XLNet) are benchmarked for both sets after fine-tuning on the training data for two epochs. We find that all models are improved when training data is augmented by the T5 model, with an average increase of classification accuracy by 4.01%. The best result was the RoBERTa model trained on T5 augmented data which achieved 98.96% classification accuracy. Finally, we found that an ensemble of the five best-performing transformer models via Logistic Regression of output label predictions led to an accuracy of 99.59% on the dataset of human responses. A highly-performing model allows the intelligent system to interpret human commands at the social-interaction level through a chatbot-like interface (e.g. "Robot, can we have a conversation?") and allows for better accessibility to AI by non-technical users.
Submitted 22 October, 2020; v1 submitted 12 October, 2020;
originally announced October 2020.
-
An Online and Nonuniform Timeslicing Method for Network Visualisation
Authors:
Jean R. Ponciano,
Claudio D. G. Linhares,
Elaine R. Faria,
Bruno A. N. Travencolo
Abstract:
Visual analysis of temporal networks comprises an effective way to understand the network dynamics, facilitating the identification of patterns, anomalies, and other network properties, thus resulting in fast decision making. The amount of data in real-world networks, however, may result in a layout with high visual clutter due to edge overlapping. This is particularly relevant in the so-called streaming networks, in which edges are continuously arriving (online) and in non-stationary distribution. All three network dimensions, namely node, edge, and time, can be manipulated to reduce such clutter and improve readability. This paper presents an online and nonuniform timeslicing method, thus considering the underlying network structure and addressing streaming network analyses. We conducted experiments using two real-world networks to compare our method against uniform and nonuniform timeslicing strategies. The results show that our method automatically selects timeslices that effectively reduce visual clutter in periods with bursts of events. As a consequence, decision making based on the identification of global temporal patterns becomes faster and more reliable.
Submitted 23 September, 2020;
originally announced September 2020.
-
Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines
Authors:
Jordan J. Bird,
Diego R. Faria,
Cristiano Premebida,
Anikó Ekárt,
George Vogiatzis
Abstract:
The novelty of this study consists in a multi-modality approach to scene classification, where image and audio complement each other in a process of deep late fusion. The approach is demonstrated on a difficult classification problem, consisting of two synchronised and balanced datasets of 16,000 data objects, encompassing 4.4 hours of video of 8 environments with varying degrees of similarity. We first extract video frames and accompanying audio at one second intervals. The image and the audio datasets are first classified independently, using a fine-tuned VGG16 and an evolutionary optimised deep neural network, with accuracies of 89.27% and 93.72%, respectively. This is followed by late fusion of the two neural networks to enable a higher order function, leading to accuracy of 96.81% in this multi-modality classifier with synchronised video frames and audio clips. The tertiary neural network implemented for late fusion outperforms classical state-of-the-art classifiers by around 3% when the two primary networks are considered as feature generators. We show that situations where a single-modality may be confused by anomalous data points are now corrected through an emerging higher order integration. Prominent examples include a water feature in a city misclassified as a river by the audio classifier alone and a densely crowded street misclassified as a forest by the image classifier alone. Both are examples which are correctly classified by our multi-modality approach.
Submitted 11 July, 2020;
originally announced July 2020.
-
LSTM and GPT-2 Synthetic Speech Transfer Learning for Speaker Recognition to Overcome Data Scarcity
Authors:
Jordan J. Bird,
Diego R. Faria,
Anikó Ekárt,
Cristiano Premebida,
Pedro P. S. Ayrosa
Abstract:
In speech recognition problems, data scarcity often poses an issue due to the limited willingness of humans to provide large amounts of data for learning and classification. In this work, we take a set of 5 spoken Harvard sentences from 7 subjects and consider their MFCC attributes. Using character level LSTMs (supervised learning) and OpenAI's attention-based GPT-2 models, synthetic MFCCs are generated by learning from the data provided on a per-subject basis. A neural network is trained to classify the data against a large dataset of Flickr8k speakers and is then compared to a transfer learning network performing the same task but with an initial weight distribution dictated by learning from the synthetic data generated by the two models. The best results for all 7 subjects came from networks that had been exposed to synthetic data: the model pre-trained with LSTM-produced data achieved the best result 3 times and the GPT-2 equivalent 5 times (since one subject had their best result from both models at a draw). Through these results, we argue that speaker classification can be improved by utilising a small amount of user data but with exposure to synthetically-generated MFCCs which then allow the networks to achieve near maximum classification scores.
Submitted 3 July, 2020; v1 submitted 1 July, 2020;
originally announced July 2020.
-
Probabilistic Object Classification using CNN ML-MAP layers
Authors:
G. Melotti,
C. Premebida,
J. J. Bird,
D. R. Faria,
N. Gonçalves
Abstract:
Deep networks are currently the state-of-the-art for sensory perception in autonomous driving and robotics. However, deep models often generate overconfident predictions precluding proper probabilistic interpretation which we argue is due to the nature of the SoftMax layer. To reduce the overconfidence without compromising the classification performance, we introduce a CNN probabilistic approach based on distributions calculated in the network's Logit layer. The approach enables Bayesian inference by means of ML and MAP layers. Experiments with calibrated and the proposed prediction layers are carried out on object classification using data from the KITTI database. Results are reported for camera ($RGB$) and LiDAR (range-view) modalities, where the new approach shows promising performance compared to SoftMax.
Submitted 24 August, 2020; v1 submitted 29 May, 2020;
originally announced May 2020.
-
Hardware implementation of Bayesian network building blocks with stochastic spintronic devices
Authors:
Punyashloka Debashis,
Vaibhav Ostwal,
Rafatul Faria,
Supriyo Datta,
Joerg Appenzeller,
Zhihong Chen
Abstract:
Bayesian networks are powerful statistical models to understand causal relationships in real-world probabilistic problems such as diagnosis, forecasting, computer vision, etc. For systems that involve complex causal dependencies among many variables, the complexity of the associated Bayesian networks becomes computationally intractable. As a result, direct hardware implementation of these networks is one promising approach to reducing power consumption and execution time. However, the few hardware implementations of Bayesian networks presented in the literature rely on deterministic CMOS devices that are not efficient in representing the inherently stochastic variables in a Bayesian network. This work presents an experimental demonstration of a Bayesian network building block implemented with naturally stochastic spintronic devices. These devices are based on nanomagnets with perpendicular magnetic anisotropy, initialized to their hard axes by the spin orbit torque from a heavy metal under-layer utilizing the giant spin Hall effect, enabling stochastic behavior. We construct an electrically interconnected network of two stochastic devices and manipulate the correlations between their states by changing connection weights and biases. By mapping given conditional probability tables to the circuit hardware, we demonstrate that any two-node Bayesian network can be implemented by our stochastic network. We then present the stochastic simulation of an example case of a four-node Bayesian network using our proposed device, with parameters taken from the experiment. We view this work as a first step towards the large scale hardware implementation of Bayesian networks.
Submitted 17 May, 2020;
originally announced May 2020.
-
Hardware Design for Autonomous Bayesian Networks
Authors:
Rafatul Faria,
Jan Kaiser,
Kerem Y. Camsari,
Supriyo Datta
Abstract:
Directed acyclic graphs or Bayesian networks that are popular in many AI related sectors for probabilistic inference and causal reasoning can be mapped to probabilistic circuits built out of probabilistic bits (p-bits), analogous to binary stochastic neurons of stochastic artificial neural networks. In order to satisfy standard statistical results, individual p-bits not only need to be updated sequentially, but also in order from the parent to the child nodes, necessitating the use of sequencers in software implementations. In this article, we first use SPICE simulations to show that an autonomous hardware Bayesian network can operate correctly without any clocks or sequencers, but only if the individual p-bits are appropriately designed. We then present a simple behavioral model of the autonomous hardware illustrating the essential characteristics needed for correct sequencer-free operation. This model is also benchmarked against SPICE simulations and can be used to simulate large scale networks. Our results could be useful in the design of hardware accelerators that use energy efficient building blocks suited for low-level implementations of Bayesian networks. The autonomous massively parallel operation of our proposed stochastic hardware has biological relevance since neural dynamics in the brain are also stochastic and autonomous by nature.
Submitted 3 July, 2020; v1 submitted 2 March, 2020;
originally announced March 2020.
-
Probabilistic Circuits for Autonomous Learning: A simulation study
Authors:
Jan Kaiser,
Rafatul Faria,
Kerem Y. Camsari,
Supriyo Datta
Abstract:
Modern machine learning is based on powerful algorithms running on digital computing platforms and there is great interest in accelerating the learning process and making it more energy efficient. In this paper we present a fully autonomous probabilistic circuit for fast and efficient learning that makes no use of digital computing. Specifically we use SPICE simulations to demonstrate a clockless autonomous circuit where the required synaptic weights are read out in the form of analog voltages. Such autonomous circuits could be particularly of interest as standalone learning devices in the context of mobile and edge computing.
Submitted 25 February, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
On the Effects of Pseudo and Quantum Random Number Generators in Soft Computing
Authors:
Jordan J. Bird,
Anikó Ekárt,
Diego R. Faria
Abstract:
In this work, we argue that the implications of Pseudo and Quantum Random Number Generators (PRNG and QRNG) inexplicably affect the performances and behaviours of various machine learning models that require a random input. These implications are yet to be explored in Soft Computing until this work. We use a CPU and a QPU to generate random numbers for multiple Machine Learning techniques. Random numbers are employed in the random initial weight distributions of Dense and Convolutional Neural Networks, in which results show a profound difference in learning patterns for the two. In 50 Dense Neural Networks (25 PRNG/25 QRNG), QRNG increases over PRNG for accent classification at +0.1%, and QRNG exceeded PRNG for mental state EEG classification by +2.82%. In 50 Convolutional Neural Networks (25 PRNG/25 QRNG), the MNIST and CIFAR-10 problems are benchmarked; in MNIST the QRNG experiences a higher starting accuracy than the PRNG but ultimately only exceeds it by 0.02%. In CIFAR-10, the QRNG outperforms PRNG by +0.92%. The n-random split of a Random Tree is enhanced towards a new Quantum Random Tree (QRT) model, which has differing classification abilities to its classical counterpart; 200 trees are trained and compared (100 PRNG/100 QRNG). Using the accent and EEG classification datasets, a QRT seemed inferior to a RT as it performed on average worse by -0.12%. This pattern is also seen in the EEG classification problem, where a QRT performs worse than a RT by -0.28%. Finally, the QRT is ensembled into a Quantum Random Forest (QRF), which also has a noticeable effect when compared to the standard Random Forest (RF)... ABSTRACT SHORTENED DUE TO ARXIV LIMIT
Submitted 10 October, 2019;
originally announced October 2019.
-
Correlated fluctuations in spin orbit torque-coupled perpendicular nanomagnets
Authors:
Punyashloka Debashis,
Rafatul Faria,
Kerem Y. Camsari,
Supriyo Datta,
Zhihong Chen
Abstract:
Low barrier nanomagnets have attracted a lot of research interest for their use as sources of high quality true random number generation. More recently, low barrier nanomagnets with tunable output have been shown to be a natural hardware platform for unconventional computing paradigms such as probabilistic spin logic. Efficient generation and tunability of high quality random bits is critical for these novel applications. However, current spintronic random number generators are based on superparamagnetic tunnel junctions (SMTJs) with tunability obtained through spin transfer torque (STT), which unavoidably leads to challenges in designing concatenated networks using these two terminal devices. The more recent development of utilizing spin orbit torque (SOT) allows for a three terminal device design, but can only tune in-plane magnetization freely, which is not very energy efficient due to the needs of overcoming a large demagnetization field. In this work, we experimentally demonstrate for the first time, a stochastic device with perpendicular magnetic anisotropy (PMA) that is completely tunable by SOT without the aid of any external magnetic field. Our measurements lead us to hypothesize that a tilted anisotropy might be responsible for the observed tunability. We carry out stochastic Landau-Lifshitz-Gilbert (sLLG) simulations to confirm our experimental observation. Finally, we build an electrically coupled network of two such stochastic nanomagnet based devices and demonstrate that finite correlation or anti-correlation can be established between their output fluctuations by a weak interconnection, despite having a large difference in their natural fluctuation time scale. Simulations based on a newly developed dynamical model for autonomous circuits composed of low barrier nanomagnets show close agreement with the experimental results.
Submitted 7 October, 2019;
originally announced October 2019.
-
Graph Neural Networks for Human-aware Social Navigation
Authors:
Luis J. Manso,
Ronit R. Jorvekar,
Diego R. Faria,
Pablo Bustos,
Pilar Bachiller
Abstract:
Autonomous navigation is a key skill for assistive and service robots. To be successful, robots have to navigate avoiding going through the personal spaces of the people surrounding them. Complying with social rules such as not getting in the middle of human-to-human and human-to-object interactions is also important. This paper suggests using Graph Neural Networks to model how inconvenient the presence of a robot would be in a particular scenario according to learned human conventions so that it can be used by path planning algorithms. To do so, we propose two ways of modelling social interactions using graphs and benchmark them with different Graph Neural Networks using the SocNav1 dataset. We achieve close-to-human performance in the dataset and argue that, in addition to promising results, the main advantage of the approach is its scalability in terms of the number of social factors that can be considered and easily embedded in code, in comparison with model-based approaches. The code used to train and test the resulting graph neural network is available in a public repository.
Submitted 10 September, 2020; v1 submitted 19 September, 2019;
originally announced September 2019.
-
SocNav1: A Dataset to Benchmark and Learn Social Navigation Conventions
Authors:
Luis J. Manso,
Pedro Nunez,
Luis V. Calderita,
Diego R. Faria,
Pilar Bachiller
Abstract:
Adapting to social conventions is an unavoidable requirement for the acceptance of assistive and social robots. While the scientific community broadly accepts that assistive robots and social robot companions are unlikely to have widespread use in the near future, their presence in health-care and other medium-sized institutions is becoming a reality. These robots will have a beneficial impact in industry and other fields such as health care. The growing number of research contributions to social navigation is also indicative of the importance of the topic. To foster the future prevalence of these robots, they must be useful, but also socially accepted. The first step to be able to actively ask for collaboration or permission is to estimate whether the robot would make people feel uncomfortable otherwise, and that is precisely the goal of algorithms evaluating social navigation compliance. Some approaches provide analytic models, whereas others use machine learning techniques such as neural networks. This data report presents and describes SocNav1, a dataset for social navigation conventions. The aims of SocNav1 are two-fold: a) enabling comparison of the algorithms that robots use to assess the convenience of their presence in a particular position when navigating; b) providing a sufficient amount of data so that modern machine learning algorithms such as deep neural networks can be used. Because of the structured nature of the data, SocNav1 is particularly well-suited to be used to benchmark non-Euclidean machine learning algorithms such as Graph Neural Networks (see [1]). The dataset has been made available in a public repository.
Submitted 14 January, 2020; v1 submitted 6 September, 2019;
originally announced September 2019.
-
A Deep Evolutionary Approach to Bioinspired Classifier Optimisation for Brain-Machine Interaction
Authors:
Jordan J. Bird,
Diego R. Faria,
Luis J. Manso,
Anikó Ekárt,
Christopher D. Buckingham
Abstract:
This study suggests a new approach to EEG data classification by exploring the idea of using evolutionary computation to both select useful discriminative EEG features and optimise the topology of Artificial Neural Networks. An evolutionary algorithm is applied to select the most informative features from an initial set of 2550 EEG statistical features. Optimisation of a Multilayer Perceptron (MLP) is performed with an evolutionary approach before classification to estimate the best hyperparameters of the network. Deep learning and tuning with Long Short-Term Memory (LSTM) are also explored, and Adaptive Boosting of the two types of models is tested for each problem. Three experiments are provided for comparison using different classifiers: one for attention state classification, one for emotional sentiment classification, and a third experiment in which the goal is to guess the number a subject is thinking of. The obtained results show that an Adaptive Boosted LSTM can achieve an accuracy of 84.44%, 97.06%, and 9.94% on the attentional, emotional, and number datasets, respectively. An evolutionary-optimised MLP achieves results close to the Adaptive Boosted LSTM for the two first experiments and significantly higher for the number-guessing experiment with an Adaptive Boosted DEvo MLP reaching 31.35%, while being significantly quicker to train and classify. In particular, the accuracy of the nonboosted DEvo MLP was of 79.81%, 96.11%, and 27.07% in the same benchmarks. Two datasets for the experiments were gathered using a Muse EEG headband with four electrodes corresponding to TP9, AF7, AF8, and TP10 locations of the international EEG placement standard. The EEG MindBigData digits dataset was gathered from the TP9, FP1, FP2, and TP10 locations.
Submitted 13 August, 2019;
originally announced August 2019.
-
Autonomous Probabilistic Coprocessing with Petaflips per Second
Authors:
Brian Sutton,
Rafatul Faria,
Lakshmi A. Ghantasala,
Risi Jaiswal,
Kerem Y. Camsari,
Supriyo Datta
Abstract:
In this paper we present a concrete design for a probabilistic (p-) computer based on a network of p-bits, robust classical entities fluctuating between -1 and +1, with probabilities that are controlled through an input constructed from the outputs of other p-bits. The architecture of this probabilistic computer is similar to a stochastic neural network with the p-bit playing the role of a binary stochastic neuron, but with one key difference: there is no sequencer used to enforce an ordering of p-bit updates, as is typically required. Instead, we explore \textit{sequencerless} designs where all p-bits are allowed to flip autonomously and demonstrate that such designs can allow ultrafast operation unconstrained by available clock speeds without compromising the solution's fidelity. Based on experimental results from a hardware benchmark of the autonomous design and benchmarked device models, we project that a nanomagnetic implementation can scale to achieve petaflips per second with millions of neurons. A key contribution of this paper is the focus on a hardware metric $-$ flips per second $-$ as a problem and substrate-independent figure-of-merit for an emerging class of hardware annealers known as Ising Machines. Much like the shrinking feature sizes of transistors that have continually driven Moore's Law, we believe that flips per second can be continually improved in later technology generations of a wide class of probabilistic, domain specific hardware.
Submitted 22 August, 2020; v1 submitted 22 July, 2019;
originally announced July 2019.
-
Standard Model Physics at the HL-LHC and HE-LHC
Authors:
P. Azzi,
S. Farry,
P. Nason,
A. Tricoli,
D. Zeppenfeld,
R. Abdul Khalek,
J. Alimena,
N. Andari,
L. Aperio Bella,
A. J. Armbruster,
J. Baglio,
S. Bailey,
E. Bakos,
A. Bakshi,
C. Baldenegro,
F. Balli,
A. Barker,
W. Barter,
J. de Blas,
F. Blekman,
D. Bloch,
A. Bodek,
M. Boonekamp,
E. Boos,
J. D. Bossio Sola
, et al. (201 additional authors not shown)
Abstract:
The successful operation of the Large Hadron Collider (LHC) and the excellent performance of the ATLAS, CMS, LHCb and ALICE detectors in Run-1 and Run-2 with $pp$ collisions at center-of-mass energies of 7, 8 and 13 TeV as well as the giant leap in precision calculations and modeling of fundamental interactions at hadron colliders have allowed an extraordinary breadth of physics studies including precision measurements of a variety of physics processes. The LHC results have so far confirmed the validity of the Standard Model of particle physics up to unprecedented energy scales and with great precision in the sectors of strong and electroweak interactions as well as flavour physics, for instance in top quark physics. The upgrade of the LHC to a High Luminosity phase (HL-LHC) at 14 TeV center-of-mass energy with 3 ab$^{-1}$ of integrated luminosity will probe the Standard Model with even greater precision and will extend the sensitivity to possible anomalies in the Standard Model, thanks to a ten-fold larger data set, upgraded detectors and expected improvements in the theoretical understanding. This document summarises the physics reach of the HL-LHC in the realm of strong and electroweak interactions and top quark physics, and provides a glimpse of the potential of a possible further upgrade of the LHC to a 27 TeV $pp$ collider, the High-Energy LHC (HE-LHC), assumed to accumulate an integrated luminosity of 15 ab$^{-1}$.
Submitted 20 December, 2019; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Low Barrier Magnet Design for Efficient Hardware Binary Stochastic Neurons
Authors:
Orchi Hassan,
Rafatul Faria,
Kerem Y. Camsari,
Jonathan Z. Sun,
Supriyo Datta
Abstract:
Binary stochastic neurons (BSN's) form an integral part of many machine learning algorithms, motivating the development of hardware accelerators for this complex function. It has been recognized that hardware BSN's can be implemented using low barrier magnets (LBM's) by minimally modifying present-day magnetoresistive random access memory (MRAM) devices. A crucial parameter that determines the response of these LBM based BSN designs is the \emph{correlation time} of magnetization, $τ_c$. In this letter, we show that for magnets with low energy barriers ($Δ\approx k_BT$ and below), circular disk magnets with in-plane magnetic anisotropy (IMA) lead to $τ_c$ values that are two orders of magnitude smaller compared to $τ_c$ for magnets having perpendicular magnetic anisotropy (PMA) and provide analytical descriptions. We show that this striking difference in $τ_c$ is due to a precession-like fluctuation mechanism that is enabled by the large demagnetization field in IMA magnets. We provide a detailed energy-delay performance evaluation of previously proposed BSN designs based on Spin-Orbit-Torque (SOT) MRAM and Spin-Transfer-Torque (STT) MRAM employing low barrier circular IMA magnets by SPICE simulations. The designs exhibit sub-ns response times leading to energy requirements of $\sim$a few fJ to evaluate the BSN function, orders of magnitude lower than digital CMOS implementations with a much larger footprint. While modern MRAM technology is based on PMA magnets, results in this paper suggest that low barrier circular IMA magnets may be more suitable for this application.
Submitted 20 April, 2019; v1 submitted 10 February, 2019;
originally announced February 2019.
-
Rectification in Spin-Orbit Materials Using Low Energy Barrier Magnets
Authors:
Shehrin Sayed,
Kerem Y. Camsari,
Rafatul Faria,
Supriyo Datta
Abstract:
The coupling of spin-orbit materials to high energy barrier ($\sim$40-60 $k_BT$) nano-magnets has attracted growing interest for exciting new physics and various spintronic applications. We predict that a coupling between the spin-momentum locking (SML) observed in spin-orbit materials and low-energy barrier magnets (LBM) should exhibit a unique multi-terminal rectification for arbitrarily small amplitude channel currents. The basic idea is to measure the charge current induced spin accumulation in the SML channel in the form of a magnetization dependent voltage using an LBM, either with an in-plane or perpendicular anisotropy (IMA or PMA). The LBM feels an instantaneous spin-orbit torque due to the accumulated spins in the channel which causes the average magnetization to follow the current, leading to the non-linear rectification. We discuss the frequency band of this multi-terminal rectification which can be understood in terms of the angular momentum conservation in the LBM. For a fixed spin-current from the SML channel, the frequency band is the same for LBMs with IMA and PMA, as long as they have the same total magnetic moment in a given volume. The proposed all-metallic structure could find application as highly sensitive passive rf detectors and as energy harvesters from weak ambient sources where standard technologies may not operate.
Submitted 27 May, 2019; v1 submitted 1 December, 2018;
originally announced December 2018.
-
Implementing Bayesian Networks with Embedded Stochastic MRAM
Authors:
Rafatul Faria,
Kerem Y. Camsari,
Supriyo Datta
Abstract:
Magnetic tunnel junctions (MTJs) with low barrier magnets have been used to implement random number generators (RNGs) and it has recently been shown that such an MTJ connected to the drain of a conventional transistor provides a three-terminal tunable RNG or a $p$-bit. In this letter we show how this $p$-bit can be used to build a $p$-circuit that emulates a Bayesian network (BN), such that the correlations in real-world variables can be obtained from electrical measurements on the corresponding circuit nodes. The $p$-circuit design proceeds in two steps: the BN is first translated into a behavioral model, called Probabilistic Spin Logic (PSL), defined by dimensionless biasing (h) and interconnection (J) coefficients, which are then translated into electronic circuit elements. As a benchmark example, we mimic a family tree of three generations and show that the genetic relatedness calculated from a SPICE-compatible circuit simulator matches well-known results.
Submitted 2 April, 2018; v1 submitted 1 January, 2018;
originally announced January 2018.
-
Global Constraints on Top Quark Anomalous Couplings
Authors:
Frédéric Déliot,
Ricardo Faria,
Miguel C. N. Fiolhais,
Pedro Lagarelhos,
António Onofre,
Christopher M. Pease,
Ana Vasconcelos
Abstract:
The latest results on top quark physics, namely single top quark production cross sections, $W$-boson helicity and asymmetry measurements, are used to probe the Lorentz structure of the $Wtb$ vertex. The increase of sensitivity to new anomalous physics contributions to the top quark sector of the Standard Model is quantified by combining the relevant results from Tevatron and the Large Hadron Collider. The results show that combining an increasing set of available precision measurements in the search for new physics phenomena beyond the Standard Model leads to significant sensitivity improvements, especially when compared with the current expectation for the High Luminosity run at the LHC.
Submitted 13 November, 2017;
originally announced November 2017.
-
Equivalent Circuit for Magnetoelectric Read and Write Operations
Authors:
Kerem Y. Camsari,
Rafatul Faria,
Orchi Hassan,
Brian M. Sutton,
Supriyo Datta
Abstract:
We describe an equivalent circuit model applicable to a wide variety of magnetoelectric phenomena and use SPICE simulations to benchmark this model against experimental data. We use this model to suggest a different mode of operation where the "1" and "0" states are not represented by states with net magnetization (like $m_x$, $m_y$ or $m_z$) but by different easy axes, quantitatively described by ($m_x^2 - m_y^2$), which switches from "0" to "1" through the write voltage. This change is directly detected as a read signal through the inverse effect. The use of ($m_x^2 - m_y^2$) to represent a bit is a radical departure from the standard convention of using the magnetization ($m$) to represent information. We then show how the equivalent circuit can be used to build a device exhibiting tunable randomness and suggest possibilities for extending it to non-volatile memory with read and write capabilities, without the use of external magnetic fields or magnetic tunnel junctions.
Submitted 16 April, 2018; v1 submitted 29 October, 2017;
originally announced October 2017.
-
Low Barrier Nanomagnets as p-bits for Spin Logic
Authors:
Rafatul Faria,
Kerem Yunus Camsari,
Supriyo Datta
Abstract:
It has recently been shown that a suitably interconnected network of tunable telegraphic noise generators or "p-bits" can be used to perform even precise arithmetic functions like a 32-bit adder. In this paper we use simulations based on the stochastic Landau-Lifshitz-Gilbert (sLLG) equation to demonstrate that similar impressive functions can be performed using unstable nanomagnets with energy barriers as low as a fraction of a kT. This is surprising since the magnetization of low barrier nanomagnets is not telegraphic with discrete values of +1 and -1. Rather it fluctuates randomly among all values between -1 and +1, and the output magnets are read with a thresholding device that translates all positive values to 1 and all negative values to zero. We present sLLG-based simulations demonstrating the operation of a 32-bit adder with a network of several hundred nanomagnets, exhibiting a remarkably precise correlation: The input magnets {A} and {B} as well as the output magnets {S} all fluctuate randomly and yet the quantity A+B-S is sharply peaked around zero! If we fix {A} and {B}, the sum magnets {S} rapidly converge to a unique state with S=A+B so that the system acts as an adder. But unlike standard adders, the operation is invertible. If we fix {S} and {B}, the remaining magnets {A} converge to the difference A=S-B. These examples suggest a new direction for the field of nanomagnetics away from stable high barrier magnets towards stochastic low barrier magnets which not only operate with lower currents, but are also more promising for continued downscaling.
Submitted 11 April, 2017; v1 submitted 16 November, 2016;
originally announced November 2016.
-
Stochastic p-bits for Invertible Logic
Authors:
Kerem Yunus Camsari,
Rafatul Faria,
Brian M. Sutton,
Supriyo Datta
Abstract:
Conventional logic and memory devices are built out of deterministic units such as transistors, or magnets with energy barriers in excess of 40-60 kT. We show that stochastic units, p-bits, can be interconnected to create robust correlations that implement Boolean functions with impressive accuracy, comparable to standard circuits. Also they are invertible, a unique property that is absent in digital circuits. When operated in the direct mode, the input is clamped, and the network provides the correct output. In the inverted mode, the output is clamped, and the network fluctuates among possible inputs consistent with that output. We present an implementation of an invertible gate to bring out the key role of a three-terminal building block to enable the construction of correlated p-bit networks. The results for this implementation agree well with those from a universal model, showing that p-bits need not be magnet-based: any three-terminal tunable random bit generator should be suitable. We present an algorithm for designing a Boltzmann machine (BM) with symmetric connections that implements a given truth table. We then show how BM Full Adders can be interconnected in a partially directed manner to implement large operations such as 32-bit addition. Hundreds of p-bits get precisely correlated such that the correct answer out of 2^33 possibilities can be extracted by looking at the mode of a number of time samples. With perfect directivity a small number of samples is enough, while for less directed connections more samples are needed, but even in the former case invertibility is largely preserved. This combination of accuracy and invertibility is enabled by the hybrid design that uses bidirectional units to construct circuits with partially directed connections. We establish this result with examples including a 4-bit multiplier which in inverted mode functions as a factorizer.
Submitted 21 July, 2017; v1 submitted 2 October, 2016;
originally announced October 2016.
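The direct and inverted modes described in this abstract can be illustrated with a minimal sketch of a three-p-bit AND gate. Everything specific here is an assumption made for illustration and is not quoted from the abstract: the bipolar update rule (a p-bit takes the value +1 with probability (1 + tanh(drive)) / 2), the coupling matrix `J`, the bias vector `H`, the scale `i0`, and the helper names `energy` and `sample`.

```python
import itertools
import math
import random

# AND-gate couplings for p-bits ordered [A, B, C], with C = A AND B.
# These values are an assumption of this sketch, not quoted from the abstract.
J = [[0, -1, 2],
     [-1, 0, 2],
     [2, 2, 0]]
H = [1, 1, -2]


def energy(m):
    """Ising-type energy of a bipolar state m (entries +1 or -1)."""
    pair = sum(J[i][j] * m[i] * m[j] for i in range(3) for j in range(i + 1, 3))
    bias = sum(H[i] * m[i] for i in range(3))
    return -(pair + bias)


def sample(clamped, sweeps=5000, i0=2.0, seed=0):
    """Sequentially update the unclamped p-bits; return visit counts per state."""
    rng = random.Random(seed)
    m = [clamped.get(i, rng.choice((-1, 1))) for i in range(3)]
    free = [i for i in range(3) if i not in clamped]
    counts = {}
    for _ in range(sweeps):
        for i in free:
            drive = i0 * (sum(J[i][j] * m[j] for j in range(3)) + H[i])
            # Assumed p-bit update: +1 with probability (1 + tanh(drive)) / 2.
            m[i] = 1 if math.tanh(drive) > rng.uniform(-1, 1) else -1
        counts[tuple(m)] = counts.get(tuple(m), 0) + 1
    return counts


# The lowest-energy states should coincide with the AND truth table.
all_states = list(itertools.product((-1, 1), repeat=3))
lowest = min(energy(m) for m in all_states)
ground_states = sorted(m for m in all_states if energy(m) == lowest)

direct = sample({0: 1, 1: 1})   # direct mode: inputs clamped to A = B = 1
inverted = sample({2: -1})      # inverted mode: output clamped to C = 0
```

In this sketch, `direct` should be dominated by the state with C = 1, while `inverted` should spread over the three input pairs consistent with C = 0 and rarely visit A = B = 1.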
-
Ultrafast Spin-Transfer-Torque Switching of Synthetic Ferrimagnets
Authors:
Kerem Yunus Camsari,
Ahmed Zeeshan Pervaiz,
Rafatul Faria,
Ernesto E. Marinero,
Supriyo Datta
Abstract:
The switching speed and the write current required for spin-transfer-torque reversal of spintronic devices such as magnetic tunnel junctions (MTJ) currently hinder their wide implementation into memory and logic devices. This problem is further exacerbated as the dimensions of MTJ nanostructures are scaled down to tens of nanometers in diameter, as higher magnetic anisotropy materials are required to meet thermal stability requirements that demand higher switching current densities. Here, we propose a simple solution to these issues based on synthetic ferrimagnet (SFM) structures. It is commonly assumed that to achieve a given switching delay, the current has to exceed the critical current by a certain factor and so a higher critical current implies a higher switching current. We show that this is not the case for SFM structures, which can provide significantly reduced switching delay for a given current density, even though the critical current is increased. This non-intuitive result can be understood from the requirements of angular momentum conservation. We conclude that a 20 nm diameter MTJ incorporating the proposed SFM free layer structure can be switched on time scales of tens of picoseconds. This remarkable switching speed can be attained by employing current perpendicular magnetic anisotropy materials with experimentally demonstrated exchange coupling strengths.
Submitted 14 June, 2016; v1 submitted 14 June, 2016;
originally announced June 2016.
-
Spatio-temporal Dynamics of Foot-and-Mouth Disease Virus in South America
Authors:
Luiz Max Carvalho,
Nuno Rodrigues Faria,
Andres M. Perez,
Marc A. Suchard,
Philippe Lemey,
Waldemir de Castro Silveira,
Andrew Rambaut,
Guy Baele
Abstract:
Although foot-and-mouth disease virus (FMDV) incidence has decreased in South America in recent years, the pathogen still circulates in the region and the risk of re-emergence in previously FMDV-free areas is a veterinary public health concern. In this paper we merge environmental, epidemiological and genetic data to reconstruct spatiotemporal patterns and determinants of FMDV serotypes A and O dispersal in South America. Our dating analysis suggests that serotype A emerged in South America around 1930, while serotype O emerged around 1990. The rate of evolution for serotype A was significantly higher than that for serotype O. Phylogeographic inference identified two well-connected subnetworks of viral flow, one including Venezuela, Colombia and Ecuador; another including Brazil, Uruguay and Argentina. The spread of serotype A was best described by geographic distances, while trade of live cattle was the predictor that best explained serotype O spread. Our findings show that the two serotypes have different underlying evolutionary and spatial dynamics and may pose different threats to control programmes. Key-words: Phylogeography, foot-and-mouth disease virus, South America, animal trade.
Submitted 1 June, 2015; v1 submitted 5 May, 2015;
originally announced May 2015.
-
The seasonal flight of influenza: a unified framework for spatiotemporal hypothesis testing
Authors:
Philippe Lemey,
Andrew Rambaut,
Trevor Bedford,
Nuno R. Faria,
Filip Bielejec,
Guy Baele,
Colin A. Russell,
Derek J. Smith,
Oliver G. Pybus,
Dirk Brockmann,
Marc A. Suchard
Abstract:
Global mobility flow data are at the heart of spatial epidemiological models used to predict infectious disease behavior, but this wealth of data on human mobility has been largely neglected by reconstructions of pathogen evolutionary dynamics using viral genetic data. Although stochastic models of viral evolution may potentially be informed by such data, a major challenge lies in deciding which mobility processes are critical and to what extent they contribute to shaping contemporaneous distributions of pathogen diversity. Here, we develop a framework to integrate predictors of viral diffusion with phylogeographic inference and estimate human influenza H3N2 migration history while simultaneously testing and quantifying the factors that underlie it. We provide evidence for air travel governing the global dynamics of human influenza whereas other processes act at a more local scale.
Submitted 22 October, 2012;
originally announced October 2012.
-
Ac transport studies in polymers by a resistor network and transfer matrix approaches: application to polyaniline
Authors:
H. N. Nagashima,
R. N. Onody,
R. M. Faria
Abstract:
A statistical model of resistor network is proposed to describe a polymer structure and to simulate the real and imaginary components of its ac resistivity. It takes into account the polydispersiveness of the material as well as intrachain and interchain charge transport processes. By the application of a transfer matrix technique, it reproduces ac resistivity measurements carried out with polyaniline films in different doping degrees and at different temperatures. Our results indicate that interchain processes govern the resistivity behavior in the low frequency region while, for higher frequencies, intrachain mechanisms are dominant.
Submitted 24 November, 1998;
originally announced November 1998.