-
Investigating Mysteries of CoT-Augmented Distillation
Authors:
Somin Wadhwa,
Silvio Amir,
Byron C. Wallace
Abstract:
Eliciting "chain of thought" (CoT) rationales -- sequences of token that convey a "reasoning" process -- has been shown to consistently improve LLM performance on tasks like question answering. More recent efforts have shown that such rationales can also be used for model distillation: Including CoT sequences (elicited from a large "teacher" model) in addition to target labels when fine-tuning a s…
▽ More
Eliciting "chain of thought" (CoT) rationales -- sequences of token that convey a "reasoning" process -- has been shown to consistently improve LLM performance on tasks like question answering. More recent efforts have shown that such rationales can also be used for model distillation: Including CoT sequences (elicited from a large "teacher" model) in addition to target labels when fine-tuning a small student model yields (often substantial) improvements. In this work we ask: Why and how does this additional training signal help in model distillation? We perform ablations to interrogate this, and report some potentially surprising results. Specifically: (1) Placing CoT sequences after labels (rather than before) realizes consistently better downstream performance -- this means that no student "reasoning" is necessary at test time to realize gains. (2) When rationales are appended in this way, they need not be coherent reasoning sequences to yield improvements; performance increases are robust to permutations of CoT tokens, for example. In fact, (3) a small number of key tokens are sufficient to achieve improvements equivalent to those observed when full rationales are used in model distillation.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
On-the-fly Definition Augmentation of LLMs for Biomedical NER
Authors:
Monica Munnangi,
Sergey Feldman,
Byron C Wallace,
Silvio Amir,
Tom Hope,
Aakanksha Naik
Abstract:
Despite their general capabilities, LLMs still struggle on biomedical NER tasks, which are difficult due to the presence of specialized terminology and lack of training data. In this work we set out to improve LLM performance on biomedical NER in limited data settings via a new knowledge augmentation approach which incorporates definitions of relevant concepts on-the-fly. During this process, to p…
▽ More
Despite their general capabilities, LLMs still struggle on biomedical NER tasks, which are difficult due to the presence of specialized terminology and lack of training data. In this work we set out to improve LLM performance on biomedical NER in limited data settings via a new knowledge augmentation approach which incorporates definitions of relevant concepts on-the-fly. During this process, to provide a test bed for knowledge augmentation, we perform a comprehensive exploration of prompting strategies. Our experiments show that definition augmentation is useful for both open source and closed LLMs. For example, it leads to a relative improvement of 15\% (on average) in GPT-4 performance (F1) across all (six) of our test datasets. We conduct extensive ablations and analyses to demonstrate that our performance improvements stem from adding relevant definitional knowledge. We find that careful prompting strategies also improve LLM performance, allowing them to outperform fine-tuned language models in few-shot settings. To facilitate future research in this direction, we release our code at https://github.com/allenai/beacon.
△ Less
Submitted 23 April, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
Real-Time Object Detection in Occluded Environment with Background Cluttering Effects Using Deep Learning
Authors:
Syed Muhammad Aamir,
Hongbin Ma,
Malak Abid Ali Khan,
Muhammad Aaqib
Abstract:
Detection of small, undetermined moving objects or objects in an occluded environment with a cluttered background is the main problem of computer vision. This greatly affects the detection accuracy of deep learning models. To overcome these problems, we concentrate on deep learning models for real-time detection of cars and tanks in an occluded environment with a cluttered background employing SSD…
▽ More
Detection of small, undetermined moving objects or objects in an occluded environment with a cluttered background is the main problem of computer vision. This greatly affects the detection accuracy of deep learning models. To overcome these problems, we concentrate on deep learning models for real-time detection of cars and tanks in an occluded environment with a cluttered background employing SSD and YOLO algorithms and improved precision of detection and reduce problems faced by these models. The developed method makes the custom dataset and employs a preprocessing technique to clean the noisy dataset. For training the developed model we apply the data augmentation technique to balance and diversify the data. We fine-tuned, trained, and evaluated these models on the established dataset by applying these techniques and highlighting the results we got more accurately than without applying these techniques. The accuracy and frame per second of the SSD-Mobilenet v2 model are higher than YOLO V3 and YOLO V4. Furthermore, by employing various techniques like data enhancement, noise reduction, parameter optimization, and model fusion we improve the effectiveness of detection and recognition. We further added a counting algorithm, and target attributes experimental comparison, and made a graphical user interface system for the developed model with features of object counting, alerts, status, resolution, and frame per second. Subsequently, to justify the importance of the developed method analysis of YOLO V3, V4, and SSD were incorporated. Which resulted in the overall completion of the proposed method.
△ Less
Submitted 1 January, 2024;
originally announced January 2024.
-
Disentangling Structure and Appearance in ViT Feature Space
Authors:
Narek Tumanyan,
Omer Bar-Tal,
Shir Amir,
Shai Bagon,
Tali Dekel
Abstract:
We present a method for semantically transferring the visual appearance of one natural image to another. Specifically, our goal is to generate an image in which objects in a source structure image are "painted" with the visual appearance of their semantically related objects in a target appearance image. To integrate semantic information into our framework, our key idea is to leverage a pre-traine…
▽ More
We present a method for semantically transferring the visual appearance of one natural image to another. Specifically, our goal is to generate an image in which objects in a source structure image are "painted" with the visual appearance of their semantically related objects in a target appearance image. To integrate semantic information into our framework, our key idea is to leverage a pre-trained and fixed Vision Transformer (ViT) model. Specifically, we derive novel disentangled representations of structure and appearance extracted from deep ViT features. We then establish an objective function that splices the desired structure and appearance representations, interweaving them together in the space of ViT features. Based on our objective function, we propose two frameworks of semantic appearance transfer -- "Splice", which works by training a generator on a single and arbitrary pair of structure-appearance images, and "SpliceNet", a feed-forward real-time appearance transfer model trained on a dataset of images from a specific domain. Our frameworks do not involve adversarial training, nor do they require any additional input information such as semantic segmentation or correspondences. We demonstrate high-resolution results on a variety of in-the-wild image pairs, under significant variations in the number of objects, pose, and appearance. Code and supplementary material are available in our project page: splice-vit.github.io.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
Retrieving Evidence from EHRs with LLMs: Possibilities and Challenges
Authors:
Hiba Ahsan,
Denis Jered McInerney,
Jisoo Kim,
Christopher Potter,
Geoffrey Young,
Silvio Amir,
Byron C. Wallace
Abstract:
Unstructured data in Electronic Health Records (EHRs) often contains critical information -- complementary to imaging -- that could inform radiologists' diagnoses. But the large volume of notes often associated with patients together with time constraints renders manually identifying relevant evidence practically infeasible. In this work we propose and evaluate a zero-shot strategy for using LLMs…
▽ More
Unstructured data in Electronic Health Records (EHRs) often contains critical information -- complementary to imaging -- that could inform radiologists' diagnoses. But the large volume of notes often associated with patients together with time constraints renders manually identifying relevant evidence practically infeasible. In this work we propose and evaluate a zero-shot strategy for using LLMs as a mechanism to efficiently retrieve and summarize unstructured evidence in patient EHR relevant to a given query. Our method entails tasking an LLM to infer whether a patient has, or is at risk of, a particular condition on the basis of associated notes; if so, we ask the model to summarize the supporting evidence. Under expert evaluation, we find that this LLM-based approach provides outputs consistently preferred to a pre-LLM information retrieval baseline. Manual evaluation is expensive, so we also propose and validate a method using an LLM to evaluate (other) LLM outputs for this task, allowing us to scale up evaluation. Our findings indicate the promise of LLMs as interfaces to EHR, but also highlight the outstanding challenge posed by "hallucinations". In this setting, however, we show that model confidence in outputs strongly correlates with faithful summaries, offering a practical means to limit confabulations.
△ Less
Submitted 10 June, 2024; v1 submitted 8 September, 2023;
originally announced September 2023.
-
Revisiting Relation Extraction in the era of Large Language Models
Authors:
Somin Wadhwa,
Silvio Amir,
Byron C. Wallace
Abstract:
Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a \emph{sequence-to-sequence} task, linearizing relations between entities as target strings to be…
▽ More
Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a \emph{sequence-to-sequence} task, linearizing relations between entities as target strings to be generated conditioned on the input. Here we push the limits of this approach, using larger language models (GPT-3 and Flan-T5 large) than considered in prior work and evaluating their performance on standard RE tasks under varying levels of supervision. We address issues inherent to evaluating generative approaches to RE by doing human evaluations, in lieu of relying on exact matching. Under this refined evaluation, we find that: (1) Few-shot prompting with GPT-3 achieves near SOTA performance, i.e., roughly equivalent to existing fully supervised models; (2) Flan-T5 is not as capable in the few-shot setting, but supervising and fine-tuning it with Chain-of-Thought (CoT) style explanations (generated via GPT-3) yields SOTA results. We release this model as a new baseline for RE tasks.
△ Less
Submitted 8 May, 2023;
originally announced May 2023.
-
Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs
Authors:
Somin Wadhwa,
Jay DeYoung,
Benjamin Nye,
Silvio Amir,
Byron C. Wallace
Abstract:
Results from Randomized Controlled Trials (RCTs) establish the comparative effectiveness of interventions, and are in turn critical inputs for evidence-based care. However, results from RCTs are presented in (often unstructured) natural language articles describing the design, execution, and outcomes of trials; clinicians must manually extract findings pertaining to interventions and outcomes of i…
▽ More
Results from Randomized Controlled Trials (RCTs) establish the comparative effectiveness of interventions, and are in turn critical inputs for evidence-based care. However, results from RCTs are presented in (often unstructured) natural language articles describing the design, execution, and outcomes of trials; clinicians must manually extract findings pertaining to interventions and outcomes of interest from such articles. This onerous manual process has motivated work on (semi-)automating extraction of structured evidence from trial reports. In this work we propose and evaluate a text-to-text model built on instruction-tuned Large Language Models (LLMs) to jointly extract Interventions, Outcomes, and Comparators (ICO elements) from clinical abstracts, and infer the associated results reported. Manual (expert) and automated evaluations indicate that framing evidence extraction as a conditional generation task and fine-tuning LLMs for this purpose realizes considerable ($\sim$20 point absolute F1 score) gains over the previous SOTA. We perform ablations and error analyses to assess aspects that contribute to model performance, and to highlight potential directions for further improvements. We apply our model to a collection of published RCTs through mid-2022, and release a searchable database of structured findings: http://ico-relations.ebm-nlp.com
△ Less
Submitted 17 July, 2023; v1 submitted 5 May, 2023;
originally announced May 2023.
-
Object Recognition in Different Lighting Conditions at Various Angles by Deep Learning Method
Authors:
Imran Khan Mirani,
Chen Tianhua,
Malak Abid Ali Khan,
Syed Muhammad Aamir,
Waseef Menhaj
Abstract:
Existing computer vision and object detection methods strongly rely on neural networks and deep learning. This active research area is used for applications such as autonomous driving, aerial photography, protection, and monitoring. Futuristic object detection methods rely on rectangular, boundary boxes drawn over an object to accurately locate its location. The modern object recognition algorithm…
▽ More
Existing computer vision and object detection methods strongly rely on neural networks and deep learning. This active research area is used for applications such as autonomous driving, aerial photography, protection, and monitoring. Futuristic object detection methods rely on rectangular, boundary boxes drawn over an object to accurately locate its location. The modern object recognition algorithms, however, are vulnerable to multiple factors, such as illumination, occlusion, viewing angle, or camera rotation as well as cost. Therefore, deep learning-based object recognition will significantly increase the recognition speed and compatible external interference. In this study, we use convolutional neural networks (CNN) to recognize items, the neural networks have the advantages of end-to-end, sparse relation, and sharing weights. This article aims to classify the name of the various object based on the position of an object's detected box. Instead, under different distances, we can get recognition results with different confidence. Through this study, we find that this model's accuracy through recognition is mainly influenced by the proportion of objects and the number of samples. When we have a small proportion of an object on camera, then we get higher recognition accuracy; if we have a much small number of samples, we can get greater accuracy in recognition. The epidemic has a great impact on the world economy where designing a cheaper object recognition system is the need of time.
△ Less
Submitted 18 October, 2022;
originally announced October 2022.
-
RedHOT: A Corpus of Annotated Medical Questions, Experiences, and Claims on Social Media
Authors:
Somin Wadhwa,
Vivek Khetan,
Silvio Amir,
Byron Wallace
Abstract:
We present Reddit Health Online Talk (RedHOT), a corpus of 22,000 richly annotated social media posts from Reddit spanning 24 health conditions. Annotations include demarcations of spans corresponding to medical claims, personal experiences, and questions. We collect additional granular annotations on identified claims. Specifically, we mark snippets that describe patient Populations, Intervention…
▽ More
We present Reddit Health Online Talk (RedHOT), a corpus of 22,000 richly annotated social media posts from Reddit spanning 24 health conditions. Annotations include demarcations of spans corresponding to medical claims, personal experiences, and questions. We collect additional granular annotations on identified claims. Specifically, we mark snippets that describe patient Populations, Interventions, and Outcomes (PIO elements) within these. Using this corpus, we introduce the task of retrieving trustworthy evidence relevant to a given claim made on social media. We propose a new method to automatically derive (noisy) supervision for this task which we use to train a dense retrieval model; this outperforms baseline models. Manual evaluation of retrieval results performed by medical doctors indicate that while our system performance is promising, there is considerable room for improvement. Collected annotations (and scripts to assemble the dataset), are available at https://github.com/sominw/redhot.
△ Less
Submitted 7 February, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Deep Learning Models for Automated Classification of Dog Emotional States from Facial Expressions
Authors:
Tali Boneh-Shitrit,
Shir Amir,
Annika Bremhorst,
Daniel S. Mills,
Stefanie Riemer,
Dror Fried,
Anna Zamansky
Abstract:
Similarly to humans, facial expressions in animals are closely linked with emotional states. However, in contrast to the human domain, automated recognition of emotional states from facial expressions in animals is underexplored, mainly due to difficulties in data collection and establishment of ground truth concerning emotional states of non-verbal users. We apply recent deep learning techniques…
▽ More
Similarly to humans, facial expressions in animals are closely linked with emotional states. However, in contrast to the human domain, automated recognition of emotional states from facial expressions in animals is underexplored, mainly due to difficulties in data collection and establishment of ground truth concerning emotional states of non-verbal users. We apply recent deep learning techniques to classify (positive) anticipation and (negative) frustration of dogs on a dataset collected in a controlled experimental setting. We explore the suitability of different backbones (e.g. ResNet, ViT) under different supervisions to this task, and find that features of a self-supervised pretrained ViT (DINO-ViT) are superior to the other alternatives. To the best of our knowledge, this work is the first to address the task of automatic classification of canine emotions on data acquired in a controlled experiment.
△ Less
Submitted 11 June, 2022;
originally announced June 2022.
-
Deep ViT Features as Dense Visual Descriptors
Authors:
Shir Amir,
Yossi Gandelsman,
Shai Bagon,
Tali Dekel
Abstract:
We study the use of deep features extracted from a pretrained Vision Transformer (ViT) as dense visual descriptors. We observe and empirically demonstrate that such features, when extractedfrom a self-supervised ViT model (DINO-ViT), exhibit several striking properties, including: (i) the features encode powerful, well-localized semantic information, at high spatial granularity, such as object par…
▽ More
We study the use of deep features extracted from a pretrained Vision Transformer (ViT) as dense visual descriptors. We observe and empirically demonstrate that such features, when extractedfrom a self-supervised ViT model (DINO-ViT), exhibit several striking properties, including: (i) the features encode powerful, well-localized semantic information, at high spatial granularity, such as object parts; (ii) the encoded semantic information is shared across related, yet different object categories, and (iii) positional bias changes gradually throughout the layers. These properties allow us to design simple methods for a variety of applications, including co-segmentation, part co-segmentation and semantic correspondences. To distill the power of ViT features from convoluted design choices, we restrict ourselves to lightweight zero-shot methodologies (e.g., binning and clustering) applied directly to the features. Since our methods require no additional training nor data, they are readily applicable across a variety of domains. We show by extensive qualitative and quantitative evaluation that our simple methodologies achieve competitive results with recent state-of-the-art supervised methods, and outperform previous unsupervised methods by a large margin. Code is available in dino-vit-features.github.io.
△ Less
Submitted 15 October, 2022; v1 submitted 10 December, 2021;
originally announced December 2021.
-
On the Impact of Random Seeds on the Fairness of Clinical Classifiers
Authors:
Silvio Amir,
Jan-Willem van de Meent,
Byron C. Wallace
Abstract:
Recent work has shown that fine-tuning large networks is surprisingly sensitive to changes in random seed(s). We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records (EHR) in MIMIC-III -- the standard dataset in clinical NLP research. Apparent subgroup performance varies substantially for seeds that yie…
▽ More
Recent work has shown that fine-tuning large networks is surprisingly sensitive to changes in random seed(s). We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records (EHR) in MIMIC-III -- the standard dataset in clinical NLP research. Apparent subgroup performance varies substantially for seeds that yield similar overall performance, although there is no evidence of a trade-off between overall and subgroup performance. However, we also find that the small sample sizes inherent to looking at intersections of minority groups and somewhat rare conditions limit our ability to accurately estimate disparities. Further, we find that jointly optimizing for high overall performance and low disparities does not yield statistically significant improvements. Our results suggest that fairness work using MIMIC-III should carefully account for variations in apparent differences that may arise from stochasticity and small sample sizes.
△ Less
Submitted 13 April, 2021;
originally announced April 2021.
-
Demographic Representation and Collective Storytelling in the Me Too Twitter Hashtag Activism Movement
Authors:
Aaron Mueller,
Zach Wood-Doughty,
Silvio Amir,
Mark Dredze,
Alicia L. Nobles
Abstract:
The #MeToo movement on Twitter has drawn attention to the pervasive nature of sexual harassment and violence. While #MeToo has been praised for providing support for self-disclosures of harassment or violence and shifting societal response, it has also been criticized for exemplifying how women of color have been discounted for their historical contributions to and excluded from feminist movements…
▽ More
The #MeToo movement on Twitter has drawn attention to the pervasive nature of sexual harassment and violence. While #MeToo has been praised for providing support for self-disclosures of harassment or violence and shifting societal response, it has also been criticized for exemplifying how women of color have been discounted for their historical contributions to and excluded from feminist movements. Through an analysis of over 600,000 tweets from over 256,000 unique users, we examine online #MeToo conversations across gender and racial/ethnic identities and the topics that each demographic emphasized. We found that tweets authored by white women were overrepresented in the movement compared to other demographics, aligning with criticism of unequal representation. We found that intersected identities contributed differing narratives to frame the movement, co-opted the movement to raise visibility in parallel ongoing movements, employed the same hashtags both critically and supportively, and revived and created new hashtags in response to pivotal moments. Notably, tweets authored by black women often expressed emotional support and were critical about differential treatment in the justice system and by police. In comparison, tweets authored by white women and men often highlighted sexual harassment and violence by public figures and weaved in more general political discussions. We discuss the implications of work for digital activism research and design including suggestions to raise visibility by those who were under-represented in this hashtag activism movement. Content warning: this article discusses issues of sexual harassment and violence.
△ Less
Submitted 13 October, 2020;
originally announced October 2020.
-
Demonstrating Advantages of Neuromorphic Computation: A Pilot Study
Authors:
Timo Wunderlich,
Akos F. Kungl,
Eric Müller,
Andreas Hartel,
Yannik Stradmann,
Syed Ahmed Aamir,
Andreas Grübl,
Arthur Heimbrecht,
Korbinian Schreiber,
David Stöckel,
Christian Pehle,
Sebastian Billaudelle,
Gerd Kiene,
Christian Mauch,
Johannes Schemmel,
Karlheinz Meier,
Mihai A. Petrovici
Abstract:
Neuromorphic devices represent an attempt to mimic aspects of the brain's architecture and dynamics with the aim of replicating its hallmark functional capabilities in terms of computational power, robust learning and energy efficiency. We employ a single-chip prototype of the BrainScaleS 2 neuromorphic system to implement a proof-of-concept demonstration of reward-modulated spike-timing-dependent…
▽ More
Neuromorphic devices represent an attempt to mimic aspects of the brain's architecture and dynamics with the aim of replicating its hallmark functional capabilities in terms of computational power, robust learning and energy efficiency. We employ a single-chip prototype of the BrainScaleS 2 neuromorphic system to implement a proof-of-concept demonstration of reward-modulated spike-timing-dependent plasticity in a spiking network that learns to play the Pong video game by smooth pursuit. This system combines an electronic mixed-signal substrate for emulating neuron and synapse dynamics with an embedded digital processor for on-chip learning, which in this work also serves to simulate the virtual environment and learning agent. The analog emulation of neuronal membrane dynamics enables a 1000-fold acceleration with respect to biological real-time, with the entire chip operating on a power budget of 57mW. Compared to an equivalent simulation using state-of-the-art software, the on-chip emulation is at least one order of magnitude faster and three orders of magnitude more energy-efficient. We demonstrate how on-chip learning can mitigate the effects of fixed-pattern noise, which is unavoidable in analog substrates, while making use of temporal variability for action exploration. Learning compensates imperfections of the physical substrate, as manifested in neuronal parameter variability, by adapting synaptic weights to match respective excitability of individual neurons.
△ Less
Submitted 8 March, 2019; v1 submitted 8 November, 2018;
originally announced November 2018.
-
An Accelerated LIF Neuronal Network Array for a Large Scale Mixed-Signal Neuromorphic Architecture
Authors:
Syed Ahmed Aamir,
Yannik Stradmann,
Paul Müller,
Christian Pehle,
Andreas Hartel,
Andreas Grübl,
Johannes Schemmel,
Karlheinz Meier
Abstract:
We present an array of leaky integrate-and-fire (LIF) neuron circuits designed for the second-generation BrainScaleS mixed-signal 65-nm CMOS neuromorphic hardware. The neuronal array is embedded in the analog network core of a scaled-down prototype HICANN-DLS chip. Designed as continuous-time circuits, the neurons are highly tunable and reconfigurable elements with accelerated dynamics. Each neuro…
▽ More
We present an array of leaky integrate-and-fire (LIF) neuron circuits designed for the second-generation BrainScaleS mixed-signal 65-nm CMOS neuromorphic hardware. The neuronal array is embedded in the analog network core of a scaled-down prototype HICANN-DLS chip. Designed as continuous-time circuits, the neurons are highly tunable and reconfigurable elements with accelerated dynamics. Each neuron integrates input current from a multitude of incoming synapses and evokes a digital spike event output. The circuit offers a wide tuning range for synaptic and membrane time constants, as well as for refractory periods to cover a number of computational models. We elucidate our design methodology, underlying circuit design, calibration and measurement results from individual sub-circuits across multiple dies. The circuit dynamics match with the behavior of the LIF mathematical model. We further demonstrate a winner-take-all network on the prototype chip as a typical element of cortical processing.
△ Less
Submitted 23 May, 2018; v1 submitted 5 April, 2018;
originally announced April 2018.
-
A Mixed-Signal Structured AdEx Neuron for Accelerated Neuromorphic Cores
Authors:
Syed Ahmed Aamir,
Paul Müller,
Gerd Kiene,
Laura Kriener,
Yannik Stradmann,
Andreas Grübl,
Johannes Schemmel,
Karlheinz Meier
Abstract:
Here we describe a multi-compartment neuron circuit based on the Adaptive-Exponential I&F (AdEx) model, developed for the second-generation BrainScaleS hardware. Based on an existing modular Leaky Integrate-and-Fire (LIF) architecture designed in 65 nm CMOS, the circuit features exponential spike generation, neuronal adaptation, inter-compartmental connections as well as a conductance-based reset.…
▽ More
Here we describe a multi-compartment neuron circuit based on the Adaptive-Exponential I&F (AdEx) model, developed for the second-generation BrainScaleS hardware. Based on an existing modular Leaky Integrate-and-Fire (LIF) architecture designed in 65 nm CMOS, the circuit features exponential spike generation, neuronal adaptation, inter-compartmental connections as well as a conductance-based reset. The design reproduces a diverse set of firing patterns observed in cortical pyramidal neurons. Further, it enables the emulation of sodium and calcium spikes, as well as N-Methyl-D-Aspartate (NMDA) plateau potentials known from apical and thin dendrites. We characterize the AdEx circuit extensions and exemplify how the interplay between passive and non-linear active signal processing enhances the computational capabilities of single (but structured) on-chip neurons.
△ Less
Submitted 29 May, 2018; v1 submitted 5 April, 2018;
originally announced April 2018.
-
Quantifying Mental Health from Social Media with Neural User Embeddings
Authors:
Silvio Amir,
Glen Coppersmith,
Paula Carvalho,
Mário J. Silva,
Byron C. Wallace
Abstract:
Mental illnesses adversely affect a significant proportion of the population worldwide. However, the methods traditionally used for estimating and characterizing the prevalence of mental health conditions are time-consuming and expensive. Consequently, best-available estimates concerning the prevalence of mental health conditions are often years out of date. Automated approaches to supplement thes…
▽ More
Mental illnesses adversely affect a significant proportion of the population worldwide. However, the methods traditionally used for estimating and characterizing the prevalence of mental health conditions are time-consuming and expensive. Consequently, best-available estimates concerning the prevalence of mental health conditions are often years out of date. Automated approaches to supplement these survey methods with broad, aggregated information derived from social media content provides a potential means for near real-time estimates at scale. These may, in turn, provide grist for supporting, evaluating and iteratively improving upon public health programs and interventions.
We propose a novel model for automated mental health status quantification that incorporates user embeddings. This builds upon recent work exploring representation learning methods that induce embeddings by leveraging social media post histories. Such embeddings capture latent characteristics of individuals (e.g., political leanings) and encode a soft notion of homophily. In this paper, we investigate whether user embeddings learned from twitter post histories encode information that correlates with mental health statuses. To this end, we estimated user embeddings for a set of users known to be affected by depression and post-traumatic stress disorder (PTSD), and for a set of demographically matched `control' users. We then evaluated these embeddings with respect to: (i) their ability to capture homophilic relations with respect to mental health status; and (ii) the performance of downstream mental health prediction models based on these features. Our experimental results demonstrate that the user embeddings capture similarities between users with respect to mental conditions, and are predictive of mental health.
△ Less
Submitted 30 April, 2017;
originally announced May 2017.
-
Expanding Subjective Lexicons for Social Media Mining with Embedding Subspaces
Authors:
Silvio Amir,
Rámon Astudillo,
Wang Ling,
Paula C. Carvalho,
Mário J. Silva
Abstract:
Recent approaches for sentiment lexicon induction have capitalized on pre-trained word embeddings that capture latent semantic properties. However, embeddings obtained by optimizing performance of a given task (e.g. predicting contextual words) are sub-optimal for other applications. In this paper, we address this problem by exploiting task-specific representations, induced via embedding sub-space…
▽ More
Recent approaches for sentiment lexicon induction have capitalized on pre-trained word embeddings that capture latent semantic properties. However, embeddings obtained by optimizing performance of a given task (e.g. predicting contextual words) are sub-optimal for other applications. In this paper, we address this problem by exploiting task-specific representations, induced via embedding sub-space projection. This allows us to expand lexicons describing multiple semantic properties. For each property, our model jointly learns suitable representations and the concomitant predictor. Experiments conducted over multiple subjective lexicons, show that our model outperforms previous work and other baselines; even in low training data regimes. Furthermore, lexicon-based sentiment classifiers built on top of our lexicons outperform similar resources and yield performances comparable to those of supervised models.
△ Less
Submitted 6 January, 2017; v1 submitted 31 December, 2016;
originally announced January 2017.
-
Modelling Context with User Embeddings for Sarcasm Detection in Social Media
Authors:
Silvio Amir,
Byron C. Wallace,
Hao Lyu,
Paula Carvalho Mário J. Silva
Abstract:
We introduce a deep neural network for automated sarcasm detection. Recent work has emphasized the need for models to capitalize on contextual features, beyond lexical and syntactic cues present in utterances. For example, different speakers will tend to employ sarcasm regarding different subjects and, thus, sarcasm detection models ought to encode such speaker information. Current methods have ac…
▽ More
We introduce a deep neural network for automated sarcasm detection. Recent work has emphasized the need for models to capitalize on contextual features, beyond lexical and syntactic cues present in utterances. For example, different speakers will tend to employ sarcasm regarding different subjects and, thus, sarcasm detection models ought to encode such speaker information. Current methods have achieved this by way of laborious feature engineering. By contrast, we propose to automatically learn and then exploit user embeddings, to be used in concert with lexical signals to recognize sarcasm. Our approach does not require elaborate feature engineering (and concomitant data scra**); fitting user embeddings requires only the text from their previous posts. The experimental results show that our model outperforms a state-of-the-art approach leveraging an extensive set of carefully crafted features.
△ Less
Submitted 4 July, 2016; v1 submitted 4 July, 2016;
originally announced July 2016.
-
POPmine: Tracking Political Opinion on the Web
Authors:
Pedro Saleiro,
Sílvio Amir,
Mário J. Silva,
Carlos Soares
Abstract:
The automatic content analysis of mass media in the social sciences has become necessary and possible with the raise of social media and computational power. One particularly promising avenue of research concerns the use of opinion mining. We design and implement the POPmine system which is able to collect texts from web-based conventional media (news items in mainstream media sites) and social me…
▽ More
The automatic content analysis of mass media in the social sciences has become necessary and possible with the raise of social media and computational power. One particularly promising avenue of research concerns the use of opinion mining. We design and implement the POPmine system which is able to collect texts from web-based conventional media (news items in mainstream media sites) and social media (blogs and Twitter) and to process those texts, recognizing topics and political actors, analyzing relevant linguistic units, and generating indicators of both frequency of mention and polarity (positivity/negativity) of mentions to political actors across sources, types of sources, and across time.
△ Less
Submitted 29 November, 2015;
originally announced November 2015.
-
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
Authors:
Wang Ling,
Tiago Luís,
Luís Marujo,
Ramón Fernandez Astudillo,
Silvio Amir,
Chris Dyer,
Alan W. Black,
Isabel Trancoso
Abstract:
We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly,…
▽ More
We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form-function relationship in language, our "composed" word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish).
△ Less
Submitted 23 May, 2016; v1 submitted 9 August, 2015;
originally announced August 2015.