Search | arXiv e-print repository

Investigating Mysteries of CoT-Augmented Distillation

Authors: Somin Wadhwa, Silvio Amir, Byron C. Wallace

Abstract: Eliciting "chain of thought" (CoT) rationales -- sequences of token that convey a "reasoning" process -- has been shown to consistently improve LLM performance on tasks like question answering. More recent efforts have shown that such rationales can also be used for model distillation: Including CoT sequences (elicited from a large "teacher" model) in addition to target labels when fine-tuning a s… ▽ More Eliciting "chain of thought" (CoT) rationales -- sequences of token that convey a "reasoning" process -- has been shown to consistently improve LLM performance on tasks like question answering. More recent efforts have shown that such rationales can also be used for model distillation: Including CoT sequences (elicited from a large "teacher" model) in addition to target labels when fine-tuning a small student model yields (often substantial) improvements. In this work we ask: Why and how does this additional training signal help in model distillation? We perform ablations to interrogate this, and report some potentially surprising results. Specifically: (1) Placing CoT sequences after labels (rather than before) realizes consistently better downstream performance -- this means that no student "reasoning" is necessary at test time to realize gains. (2) When rationales are appended in this way, they need not be coherent reasoning sequences to yield improvements; performance increases are robust to permutations of CoT tokens, for example. In fact, (3) a small number of key tokens are sufficient to achieve improvements equivalent to those observed when full rationales are used in model distillation. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: Draft; under review

arXiv:2404.00152 [pdf, other]

On-the-fly Definition Augmentation of LLMs for Biomedical NER

Authors: Monica Munnangi, Sergey Feldman, Byron C Wallace, Silvio Amir, Tom Hope, Aakanksha Naik

Abstract: Despite their general capabilities, LLMs still struggle on biomedical NER tasks, which are difficult due to the presence of specialized terminology and lack of training data. In this work we set out to improve LLM performance on biomedical NER in limited data settings via a new knowledge augmentation approach which incorporates definitions of relevant concepts on-the-fly. During this process, to p… ▽ More Despite their general capabilities, LLMs still struggle on biomedical NER tasks, which are difficult due to the presence of specialized terminology and lack of training data. In this work we set out to improve LLM performance on biomedical NER in limited data settings via a new knowledge augmentation approach which incorporates definitions of relevant concepts on-the-fly. During this process, to provide a test bed for knowledge augmentation, we perform a comprehensive exploration of prompting strategies. Our experiments show that definition augmentation is useful for both open source and closed LLMs. For example, it leads to a relative improvement of 15\% (on average) in GPT-4 performance (F1) across all (six) of our test datasets. We conduct extensive ablations and analyses to demonstrate that our performance improvements stem from adding relevant definitional knowledge. We find that careful prompting strategies also improve LLM performance, allowing them to outperform fine-tuned language models in few-shot settings. To facilitate future research in this direction, we release our code at https://github.com/allenai/beacon. △ Less

Submitted 23 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

Comments: To appear at NAACL 2024 (Main)

arXiv:2401.00986 [pdf]

Real-Time Object Detection in Occluded Environment with Background Cluttering Effects Using Deep Learning

Authors: Syed Muhammad Aamir, Hongbin Ma, Malak Abid Ali Khan, Muhammad Aaqib

Abstract: Detection of small, undetermined moving objects or objects in an occluded environment with a cluttered background is the main problem of computer vision. This greatly affects the detection accuracy of deep learning models. To overcome these problems, we concentrate on deep learning models for real-time detection of cars and tanks in an occluded environment with a cluttered background employing SSD… ▽ More Detection of small, undetermined moving objects or objects in an occluded environment with a cluttered background is the main problem of computer vision. This greatly affects the detection accuracy of deep learning models. To overcome these problems, we concentrate on deep learning models for real-time detection of cars and tanks in an occluded environment with a cluttered background employing SSD and YOLO algorithms and improved precision of detection and reduce problems faced by these models. The developed method makes the custom dataset and employs a preprocessing technique to clean the noisy dataset. For training the developed model we apply the data augmentation technique to balance and diversify the data. We fine-tuned, trained, and evaluated these models on the established dataset by applying these techniques and highlighting the results we got more accurately than without applying these techniques. The accuracy and frame per second of the SSD-Mobilenet v2 model are higher than YOLO V3 and YOLO V4. Furthermore, by employing various techniques like data enhancement, noise reduction, parameter optimization, and model fusion we improve the effectiveness of detection and recognition. We further added a counting algorithm, and target attributes experimental comparison, and made a graphical user interface system for the developed model with features of object counting, alerts, status, resolution, and frame per second. Subsequently, to justify the importance of the developed method analysis of YOLO V3, V4, and SSD were incorporated. Which resulted in the overall completion of the proposed method. △ Less

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2311.12193 [pdf, other]

doi 10.1145/3630096

Disentangling Structure and Appearance in ViT Feature Space

Authors: Narek Tumanyan, Omer Bar-Tal, Shir Amir, Shai Bagon, Tali Dekel

Abstract: We present a method for semantically transferring the visual appearance of one natural image to another. Specifically, our goal is to generate an image in which objects in a source structure image are "painted" with the visual appearance of their semantically related objects in a target appearance image. To integrate semantic information into our framework, our key idea is to leverage a pre-traine… ▽ More We present a method for semantically transferring the visual appearance of one natural image to another. Specifically, our goal is to generate an image in which objects in a source structure image are "painted" with the visual appearance of their semantically related objects in a target appearance image. To integrate semantic information into our framework, our key idea is to leverage a pre-trained and fixed Vision Transformer (ViT) model. Specifically, we derive novel disentangled representations of structure and appearance extracted from deep ViT features. We then establish an objective function that splices the desired structure and appearance representations, interweaving them together in the space of ViT features. Based on our objective function, we propose two frameworks of semantic appearance transfer -- "Splice", which works by training a generator on a single and arbitrary pair of structure-appearance images, and "SpliceNet", a feed-forward real-time appearance transfer model trained on a dataset of images from a specific domain. Our frameworks do not involve adversarial training, nor do they require any additional input information such as semantic segmentation or correspondences. We demonstrate high-resolution results on a variety of in-the-wild image pairs, under significant variations in the number of objects, pose, and appearance. Code and supplementary material are available in our project page: splice-vit.github.io. △ Less

Submitted 20 November, 2023; originally announced November 2023.

Comments: Accepted to ACM Transactions on Graphics. arXiv admin note: substantial text overlap with arXiv:2201.00424

arXiv:2309.04550 [pdf, other]

Retrieving Evidence from EHRs with LLMs: Possibilities and Challenges

Authors: Hiba Ahsan, Denis Jered McInerney, Jisoo Kim, Christopher Potter, Geoffrey Young, Silvio Amir, Byron C. Wallace

Abstract: Unstructured data in Electronic Health Records (EHRs) often contains critical information -- complementary to imaging -- that could inform radiologists' diagnoses. But the large volume of notes often associated with patients together with time constraints renders manually identifying relevant evidence practically infeasible. In this work we propose and evaluate a zero-shot strategy for using LLMs… ▽ More Unstructured data in Electronic Health Records (EHRs) often contains critical information -- complementary to imaging -- that could inform radiologists' diagnoses. But the large volume of notes often associated with patients together with time constraints renders manually identifying relevant evidence practically infeasible. In this work we propose and evaluate a zero-shot strategy for using LLMs as a mechanism to efficiently retrieve and summarize unstructured evidence in patient EHR relevant to a given query. Our method entails tasking an LLM to infer whether a patient has, or is at risk of, a particular condition on the basis of associated notes; if so, we ask the model to summarize the supporting evidence. Under expert evaluation, we find that this LLM-based approach provides outputs consistently preferred to a pre-LLM information retrieval baseline. Manual evaluation is expensive, so we also propose and validate a method using an LLM to evaluate (other) LLM outputs for this task, allowing us to scale up evaluation. Our findings indicate the promise of LLMs as interfaces to EHR, but also highlight the outstanding challenge posed by "hallucinations". In this setting, however, we show that model confidence in outputs strongly correlates with faithful summaries, offering a practical means to limit confabulations. △ Less

Submitted 10 June, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

arXiv:2305.05003 [pdf, other]

Revisiting Relation Extraction in the era of Large Language Models

Authors: Somin Wadhwa, Silvio Amir, Byron C. Wallace

Abstract: Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a \emph{sequence-to-sequence} task, linearizing relations between entities as target strings to be… ▽ More Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a \emph{sequence-to-sequence} task, linearizing relations between entities as target strings to be generated conditioned on the input. Here we push the limits of this approach, using larger language models (GPT-3 and Flan-T5 large) than considered in prior work and evaluating their performance on standard RE tasks under varying levels of supervision. We address issues inherent to evaluating generative approaches to RE by doing human evaluations, in lieu of relying on exact matching. Under this refined evaluation, we find that: (1) Few-shot prompting with GPT-3 achieves near SOTA performance, i.e., roughly equivalent to existing fully supervised models; (2) Flan-T5 is not as capable in the few-shot setting, but supervising and fine-tuning it with Chain-of-Thought (CoT) style explanations (generated via GPT-3) yields SOTA results. We release this model as a new baseline for RE tasks. △ Less

Submitted 8 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023

arXiv:2305.03642 [pdf, other]

Jointly Extracting Interventions, Outcomes, and Findings from RCT Reports with LLMs

Authors: Somin Wadhwa, Jay DeYoung, Benjamin Nye, Silvio Amir, Byron C. Wallace

Abstract: Results from Randomized Controlled Trials (RCTs) establish the comparative effectiveness of interventions, and are in turn critical inputs for evidence-based care. However, results from RCTs are presented in (often unstructured) natural language articles describing the design, execution, and outcomes of trials; clinicians must manually extract findings pertaining to interventions and outcomes of i… ▽ More Results from Randomized Controlled Trials (RCTs) establish the comparative effectiveness of interventions, and are in turn critical inputs for evidence-based care. However, results from RCTs are presented in (often unstructured) natural language articles describing the design, execution, and outcomes of trials; clinicians must manually extract findings pertaining to interventions and outcomes of interest from such articles. This onerous manual process has motivated work on (semi-)automating extraction of structured evidence from trial reports. In this work we propose and evaluate a text-to-text model built on instruction-tuned Large Language Models (LLMs) to jointly extract Interventions, Outcomes, and Comparators (ICO elements) from clinical abstracts, and infer the associated results reported. Manual (expert) and automated evaluations indicate that framing evidence extraction as a conditional generation task and fine-tuning LLMs for this purpose realizes considerable ($\sim$20 point absolute F1 score) gains over the previous SOTA. We perform ablations and error analyses to assess aspects that contribute to model performance, and to highlight potential directions for further improvements. We apply our model to a collection of published RCTs through mid-2022, and release a searchable database of structured findings: http://ico-relations.ebm-nlp.com △ Less

Submitted 17 July, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

Comments: Accepted to MLHC 2023

arXiv:2210.09618 [pdf]

Object Recognition in Different Lighting Conditions at Various Angles by Deep Learning Method

Authors: Imran Khan Mirani, Chen Tianhua, Malak Abid Ali Khan, Syed Muhammad Aamir, Waseef Menhaj

Abstract: Existing computer vision and object detection methods strongly rely on neural networks and deep learning. This active research area is used for applications such as autonomous driving, aerial photography, protection, and monitoring. Futuristic object detection methods rely on rectangular, boundary boxes drawn over an object to accurately locate its location. The modern object recognition algorithm… ▽ More Existing computer vision and object detection methods strongly rely on neural networks and deep learning. This active research area is used for applications such as autonomous driving, aerial photography, protection, and monitoring. Futuristic object detection methods rely on rectangular, boundary boxes drawn over an object to accurately locate its location. The modern object recognition algorithms, however, are vulnerable to multiple factors, such as illumination, occlusion, viewing angle, or camera rotation as well as cost. Therefore, deep learning-based object recognition will significantly increase the recognition speed and compatible external interference. In this study, we use convolutional neural networks (CNN) to recognize items, the neural networks have the advantages of end-to-end, sparse relation, and sharing weights. This article aims to classify the name of the various object based on the position of an object's detected box. Instead, under different distances, we can get recognition results with different confidence. Through this study, we find that this model's accuracy through recognition is mainly influenced by the proportion of objects and the number of samples. When we have a small proportion of an object on camera, then we get higher recognition accuracy; if we have a much small number of samples, we can get greater accuracy in recognition. The epidemic has a great impact on the world economy where designing a cheaper object recognition system is the need of time. △ Less

Submitted 18 October, 2022; originally announced October 2022.

arXiv:2210.06331 [pdf, other]

RedHOT: A Corpus of Annotated Medical Questions, Experiences, and Claims on Social Media

Authors: Somin Wadhwa, Vivek Khetan, Silvio Amir, Byron Wallace

Abstract: We present Reddit Health Online Talk (RedHOT), a corpus of 22,000 richly annotated social media posts from Reddit spanning 24 health conditions. Annotations include demarcations of spans corresponding to medical claims, personal experiences, and questions. We collect additional granular annotations on identified claims. Specifically, we mark snippets that describe patient Populations, Intervention… ▽ More We present Reddit Health Online Talk (RedHOT), a corpus of 22,000 richly annotated social media posts from Reddit spanning 24 health conditions. Annotations include demarcations of spans corresponding to medical claims, personal experiences, and questions. We collect additional granular annotations on identified claims. Specifically, we mark snippets that describe patient Populations, Interventions, and Outcomes (PIO elements) within these. Using this corpus, we introduce the task of retrieving trustworthy evidence relevant to a given claim made on social media. We propose a new method to automatically derive (noisy) supervision for this task which we use to train a dense retrieval model; this outperforms baseline models. Manual evaluation of retrieval results performed by medical doctors indicate that while our system performance is promising, there is considerable room for improvement. Collected annotations (and scripts to assemble the dataset), are available at https://github.com/sominw/redhot. △ Less

Submitted 7 February, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted to EACL 2023

arXiv:2206.05619 [pdf, other]

Deep Learning Models for Automated Classification of Dog Emotional States from Facial Expressions

Authors: Tali Boneh-Shitrit, Shir Amir, Annika Bremhorst, Daniel S. Mills, Stefanie Riemer, Dror Fried, Anna Zamansky

Abstract: Similarly to humans, facial expressions in animals are closely linked with emotional states. However, in contrast to the human domain, automated recognition of emotional states from facial expressions in animals is underexplored, mainly due to difficulties in data collection and establishment of ground truth concerning emotional states of non-verbal users. We apply recent deep learning techniques… ▽ More Similarly to humans, facial expressions in animals are closely linked with emotional states. However, in contrast to the human domain, automated recognition of emotional states from facial expressions in animals is underexplored, mainly due to difficulties in data collection and establishment of ground truth concerning emotional states of non-verbal users. We apply recent deep learning techniques to classify (positive) anticipation and (negative) frustration of dogs on a dataset collected in a controlled experimental setting. We explore the suitability of different backbones (e.g. ResNet, ViT) under different supervisions to this task, and find that features of a self-supervised pretrained ViT (DINO-ViT) are superior to the other alternatives. To the best of our knowledge, this work is the first to address the task of automatic classification of canine emotions on data acquired in a controlled experiment. △ Less

Submitted 11 June, 2022; originally announced June 2022.

arXiv:2112.05814 [pdf, other]

Deep ViT Features as Dense Visual Descriptors

Authors: Shir Amir, Yossi Gandelsman, Shai Bagon, Tali Dekel

Abstract: We study the use of deep features extracted from a pretrained Vision Transformer (ViT) as dense visual descriptors. We observe and empirically demonstrate that such features, when extractedfrom a self-supervised ViT model (DINO-ViT), exhibit several striking properties, including: (i) the features encode powerful, well-localized semantic information, at high spatial granularity, such as object par… ▽ More We study the use of deep features extracted from a pretrained Vision Transformer (ViT) as dense visual descriptors. We observe and empirically demonstrate that such features, when extractedfrom a self-supervised ViT model (DINO-ViT), exhibit several striking properties, including: (i) the features encode powerful, well-localized semantic information, at high spatial granularity, such as object parts; (ii) the encoded semantic information is shared across related, yet different object categories, and (iii) positional bias changes gradually throughout the layers. These properties allow us to design simple methods for a variety of applications, including co-segmentation, part co-segmentation and semantic correspondences. To distill the power of ViT features from convoluted design choices, we restrict ourselves to lightweight zero-shot methodologies (e.g., binning and clustering) applied directly to the features. Since our methods require no additional training nor data, they are readily applicable across a variety of domains. We show by extensive qualitative and quantitative evaluation that our simple methodologies achieve competitive results with recent state-of-the-art supervised methods, and outperform previous unsupervised methods by a large margin. Code is available in dino-vit-features.github.io. △ Less

Submitted 15 October, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

Comments: Revised version - high res figures

arXiv:2104.06338 [pdf, other]

On the Impact of Random Seeds on the Fairness of Clinical Classifiers

Authors: Silvio Amir, Jan-Willem van de Meent, Byron C. Wallace

Abstract: Recent work has shown that fine-tuning large networks is surprisingly sensitive to changes in random seed(s). We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records (EHR) in MIMIC-III -- the standard dataset in clinical NLP research. Apparent subgroup performance varies substantially for seeds that yie… ▽ More Recent work has shown that fine-tuning large networks is surprisingly sensitive to changes in random seed(s). We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records (EHR) in MIMIC-III -- the standard dataset in clinical NLP research. Apparent subgroup performance varies substantially for seeds that yield similar overall performance, although there is no evidence of a trade-off between overall and subgroup performance. However, we also find that the small sample sizes inherent to looking at intersections of minority groups and somewhat rare conditions limit our ability to accurately estimate disparities. Further, we find that jointly optimizing for high overall performance and low disparities does not yield statistically significant improvements. Our results suggest that fairness work using MIMIC-III should carefully account for variations in apparent differences that may arise from stochasticity and small sample sizes. △ Less

Submitted 13 April, 2021; originally announced April 2021.

Comments: Accepted for publication at NAACL 2021

arXiv:2010.06472 [pdf, other]

Demographic Representation and Collective Storytelling in the Me Too Twitter Hashtag Activism Movement

Authors: Aaron Mueller, Zach Wood-Doughty, Silvio Amir, Mark Dredze, Alicia L. Nobles

Abstract: The #MeToo movement on Twitter has drawn attention to the pervasive nature of sexual harassment and violence. While #MeToo has been praised for providing support for self-disclosures of harassment or violence and shifting societal response, it has also been criticized for exemplifying how women of color have been discounted for their historical contributions to and excluded from feminist movements… ▽ More The #MeToo movement on Twitter has drawn attention to the pervasive nature of sexual harassment and violence. While #MeToo has been praised for providing support for self-disclosures of harassment or violence and shifting societal response, it has also been criticized for exemplifying how women of color have been discounted for their historical contributions to and excluded from feminist movements. Through an analysis of over 600,000 tweets from over 256,000 unique users, we examine online #MeToo conversations across gender and racial/ethnic identities and the topics that each demographic emphasized. We found that tweets authored by white women were overrepresented in the movement compared to other demographics, aligning with criticism of unequal representation. We found that intersected identities contributed differing narratives to frame the movement, co-opted the movement to raise visibility in parallel ongoing movements, employed the same hashtags both critically and supportively, and revived and created new hashtags in response to pivotal moments. Notably, tweets authored by black women often expressed emotional support and were critical about differential treatment in the justice system and by police. In comparison, tweets authored by white women and men often highlighted sexual harassment and violence by public figures and weaved in more general political discussions. We discuss the implications of work for digital activism research and design including suggestions to raise visibility by those who were under-represented in this hashtag activism movement. Content warning: this article discusses issues of sexual harassment and violence. △ Less

Submitted 13 October, 2020; originally announced October 2020.

Comments: 27 pages (incl. 5 for references). Submitted to CSCW 2021

arXiv:1811.03618 [pdf, other]

doi 10.3389/fnins.2019.00260

Demonstrating Advantages of Neuromorphic Computation: A Pilot Study

Authors: Timo Wunderlich, Akos F. Kungl, Eric Müller, Andreas Hartel, Yannik Stradmann, Syed Ahmed Aamir, Andreas Grübl, Arthur Heimbrecht, Korbinian Schreiber, David Stöckel, Christian Pehle, Sebastian Billaudelle, Gerd Kiene, Christian Mauch, Johannes Schemmel, Karlheinz Meier, Mihai A. Petrovici

Abstract: Neuromorphic devices represent an attempt to mimic aspects of the brain's architecture and dynamics with the aim of replicating its hallmark functional capabilities in terms of computational power, robust learning and energy efficiency. We employ a single-chip prototype of the BrainScaleS 2 neuromorphic system to implement a proof-of-concept demonstration of reward-modulated spike-timing-dependent… ▽ More Neuromorphic devices represent an attempt to mimic aspects of the brain's architecture and dynamics with the aim of replicating its hallmark functional capabilities in terms of computational power, robust learning and energy efficiency. We employ a single-chip prototype of the BrainScaleS 2 neuromorphic system to implement a proof-of-concept demonstration of reward-modulated spike-timing-dependent plasticity in a spiking network that learns to play the Pong video game by smooth pursuit. This system combines an electronic mixed-signal substrate for emulating neuron and synapse dynamics with an embedded digital processor for on-chip learning, which in this work also serves to simulate the virtual environment and learning agent. The analog emulation of neuronal membrane dynamics enables a 1000-fold acceleration with respect to biological real-time, with the entire chip operating on a power budget of 57mW. Compared to an equivalent simulation using state-of-the-art software, the on-chip emulation is at least one order of magnitude faster and three orders of magnitude more energy-efficient. We demonstrate how on-chip learning can mitigate the effects of fixed-pattern noise, which is unavoidable in analog substrates, while making use of temporal variability for action exploration. Learning compensates imperfections of the physical substrate, as manifested in neuronal parameter variability, by adapting synaptic weights to match respective excitability of individual neurons. △ Less

Submitted 8 March, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

Comments: Added measurements with noise in NEST simulation, add notice about journal publication. Frontiers in Neuromorphic Engineering (2019)

arXiv:1804.01906 [pdf, other]

doi 10.1109/TCSI.2018.2840718

An Accelerated LIF Neuronal Network Array for a Large Scale Mixed-Signal Neuromorphic Architecture

Authors: Syed Ahmed Aamir, Yannik Stradmann, Paul Müller, Christian Pehle, Andreas Hartel, Andreas Grübl, Johannes Schemmel, Karlheinz Meier

Abstract: We present an array of leaky integrate-and-fire (LIF) neuron circuits designed for the second-generation BrainScaleS mixed-signal 65-nm CMOS neuromorphic hardware. The neuronal array is embedded in the analog network core of a scaled-down prototype HICANN-DLS chip. Designed as continuous-time circuits, the neurons are highly tunable and reconfigurable elements with accelerated dynamics. Each neuro… ▽ More We present an array of leaky integrate-and-fire (LIF) neuron circuits designed for the second-generation BrainScaleS mixed-signal 65-nm CMOS neuromorphic hardware. The neuronal array is embedded in the analog network core of a scaled-down prototype HICANN-DLS chip. Designed as continuous-time circuits, the neurons are highly tunable and reconfigurable elements with accelerated dynamics. Each neuron integrates input current from a multitude of incoming synapses and evokes a digital spike event output. The circuit offers a wide tuning range for synaptic and membrane time constants, as well as for refractory periods to cover a number of computational models. We elucidate our design methodology, underlying circuit design, calibration and measurement results from individual sub-circuits across multiple dies. The circuit dynamics match with the behavior of the LIF mathematical model. We further demonstrate a winner-take-all network on the prototype chip as a typical element of cortical processing. △ Less

Submitted 23 May, 2018; v1 submitted 5 April, 2018; originally announced April 2018.

Comments: 14 pages, 9 Figures, accepted for publication in IEEE Transactions on Circuits and Systems I

arXiv:1804.01840 [pdf, other]

doi 10.1109/TBCAS.2018.2848203

A Mixed-Signal Structured AdEx Neuron for Accelerated Neuromorphic Cores

Authors: Syed Ahmed Aamir, Paul Müller, Gerd Kiene, Laura Kriener, Yannik Stradmann, Andreas Grübl, Johannes Schemmel, Karlheinz Meier

Abstract: Here we describe a multi-compartment neuron circuit based on the Adaptive-Exponential I&F (AdEx) model, developed for the second-generation BrainScaleS hardware. Based on an existing modular Leaky Integrate-and-Fire (LIF) architecture designed in 65 nm CMOS, the circuit features exponential spike generation, neuronal adaptation, inter-compartmental connections as well as a conductance-based reset.… ▽ More Here we describe a multi-compartment neuron circuit based on the Adaptive-Exponential I&F (AdEx) model, developed for the second-generation BrainScaleS hardware. Based on an existing modular Leaky Integrate-and-Fire (LIF) architecture designed in 65 nm CMOS, the circuit features exponential spike generation, neuronal adaptation, inter-compartmental connections as well as a conductance-based reset. The design reproduces a diverse set of firing patterns observed in cortical pyramidal neurons. Further, it enables the emulation of sodium and calcium spikes, as well as N-Methyl-D-Aspartate (NMDA) plateau potentials known from apical and thin dendrites. We characterize the AdEx circuit extensions and exemplify how the interplay between passive and non-linear active signal processing enhances the computational capabilities of single (but structured) on-chip neurons. △ Less

Submitted 29 May, 2018; v1 submitted 5 April, 2018; originally announced April 2018.

Comments: 11 pages, 17 figures (including author photographs)

arXiv:1705.00335 [pdf, other]

Quantifying Mental Health from Social Media with Neural User Embeddings

Authors: Silvio Amir, Glen Coppersmith, Paula Carvalho, Mário J. Silva, Byron C. Wallace

Abstract: Mental illnesses adversely affect a significant proportion of the population worldwide. However, the methods traditionally used for estimating and characterizing the prevalence of mental health conditions are time-consuming and expensive. Consequently, best-available estimates concerning the prevalence of mental health conditions are often years out of date. Automated approaches to supplement thes… ▽ More Mental illnesses adversely affect a significant proportion of the population worldwide. However, the methods traditionally used for estimating and characterizing the prevalence of mental health conditions are time-consuming and expensive. Consequently, best-available estimates concerning the prevalence of mental health conditions are often years out of date. Automated approaches to supplement these survey methods with broad, aggregated information derived from social media content provides a potential means for near real-time estimates at scale. These may, in turn, provide grist for supporting, evaluating and iteratively improving upon public health programs and interventions. We propose a novel model for automated mental health status quantification that incorporates user embeddings. This builds upon recent work exploring representation learning methods that induce embeddings by leveraging social media post histories. Such embeddings capture latent characteristics of individuals (e.g., political leanings) and encode a soft notion of homophily. In this paper, we investigate whether user embeddings learned from twitter post histories encode information that correlates with mental health statuses. To this end, we estimated user embeddings for a set of users known to be affected by depression and post-traumatic stress disorder (PTSD), and for a set of demographically matched `control' users. We then evaluated these embeddings with respect to: (i) their ability to capture homophilic relations with respect to mental health status; and (ii) the performance of downstream mental health prediction models based on these features. Our experimental results demonstrate that the user embeddings capture similarities between users with respect to mental conditions, and are predictive of mental health. △ Less

Submitted 30 April, 2017; originally announced May 2017.

arXiv:1701.00145 [pdf, other]

Expanding Subjective Lexicons for Social Media Mining with Embedding Subspaces

Authors: Silvio Amir, Rámon Astudillo, Wang Ling, Paula C. Carvalho, Mário J. Silva

Abstract: Recent approaches for sentiment lexicon induction have capitalized on pre-trained word embeddings that capture latent semantic properties. However, embeddings obtained by optimizing performance of a given task (e.g. predicting contextual words) are sub-optimal for other applications. In this paper, we address this problem by exploiting task-specific representations, induced via embedding sub-space… ▽ More Recent approaches for sentiment lexicon induction have capitalized on pre-trained word embeddings that capture latent semantic properties. However, embeddings obtained by optimizing performance of a given task (e.g. predicting contextual words) are sub-optimal for other applications. In this paper, we address this problem by exploiting task-specific representations, induced via embedding sub-space projection. This allows us to expand lexicons describing multiple semantic properties. For each property, our model jointly learns suitable representations and the concomitant predictor. Experiments conducted over multiple subjective lexicons, show that our model outperforms previous work and other baselines; even in low training data regimes. Furthermore, lexicon-based sentiment classifiers built on top of our lexicons outperform similar resources and yield performances comparable to those of supervised models. △ Less

Submitted 6 January, 2017; v1 submitted 31 December, 2016; originally announced January 2017.

arXiv:1607.00976 [pdf, other]

Modelling Context with User Embeddings for Sarcasm Detection in Social Media

Authors: Silvio Amir, Byron C. Wallace, Hao Lyu, Paula Carvalho Mário J. Silva

Abstract: We introduce a deep neural network for automated sarcasm detection. Recent work has emphasized the need for models to capitalize on contextual features, beyond lexical and syntactic cues present in utterances. For example, different speakers will tend to employ sarcasm regarding different subjects and, thus, sarcasm detection models ought to encode such speaker information. Current methods have ac… ▽ More We introduce a deep neural network for automated sarcasm detection. Recent work has emphasized the need for models to capitalize on contextual features, beyond lexical and syntactic cues present in utterances. For example, different speakers will tend to employ sarcasm regarding different subjects and, thus, sarcasm detection models ought to encode such speaker information. Current methods have achieved this by way of laborious feature engineering. By contrast, we propose to automatically learn and then exploit user embeddings, to be used in concert with lexical signals to recognize sarcasm. Our approach does not require elaborate feature engineering (and concomitant data scra**); fitting user embeddings requires only the text from their previous posts. The experimental results show that our model outperforms a state-of-the-art approach leveraging an extensive set of carefully crafted features. △ Less

Submitted 4 July, 2016; v1 submitted 4 July, 2016; originally announced July 2016.

Comments: published as a conference paper at CONLL 2016

arXiv:1511.09101 [pdf, other]

doi 10.1109/CIT/IUCC/DASC/PICOM.2015.228

POPmine: Tracking Political Opinion on the Web

Authors: Pedro Saleiro, Sílvio Amir, Mário J. Silva, Carlos Soares

Abstract: The automatic content analysis of mass media in the social sciences has become necessary and possible with the raise of social media and computational power. One particularly promising avenue of research concerns the use of opinion mining. We design and implement the POPmine system which is able to collect texts from web-based conventional media (news items in mainstream media sites) and social me… ▽ More The automatic content analysis of mass media in the social sciences has become necessary and possible with the raise of social media and computational power. One particularly promising avenue of research concerns the use of opinion mining. We design and implement the POPmine system which is able to collect texts from web-based conventional media (news items in mainstream media sites) and social media (blogs and Twitter) and to process those texts, recognizing topics and political actors, analyzing relevant linguistic units, and generating indicators of both frequency of mention and polarity (positivity/negativity) of mentions to political actors across sources, types of sources, and across time. △ Less

Submitted 29 November, 2015; originally announced November 2015.

Comments: 2015 IEEE International Conference on Computer and Information Technology, Ubiquitous Computing and Communications

arXiv:1508.02096 [pdf, other]

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

Authors: Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso

Abstract: We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly,… ▽ More We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form-function relationship in language, our "composed" word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish). △ Less

Submitted 23 May, 2016; v1 submitted 9 August, 2015; originally announced August 2015.

Showing 1–21 of 21 results for author: Amir, S