
Showing 1–8 of 8 results for author: Tavabi, N

  1. arXiv:2307.07160  [pdf, other]

    cs.CL cs.LG

    Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords

    Authors: Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour

    Abstract: We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning. Our approach selectively masks in-domain keywords, i.e., words that provide a compact representation of the target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We evaluate our approach using six different settings: three datasets combined with two distinct…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: final version: accepted at ACL'23 RepL4NLP. arXiv admin note: text overlap with arXiv:2208.12367

  2. arXiv:2208.12367  [pdf, other]

    cs.CL cs.LG

    A Compact Pretraining Approach for Neural Language Models

    Authors: Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour

    Abstract: Domain adaptation for large neural language models (NLMs) is coupled with massive amounts of unstructured data in the pretraining phase. In this study, however, we show that pretrained NLMs learn in-domain information more effectively and faster from a compact subset of the data that focuses on the key information in the domain. We construct these compact subsets from the unstructured data using a…

    Submitted 28 August, 2022; v1 submitted 25 August, 2022; originally announced August 2022.

    Comments: First Version

  3. arXiv:2106.00614  [pdf, other]

    eess.SP cs.LG

    Pattern Discovery in Time Series with Byte Pair Encoding

    Authors: Nazgol Tavabi, Kristina Lerman

    Abstract: The growing popularity of wearable sensors has generated large quantities of temporal physiological and activity data. Ability to analyze this data offers new opportunities for real-time health monitoring and forecasting. However, temporal physiological data presents many analytic challenges: the data is noisy, contains many missing values, and each series has a different length. Most methods prop…

    Submitted 29 May, 2021; originally announced June 2021.

  4. Having a Bad Day? Detecting the Impact of Atypical Life Events Using Wearable Sensors

    Authors: Keith Burghardt, Nazgol Tavabi, Emilio Ferrara, Shrikanth Narayanan, Kristina Lerman

    Abstract: Life events can dramatically affect our psychological state and work performance. Stress, for example, has been linked to professional dissatisfaction, increased anxiety, and workplace burnout. We explore the impact of positive and negative life events on a number of psychological constructs through a multi-month longitudinal study of hospital and aerospace workers. Through causal inference, we de…

    Submitted 4 August, 2020; originally announced August 2020.

    Comments: 10 pages, 4 figures, and 3 tables

  5. arXiv:2004.04597  [pdf, other]

    cs.CR cs.LG stat.ML

    Challenges in Forecasting Malicious Events from Incomplete Data

    Authors: Nazgol Tavabi, Andrés Abeliuk, Negar Mokhberian, Jeremy Abramson, Kristina Lerman

    Abstract: The ability to accurately predict cyber-attacks would enable organizations to mitigate their growing threat and avert the financial losses and disruptions they cause. But how predictable are cyber-attacks? Researchers have attempted to combine external data -- ranging from vulnerability disclosures to discussions on Twitter and the darkweb -- with machine learning algorithms to learn indicators of…

    Submitted 6 April, 2020; originally announced April 2020.

    Comments: Accepted in The Fifth Workshop on Computational Methods in Online Misbehavior, Companion Proceedings of The 2020 World Wide Web Conference (WWW '20)

  6. arXiv:1911.06959  [pdf, other]

    cs.LG eess.SP stat.ML

    Learning Behavioral Representations from Wearable Sensors

    Authors: Nazgol Tavabi, Homa Hosseinmardi, Jennifer L. Villatte, Andrés Abeliuk, Shrikanth Narayanan, Emilio Ferrara, Kristina Lerman

    Abstract: Continuous collection of physiological data from wearable sensors enables temporal characterization of individual behaviors. Understanding the relation between an individual's behavioral patterns and psychological states can help identify strategies to improve quality of life. One challenge in analyzing physiological data is extracting the underlying behavioral states from the temporal sensor sign…

    Submitted 4 July, 2020; v1 submitted 16 November, 2019; originally announced November 2019.

  7. arXiv:1903.00156  [pdf, other]

    cs.CY cs.CR cs.LG

    Characterizing Activity on the Deep and Dark Web

    Authors: Nazgol Tavabi, Nathan Bartley, Andrés Abeliuk, Sandeep Soni, Emilio Ferrara, Kristina Lerman

    Abstract: The deep and darkweb (d2web) refers to limited access web sites that require registration, authentication, or more complex encryption protocols to access them. These web sites serve as hubs for a variety of illicit activities: to trade drugs, stolen user credentials, hacking tools, and to coordinate attacks and manipulation campaigns. Despite its importance to cyber crime, the d2web has not been s…

    Submitted 1 March, 2019; originally announced March 2019.

  8. arXiv:1806.03342  [pdf, other]

    cs.SI cs.LG stat.ML

    Discovering Signals from Web Sources to Predict Cyber Attacks

    Authors: Palash Goyal, KSM Tozammel Hossain, Ashok Deb, Nazgol Tavabi, Nathan Bartley, Andrés Abeliuk, Emilio Ferrara, Kristina Lerman

    Abstract: Cyber attacks are growing in frequency and severity. Over the past year alone we have witnessed massive data breaches that stole personal information of millions of people and wide-scale ransomware attacks that paralyzed critical infrastructure of several countries. Combating the rising cyber threat calls for a multi-pronged strategy, which includes predicting when these attacks will occur. The in…

    Submitted 8 June, 2018; originally announced June 2018.