Search | arXiv e-print repository

doi 10.1145/3660354.3660355

Honeyfile Camouflage: Hiding Fake Files in Plain Sight

Authors: Roelien C. Timmer, David Liebowitz, Surya Nepal, Salil S. Kanhere

Abstract: Honeyfiles are a particularly useful type of honeypot: fake files deployed to detect and infer information from malicious behaviour. This paper considers the challenge of naming honeyfiles so they are camouflaged when placed amongst real files in a file system. Based on cosine distances in semantic vector spaces, we develop two metrics for filename camouflage: one based on simple averaging and one on clustering with mixture fitting. We evaluate and compare the metrics, showing that both perform well on a publicly available GitHub software repository dataset.

Submitted 10 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: 3rd Workshop on the security implications of Deepfakes and Cheapfakes (WDC) co-located at ACM ASIACCS 2024

arXiv:2303.10871 [pdf, other]

NASA Science Mission Directorate Knowledge Graph Discovery

Authors: Roelien C. Timmer, Fech Scen Khoo, Megan Mark, Marcella Scoczynski Ribeiro Martins, Anamaria Berea, Gregory Renard, Kaylin Bugbee

Abstract: The size of the National Aeronautics and Space Administration (NASA) Science Mission Directorate (SMD) is growing exponentially, allowing researchers to make discoveries. However, making discoveries is challenging and time-consuming due to the size of the data catalogs, and as many concepts and data are indirectly connected. This paper proposes a pipeline to generate knowledge graphs (KGs) representing different NASA SMD domains. These KGs can be used as the basis for dataset search engines, saving researchers time and supporting them in finding new connections. We collected textual data and used several modern natural language processing (NLP) methods to create the nodes and the edges of the KGs. We explore the cross-domain connections, discuss our challenges, and provide future directions to inspire researchers working on similar challenges.

Submitted 20 March, 2023; originally announced March 2023.

arXiv:2208.07127 [pdf, other]

doi 10.1109/TPSISA52974.2021.00020

Deception for Cyber Defence: Challenges and Opportunities

Authors: David Liebowitz, Surya Nepal, Kristen Moore, Cody J. Christopher, Salil S. Kanhere, David Nguyen, Roelien C. Timmer, Michael Longland, Keerth Rathakumar

Abstract: Deception is rapidly growing as an important tool for cyber defence, complementing existing perimeter security measures to rapidly detect breaches and data theft. One of the factors limiting the use of deception has been the cost of generating realistic artefacts by hand. Recent advances in Machine Learning have, however, created opportunities for scalable, automated generation of realistic deceptions. This vision paper describes the opportunities and challenges involved in developing models to mimic many common elements of the IT stack for deception effects.

Submitted 15 August, 2022; originally announced August 2022.

Journal ref: 2021 Third IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), 2021, pp. 173-182

arXiv:2203.07580 [pdf, other]

TSM: Measuring the Enticement of Honeyfiles with Natural Language Processing

Authors: Roelien C. Timmer, David Liebowitz, Surya Nepal, Salil Kanhere

Abstract: Honeyfile deployment is a useful breach detection method in cyber deception that can also inform defenders about the intent and interests of intruders and malicious insiders. A key property of a honeyfile, enticement, is the extent to which the file can attract an intruder to interact with it. We introduce a novel metric, Topic Semantic Matching (TSM), which uses topic modelling to represent files in the repository and semantic matching in an embedding vector space to compare honeyfile text and topic words robustly. We also present a honeyfile corpus created with different Natural Language Processing (NLP) methods. Experiments show that TSM is effective in inter-corpus comparisons and is a promising tool to measure the enticement of honeyfiles. TSM is the first measure to use NLP techniques to quantify the enticement of honeyfile content that compares the essential topical content of local contexts to honeyfiles and is robust to paraphrasing.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2203.06793 [pdf, other]

Can pre-trained Transformers be used in detecting complex sensitive sentences? -- A Monsanto case study

Authors: Roelien C. Timmer, David Liebowitz, Surya Nepal, Salil S. Kanhere

Abstract: Each and every organisation releases information in a variety of forms ranging from annual reports to legal proceedings. Such documents may contain sensitive information and releasing them openly may lead to the leakage of confidential information. Detection of sentences that contain sensitive information in documents can help organisations prevent the leakage of valuable confidential information. This is especially challenging when such sentences contain a substantial amount of information or are paraphrased versions of known sensitive content. Current approaches to sensitive information detection in such complex settings are based on keyword-based approaches or standard machine learning models. In this paper, we wish to explore whether pre-trained transformer models are well suited to detect complex sensitive information. Pre-trained transformers are typically trained on an enormous amount of text and therefore readily learn grammar, structure and other linguistic features, making them particularly attractive for this task. Through our experiments on the Monsanto trial data set, we observe that the fine-tuned Bidirectional Encoder Representations from Transformers (BERT) transformer model performs better than traditional models. We experimented with four different categories of documents in the Monsanto dataset and observed that BERT achieves better F2 scores by 24.13% to 65.79% for GHOST, 30.14% to 54.88% for TOXIC, 39.22% for CHEMI, 53.57% for REGUL compared to existing sensitive information detection models.

Submitted 13 March, 2022; originally announced March 2022.

arXiv:2202.03085 [pdf, other]

Streaming readout for next generation electron scattering experiment

Authors: Fabrizio Ameli, Marco Battaglieri, Vladimir V. Berdnikov, Mariangela Bondì, Sergey Boyarinov, Nathan Brei, Laura Cappelli, Andrea Celentano, Tommaso Chiarusi, Raffaella De Vita, Cristiano Fanelli, Vardan Gyurjyan, David Lawrence, Patrick Moran, Paolo Musico, Carmelo Pellegrino, Alessandro Pilloni, Ben Raydo, Carl Timmer, Maurizio Ungaro, Simone Vallarino

Abstract: Current and future experiments at the high intensity frontier are expected to produce an enormous amount of data that needs to be collected and stored for offline analysis. Thanks to the continuous progress in computing and networking technology, it is now possible to replace the standard 'triggered' data acquisition systems with a new, simplified and outperforming scheme. 'Streaming readout' (SRO) DAQ aims to replace the hardware-based trigger with a much more powerful and flexible software-based one, that considers the whole detector information for efficient real-time data tagging and selection. Considering the crucial role of DAQ in an experiment, validation with on-field tests is required to demonstrate SRO performance. In this paper we report results of the on-beam validation of the Jefferson Lab SRO framework. We exposed different detectors (PbWO-based electromagnetic calorimeters and a plastic scintillator hodoscope) to the Hall-D electron-positron secondary beam and to the Hall-B production electron beam, with increasingly complex experimental conditions. By comparing the data collected with the SRO system against the traditional DAQ, we demonstrate that the SRO performs as expected. Furthermore, we provide evidence of its superiority in implementing sophisticated AI-supported algorithms for real-time data analysis and reconstruction.

Submitted 7 February, 2022; originally announced February 2022.

arXiv:2011.01345 [pdf]

SAMPA Based Streaming Readout Data Acquisition Prototype

Authors: E. Jastrzembski, D. Abbott, J. Gu, V. Gyurjyan, G. Heyes, B. Moffit, E. Pooser, C. Timmer, A. Hellman

Abstract: We have assembled a small-scale streaming data acquisition system based on the SAMPA front-end ASIC. We report on measurements performed on the SAMPA chip and preliminary cosmic ray data acquired from a Gas Electron Multiplier (GEM) detector read out using the SAMPA.

Submitted 2 November, 2020; originally announced November 2020.

Comments: Submitted to the 22nd Virtual IEEE Real Time Conference, 12-23 October 2020

arXiv:hep-ex/0305016 [pdf, ps]

FIPA agent based network distributed control system

Authors: V. Gyurjyan, D. Abbott, G. Heyes, E. Jastrzembski, C. Timmer, E. Wolin

Abstract: A control system with the capabilities to combine heterogeneous control systems or processes into a uniform homogeneous environment is discussed. This dynamically extensible system is an example of the software system at the agent level of abstraction. This level of abstraction considers agents as atomic entities that communicate to implement the functionality of the control system. Agent engineering aspects are addressed by adopting the domain independent software standard, formulated by FIPA. Jade core Java classes are used as a FIPA specification implementation. A special, lightweight, XML RDFS based, control oriented, ontology markup language is developed to standardize the description of the arbitrary control system data processor. Control processes, described in this language, are integrated into the global system at runtime, without actual programming. Fault tolerance and recovery issues are also addressed.

Submitted 12 May, 2003; originally announced May 2003.

Comments: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 5 pages, PS format. PSN THGT009

Journal ref: ECONF C0303241:THGT009,2003

Showing 1–8 of 8 results for author: Timmer, C