Search | arXiv e-print repository

Prime+Retouch: When Cache is Locked and Leaked

Authors: Jaehyuk Lee, Fan Sang, Taesoo Kim

Abstract: Caches on modern commodity CPUs have become a major source of side-channel leakage and have been abused as a new attack vector. To thwart cache-based side-channel attacks, two types of countermeasures have been proposed: detection-based ones that limit the amount of microarchitectural traces an attacker can leave, and cache prefetching-and-locking techniques that claim to prevent such leakage by disallowing evictions of sensitive data. In this paper, we present the Prime+Retouch attack, which completely bypasses these defense schemes by accurately inferring cache activities from the metadata of the cache replacement policy. Prime+Retouch has three notable properties: 1) it incurs no eviction of the victim's data, allowing us to bypass the two known mitigation schemes; 2) it requires minimal synchronization, namely only one memory access to the attacker's pre-primed cache lines; and 3) it leaks data even via non-shared memory, because the underlying eviction metadata is shared. We demonstrate Prime+Retouch on two architectures: the predominant Intel x86 and the emerging Apple M1. We show how Prime+Retouch can break the T-table implementation of AES protected by robust cache side-channel mitigations such as Cloak, in both normal and SGX-protected environments. We also demonstrate the feasibility of the Prime+Retouch attack on the M1 platform, which imposes more restrictions: precise measurement tools such as the core clock cycle timer and performance counters are inaccessible to the attacker.
Furthermore, we are the first to demystify the undisclosed cache architecture and eviction policy of the L1 data cache on the Apple M1 architecture. We also devise a user-space, noise-free cache monitoring tool by repurposing Intel TSX.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2312.13119 [pdf, other]

Graphene: Infrastructure Security Posture Analysis with AI-generated Attack Graphs

Authors: Xin **, Charalampos Katsis, Fan Sang, Jiahao Sun, Elisa Bertino, Ramana Rao Kompella, Ashish Kundu

Abstract: The rampant occurrence of cybersecurity breaches imposes substantial limitations on the progress of network infrastructures, leading to compromised data, financial losses, potential harm to individuals, and disruptions in essential services. The current security landscape demands the urgent development of a holistic security assessment solution that encompasses vulnerability analysis and investigates the potential exploitation of these vulnerabilities as attack paths. In this paper, we propose Graphene, an advanced system designed to provide a detailed analysis of the security posture of computing infrastructures. Using user-provided information, such as device details and software versions, Graphene performs a comprehensive security assessment. This assessment includes identifying associated vulnerabilities and constructing potential attack graphs that adversaries can exploit. Furthermore, Graphene evaluates the exploitability of these attack paths and quantifies the overall security posture through a scoring mechanism. The system takes a holistic approach by analyzing security layers encompassing hardware, system, network, and cryptography. In addition, Graphene delves into the interconnections between these layers, exploring how vulnerabilities in one layer can be leveraged to exploit vulnerabilities in others. In this paper, we present the end-to-end pipeline implemented in Graphene, showcasing the systematic approach adopted for conducting this thorough security analysis.

Submitted 30 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

arXiv:2206.07164 [pdf, other]

Edge Security: Challenges and Issues

Authors: Xin **, Charalampos Katsis, Fan Sang, Jiahao Sun, Ashish Kundu, Ramana Kompella

Abstract: Edge computing is a paradigm that shifts data processing services to the network edge, where data are generated. While such an architecture provides faster processing and response, among other benefits, it also raises critical security issues and challenges that must be addressed. This paper discusses the security threats and vulnerabilities emerging from the edge network architecture, spanning from the hardware layer to the system layer. We further discuss privacy and regulatory compliance challenges in such networks. Finally, we argue the need for a holistic approach to analyze edge network security posture, which must consider knowledge from each layer.

Submitted 14 June, 2022; originally announced June 2022.

Comments: 21 pages. Survey paper

arXiv:1909.11164 [pdf, other]

P2FAAS: Toward Privacy-Preserving Fuzzing as a Service

Authors: Fan Sang, Daehee Jang, Ming-Wei Shih, Taesoo Kim

Abstract: Global corporations (e.g., Google and Microsoft) have recently introduced a new model of cloud services, fuzzing-as-a-service (FaaS). Despite effectively alleviating the cost of fuzzing, the model comes with privacy concerns. For example, the end user has to trust both the cloud and service providers, who have access to the application to be fuzzed. Such concerns arise because the platform is under the control of its provider and because the application and the fuzzer are highly coupled. In this paper, we propose P2FaaS, a new ecosystem that preserves the end user's privacy while providing FaaS in the cloud. The key idea of P2FaaS is to use Intel SGX to prevent cloud and service providers from learning information about the application. Our preliminary evaluation shows that P2FaaS imposes a 45% runtime overhead on fuzzing compared to the baseline. In addition, P2FaaS demonstrates that, with recently introduced hardware, the Intel SGX Card, the fuzzing service can be scaled up to multiple servers without native SGX support.

Submitted 24 September, 2019; originally announced September 2019.

arXiv:cs/0306050 [pdf, ps, other]

Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition

Authors: Erik F. Tjong Kim Sang, Fien De Meulder

Abstract: We describe the CoNLL-2003 shared task: language-independent named entity recognition. We give background information on the data sets (English and German) and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.

Submitted 12 June, 2003; originally announced June 2003.

ACM Class: I.2.7

Journal ref: Proceedings of CoNLL-2003, Edmonton, Canada, 2003, pp. 142-147

arXiv:cs/0209010 [pdf, ps, other]

Introduction to the CoNLL-2002 Shared Task: Language-Independent Named Entity Recognition

Authors: Erik F. Tjong Kim Sang

Abstract: We describe the CoNLL-2002 shared task: language-independent named entity recognition. We give background information on the data sets and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.

Submitted 5 September, 2002; originally announced September 2002.

Comments: 4 pages

ACM Class: I.2.7

Journal ref: Dan Roth and Antal van den Bosch (eds.), Proceedings of CoNLL-2002, Taipei, Taiwan, 2002, pp. 155-158

arXiv:cs/0204049 [pdf, ps, other]

Memory-Based Shallow Parsing

Authors: Erik F. Tjong Kim Sang

Abstract: We present memory-based learning approaches to shallow parsing and apply these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods for improving the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with those of other systems. This reveals that our approach works well for base phrase identification, while its application to recognizing embedded structures leaves some room for improvement.

Submitted 24 April, 2002; originally announced April 2002.

Report number: jmlr-2002-tks

ACM Class: I.2.7

Journal ref: Journal of Machine Learning Research, volume 2 (March), 2002, pp. 559-594

arXiv:cs/0107018 [pdf, ps, other]

Combining a self-organising map with memory-based learning

Authors: James Hammerton, Erik F. Tjong Kim Sang

Abstract: Memory-based learning (MBL) has enjoyed considerable success in corpus-based natural language processing (NLP) tasks and is thus a reliable method of getting a high level of performance when building corpus-based NLP systems. However, there is a bottleneck in MBL whereby any novel testing item has to be compared against all the training items in the memory base. For this reason there has been some interest in various forms of memory editing, whereby some method of selecting a subset of the memory base is employed to reduce the number of comparisons. This paper investigates the use of a modified self-organising map (SOM) to select a subset of the memory items for comparison. This method reduces the number of comparisons to a value proportional to the square root of the number of training items. The method is tested on the identification of base noun phrases in the Wall Street Journal corpus, using sections 15 to 18 for training and section 20 for testing.

Submitted 15 July, 2001; originally announced July 2001.

ACM Class: I.2.7

Journal ref: In: Walter Daelemans and Remi Zajac (eds.), Proceedings of CoNLL-2001, Toulouse, France, 2001, pp. 9-14

arXiv:cs/0107017 [pdf, ps, other]

Learning Computational Grammars

Authors: John Nerbonne, Anja Belz, Nicola Cancedda, Herve Dejean, James Hammerton, Rob Koeling, Stasinos Konstantopoulos, Miles Osborne, Franck Thollard, Erik F. Tjong Kim Sang

Abstract: This paper reports on the "Learning Computational Grammars" (LCG) project, a postdoc network devoted to studying the application of machine learning techniques to grammars suitable for computational use. We were interested in a more systematic survey to understand the relevance of many factors to the success of learning, especially the availability of annotated data, the kind of dependencies in the data, and the availability of knowledge bases (grammars). We focused on syntax, especially noun phrase (NP) syntax.

Submitted 15 July, 2001; originally announced July 2001.

ACM Class: I.2.7

Journal ref: In: Walter Daelemans and Remi Zajac (eds.), Proceedings of CoNLL-2001, Toulouse, France, 2001, pp. 97-104

arXiv:cs/0107016 [pdf, ps, other]

Introduction to the CoNLL-2001 Shared Task: Clause Identification

Authors: Erik F. Tjong Kim Sang, Herve Dejean

Abstract: We describe the CoNLL-2001 shared task: dividing text into clauses. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.

Submitted 15 July, 2001; originally announced July 2001.

ACM Class: I.2.7

Journal ref: In: Walter Daelemans and Remi Zajac (eds.), Proceedings of CoNLL-2001, Toulouse, France, 2001, pp. 53-57

arXiv:cs/0009008 [pdf, ps, other]

Introduction to the CoNLL-2000 Shared Task: Chunking

Authors: Erik F. Tjong Kim Sang, Sabine Buchholz

Abstract: We describe the CoNLL-2000 shared task: dividing text into syntactically related non-overlapping groups of words, so-called text chunking. We give background information on the data sets, present a general overview of the systems that have taken part in the shared task and briefly discuss their performance.

Submitted 18 September, 2000; originally announced September 2000.

Comments: 6 pages

ACM Class: I.2.7

Journal ref: Proceedings of CoNLL-2000 and LLL-2000, Lisbon, Portugal

arXiv:cs/0008012 [pdf, ps, other]

Applying System Combination to Base Noun Phrase Identification

Authors: Erik F. Tjong Kim Sang, Walter Daelemans, Herve Dejean, Rob Koeling, Yuval Krymolowski, Vasin Punyakanok, Dan Roth

Abstract: We use seven machine learning algorithms for one task: identifying base noun phrases. The results have been processed by different system combination methods and all of these outperformed the best individual result. We have applied the seven learners with the best combinator, a majority vote of the top five systems, to a standard data set and managed to improve the best published result for this data set.

Submitted 17 August, 2000; originally announced August 2000.

Comments: 7 pages

ACM Class: I.2.7

Journal ref: Proceedings of COLING 2000, Saarbruecken, Germany

arXiv:cs/0005015 [pdf, ps, other]

Noun Phrase Recognition by System Combination

Authors: Erik F. Tjong Kim Sang

Abstract: The performance of machine learning algorithms can be improved by combining the output of different systems. In this paper we apply this idea to the recognition of noun phrases. We generate different classifiers by using different representations of the data. By combining the results with voting techniques described in (Van Halteren et al. 1998) we manage to improve the best reported performances on standard data sets for base noun phrases and arbitrary noun phrases.

Submitted 10 May, 2000; originally announced May 2000.

Comments: 6 pages

ACM Class: I.2.7

Journal ref: Proceedings of NAACL 2000, Seattle, WA, USA

arXiv:cs/9907006 [pdf, ps, other]

Representing Text Chunks

Authors: Erik F. Tjong Kim Sang, Jorn Veenstra

Abstract: Dividing sentences into chunks of words is a useful preprocessing step for parsing, information extraction and information retrieval. Ramshaw and Marcus (1995) introduced a "convenient" data representation for chunking by converting it to a tagging task. In this paper we will examine seven different data representations for the problem of recognizing noun phrase chunks. We will show that the data representation choice has a minor influence on chunking performance. However, equipped with the most suitable data representation, our memory-based learning chunker was able to improve the best published chunking results for a standard data set.

Submitted 6 July, 1999; originally announced July 1999.

Comments: 7 pages

ACM Class: I.2.7

Journal ref: EACL'99, Bergen

Showing 1–14 of 14 results for author: Sang, F