Search | arXiv e-print repository

doi 10.1145/3503222.3507779

Adelie: Continuous Address Space Layout Re-randomization for Linux Drivers

Authors: Ruslan Nikolaev, Hassan Nadeem, Cathlyn Stone, Binoy Ravindran

Abstract: While address space layout randomization (ASLR) has been extensively studied for user-space programs, the corresponding OS kernel's KASLR support remains very limited, making the kernel vulnerable to just-in-time (JIT) return-oriented programming (ROP) attacks. Furthermore, commodity OSs such as Linux restrict their KASLR range to 32 bits due to architectural constraints (e.g., x86-64 supports only 32-bit immediate operands for most instructions), which makes them vulnerable to even unsophisticated brute-force ROP attacks due to low entropy. Most in-kernel pointers remain static, exacerbating the problem when pointers are leaked. Adelie, our kernel defense mechanism, overcomes KASLR limitations, increases KASLR entropy, and makes successful ROP attacks on the Linux kernel much harder to achieve. First, Adelie enables the position-independent code (PIC) model so that the kernel and its modules can be placed anywhere in the 64-bit virtual address space, at any distance apart from each other. Second, Adelie implements stack re-randomization and address encryption on modules. Finally, Adelie enables efficient continuous KASLR for modules by using the PIC model to make it (almost) impossible to inject ROP gadgets through these modules, regardless of the gadget's origin. Since device drivers (typically compiled as modules) are often developed by third parties and are typically less tested than core OS parts, they are also often more vulnerable.
By fully re-randomizing device drivers, the last two contributions together prevent most JIT ROP attacks, since vulnerable modules are very likely to be the starting point of an attack. Furthermore, some OS instances in virtualized environments are specifically designated to run device drivers, where drivers are the primary target of JIT ROP attacks. Our evaluation shows that Adelie's approach is highly efficient. [full abstract is in the paper]

Submitted 20 January, 2022; originally announced January 2022.

Comments: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '22), February 28 - March 4, 2022, Lausanne, Switzerland
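The 32-bit constraint cited in the Adelie abstract can be illustrated with plain address arithmetic. In 64-bit mode, a direct near CALL or JMP encodes a signed 32-bit (rel32) displacement measured from the address of the next instruction; relatedly, GCC's `kernel` code model (`-mcmodel=kernel`) places the kernel in the negative 2 GB of the address space. The minimal sketch below, assuming made-up addresses and using hypothetical helper names (`rel32_reachable`, `rel32_encode`), is not Adelie's code; it only shows why non-PIC code cannot reach a module placed at an arbitrary 64-bit distance.

```python
# Hypothetical helpers (not from Adelie) illustrating the x86-64 rel32 limit:
# a direct CALL/JMP encodes a signed 32-bit displacement measured from the
# address of the *next* instruction.

REL32_MIN = -(2**31)
REL32_MAX = 2**31 - 1


def rel32_reachable(next_ip: int, target: int) -> bool:
    """True if `target` is encodable as a rel32 displacement from `next_ip`."""
    return REL32_MIN <= target - next_ip <= REL32_MAX


def rel32_encode(next_ip: int, target: int) -> int:
    """Return the rel32 displacement, or raise ValueError if it does not fit."""
    if not rel32_reachable(next_ip, target):
        raise ValueError(f"{target:#x} is not reachable from {next_ip:#x} via rel32")
    return target - next_ip


if __name__ == "__main__":
    kernel_ip = 0xFFFFFFFF81000000        # illustrative kernel text address
    near_module = kernel_ip + (1 << 30)   # 1 GiB away: fits in rel32
    far_module = 0xFFFF900000000000       # far away: needs an indirect (PIC) path
    print(hex(rel32_encode(kernel_ip, near_module)))
    print(rel32_reachable(kernel_ip, far_module))
```

Under PIC, the far call would instead go through a full 64-bit address loaded from memory (for example, a global offset table entry), so the distance between kernel and module no longer matters.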

arXiv:2201.02179 [pdf, other]

doi 10.1145/3490148.3538572

wCQ: A Fast Wait-Free Queue with Bounded Memory Usage

Authors: Ruslan Nikolaev, Binoy Ravindran

Abstract: The concurrency literature presents a number of approaches for building non-blocking, FIFO, multiple-producer and multiple-consumer (MPMC) queues. However, only a fraction of them have high performance. In addition, many queue designs, such as LCRQ, trade memory usage for better performance. The recently proposed SCQ design achieves both memory efficiency and excellent performance. Unfortunately, both LCRQ and SCQ are only lock-free. On the other hand, existing wait-free queues either perform poorly or suffer from potentially unbounded memory usage. Strictly speaking, the latter queues, such as Yang & Mellor-Crummey's (YMC) queue, forfeit wait-freedom, as they block when memory is exhausted. We present a wait-free queue called wCQ. wCQ is based on SCQ and uses its own variation of the fast-path-slow-path methodology to attain wait-freedom and bound memory usage. Our experimental studies on x86 and PowerPC architectures validate wCQ's high performance and memory efficiency. They also show that wCQ's performance is often on par with the best known concurrent queue designs.

Submitted 14 July, 2022; v1 submitted 6 January, 2022; originally announced January 2022.

Journal ref: Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2022)
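The wCQ abstract does not describe its own slow path, so the sketch below shows only the generic fast-path-slow-path pattern (in the style of Kogan and Petrank) in a single-threaded model: an operation makes a bounded number of ordinary attempts, and if all of them fail it publishes a request that other threads complete before doing their own work. All names (`Announcements`, `operation`, `MAX_PATIENCE`, `make_flaky`) are hypothetical, and the model omits the atomic instructions, and the owner's own participation in its slow path, that a real wait-free queue needs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

MAX_PATIENCE = 3  # hypothetical bound on fast-path attempts


@dataclass
class Request:
    attempt: Callable[[], bool]  # one try at the operation; True on success
    done: bool = False


class Announcements:
    """Sequential model of a per-thread announcement array used for helping."""

    def __init__(self, nthreads: int) -> None:
        self.slots: List[Optional[Request]] = [None] * nthreads

    def publish(self, tid: int, attempt: Callable[[], bool]) -> Request:
        req = Request(attempt)
        self.slots[tid] = req
        return req

    def help_all(self) -> None:
        # Complete pending slow-path requests on behalf of their owners.
        for tid, req in enumerate(self.slots):
            if req is not None and not req.done and req.attempt():
                req.done = True
                self.slots[tid] = None


def operation(ann: Announcements, tid: int, attempt: Callable[[], bool],
              patience: int = MAX_PATIENCE) -> Tuple[str, Optional[Request]]:
    ann.help_all()                 # help others first, so stuck requests finish
    for _ in range(patience):      # fast path: a bounded number of plain tries
        if attempt():
            return "fast", None
    return "slow", ann.publish(tid, attempt)  # slow path: ask others for help


def make_flaky(failures: int) -> Callable[[], bool]:
    """Hypothetical op that fails `failures` times (e.g., lost races), then succeeds."""
    calls = 0

    def attempt() -> bool:
        nonlocal calls
        calls += 1
        return calls > failures

    return attempt


if __name__ == "__main__":
    ann = Announcements(2)
    print(operation(ann, 0, make_flaky(3))[0])  # thread 0 exhausts its patience
    print(operation(ann, 1, make_flaky(0))[0])  # thread 1 helps, then succeeds
```

The bounded patience is what keeps the common case cheap: helping costs extra work only after an operation has already failed several times.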

arXiv:2108.02763 [pdf, other]

Crystalline: Fast and Memory Efficient Wait-Free Reclamation

Authors: Ruslan Nikolaev, Binoy Ravindran

Abstract: Historically, memory management based on lock-free reference counting was very inefficient, especially for read-dominated workloads. Thus, approaches such as epoch-based reclamation (EBR), hazard pointers (HP), or a combination thereof have received significant attention. EBR exhibits excellent performance but is blocking due to potentially unbounded memory usage. In contrast, HP are non-blocking and achieve good memory efficiency but are much slower. Moreover, HP are only lock-free in the general case. Recently, several new memory reclamation approaches such as WFE and Hyaline have been proposed. WFE achieves wait-freedom but is less memory efficient and suffers from suboptimal performance in oversubscribed scenarios; Hyaline achieves higher performance and memory efficiency but lacks wait-freedom. We present a new wait-free memory reclamation scheme, Crystalline, that simultaneously addresses the challenges of high performance, high memory efficiency, and wait-freedom. Crystalline guarantees complete wait-freedom even when threads are dynamically recycled, reclaims memory asynchronously in the sense that any thread can reclaim memory retired by any other thread, and ensures an (almost) balanced reclamation workload across all threads. The latter two properties result in Crystalline's high performance and high memory efficiency. Simultaneously ensuring all three properties requires overcoming unique challenges, which we discuss in the paper.
Crystalline's implementation relies on specialized instructions that are widely available on commodity hardware such as x86-64 or ARM64. Our experimental evaluations show that Crystalline exhibits outstanding scalability and memory efficiency, and achieves higher throughput than typical reclamation schemes such as EBR as the number of threads grows.

Submitted 5 August, 2021; originally announced August 2021.
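The Crystalline abstract's remark that EBR "is blocking due to potentially unbounded memory usage" can be seen in a small sequential model: a single thread that stalls inside a critical section prevents every object retired afterwards from being freed. This is a simplified model of EBR (not Crystalline) with hypothetical class and method names; it ignores atomics and memory ordering and assumes each object is retired only after it has been unlinked from the data structure.

```python
from typing import List, Optional, Tuple


class EpochReclaimer:
    """Simplified, sequential model of epoch-based reclamation (EBR)."""

    def __init__(self, nthreads: int) -> None:
        self.global_epoch = 0
        # Epoch observed by each thread inside a critical section; None = quiescent.
        self.local: List[Optional[int]] = [None] * nthreads
        self.retired: List[Tuple[int, str]] = []  # (retire epoch, object)
        self.freed: List[str] = []

    def enter(self, tid: int) -> None:
        self.local[tid] = self.global_epoch

    def leave(self, tid: int) -> None:
        self.local[tid] = None

    def retire(self, obj: str) -> None:
        # Called after `obj` has been unlinked, so newcomers cannot find it.
        self.retired.append((self.global_epoch, obj))

    def advance(self) -> None:
        self.global_epoch += 1

    def reclaim(self) -> None:
        # An object retired in epoch e is safe once every thread still inside a
        # critical section entered in a later epoch (quiescent threads hold nothing).
        active = [e for e in self.local if e is not None]
        keep = []
        for epoch, obj in self.retired:
            if all(a > epoch for a in active):
                self.freed.append(obj)
            else:
                keep.append((epoch, obj))
        self.retired = keep
```

In the scenario exercised below, thread 0 enters at epoch 0 and never leaves, so the retired list keeps growing no matter how many epochs pass; hazard-pointer-style schemes avoid this by protecting individual objects instead of whole epochs.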

arXiv:2002.08928 [pdf, other]

doi 10.1145/3381052.3381316

LibrettOS: A Dynamically Adaptable Multiserver-Library OS

Authors: Ruslan Nikolaev, Mincheol Sung, Binoy Ravindran

Abstract: We present LibrettOS, an OS design that fuses two paradigms to simultaneously address issues of isolation, performance, compatibility, failure recoverability, and run-time upgrades. LibrettOS acts as a microkernel OS that runs servers in an isolated manner. LibrettOS can also act as a library OS when, for better performance, selected applications are granted exclusive access to virtual hardware resources such as storage and networking. Furthermore, applications can switch between the two OS modes with no interruption at run-time. LibrettOS has a uniquely distinguishing advantage in that the two paradigms seamlessly coexist in the same OS, enabling users to simultaneously exploit their respective strengths (i.e., greater isolation, high performance). Systems code, such as device drivers, network stacks, and file systems, remains identical in the two modes, enabling dynamic mode switching and reducing development and maintenance costs. To illustrate these design principles, we implemented a prototype of LibrettOS using rump kernels, allowing us to reuse existing, hardened NetBSD device drivers and a large ecosystem of POSIX/BSD-compatible applications. We use hardware (VM) virtualization to strongly isolate different rump kernel instances from each other. Because the original rumprun unikernel targeted a much simpler model for uniprocessor systems, we redesigned it to support multicore systems. Unlike kernel-bypass libraries such as DPDK, applications need not be modified to benefit from direct hardware access.
LibrettOS also supports indirect access through a network server that we have developed. Applications remain uninterrupted even when network components fail or need to be upgraded. Finally, to use hardware resources efficiently, applications can dynamically switch between the indirect and direct modes based on their I/O load at run-time. [full abstract is in the paper]

Submitted 20 February, 2020; originally announced February 2020.

Comments: 16th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '20), March 17, 2020, Lausanne, Switzerland
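The LibrettOS abstract says only that applications switch between the indirect and direct modes "based on their I/O load"; the policy below is an assumption used to make that concrete, not the paper's mechanism. It uses two hypothetical throughput thresholds (hysteresis) so that a load hovering near a single cutoff does not cause repeated mode switches. `Mode`, `ModeSwitcher`, and the Mbit/s thresholds are illustrative names and values.

```python
from enum import Enum


class Mode(Enum):
    INDIRECT = "indirect"  # I/O goes through the network server
    DIRECT = "direct"      # application accesses a virtual network device directly


class ModeSwitcher:
    """Hypothetical load-based policy with hysteresis (not LibrettOS's actual policy)."""

    def __init__(self, up_mbps: float, down_mbps: float) -> None:
        if not down_mbps < up_mbps:
            raise ValueError("down threshold must be below up threshold")
        self.up_mbps = up_mbps
        self.down_mbps = down_mbps
        self.mode = Mode.INDIRECT

    def observe(self, load_mbps: float) -> Mode:
        # Two separate thresholds keep a load near one value from flapping modes.
        if self.mode is Mode.INDIRECT and load_mbps >= self.up_mbps:
            self.mode = Mode.DIRECT
        elif self.mode is Mode.DIRECT and load_mbps <= self.down_mbps:
            self.mode = Mode.INDIRECT
        return self.mode
```

With thresholds of 800 and 200 Mbit/s, a load of 500 Mbit/s leaves the application in whichever mode it is already in; only a sustained rise or fall past the corresponding threshold triggers a switch.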

arXiv:2001.01999 [pdf, other]

doi 10.1145/3332466.3374540

Universal Wait-Free Memory Reclamation

Authors: Ruslan Nikolaev, Binoy Ravindran

Abstract: In this paper, we present a universal memory reclamation scheme, Wait-Free Eras (WFE), for deleted memory blocks in wait-free concurrent data structures. WFE's key innovation is that it is completely wait-free. Although some prior techniques provide similar guarantees for certain data structures, they lack support for arbitrary wait-free data structures. Consequently, developers are typically forced to marry their wait-free data structures with lock-free Hazard Pointers or (potentially blocking) epoch-based memory reclamation. Since both of these schemes provide weaker progress guarantees, they essentially forfeit the strong progress guarantee of wait-free data structures. Though making the original Hazard Pointers scheme or epoch-based reclamation completely wait-free seems infeasible, we achieved this goal with the more recent (lock-free) Hazard Eras scheme, which we extend to guarantee wait-freedom. As this extension is non-trivial, we discuss all challenges pertaining to the construction of universal wait-free memory reclamation. WFE is implementable on the ubiquitous x86_64 and AArch64 (ARM) architectures. Its API is mostly compatible with Hazard Pointers, which allows easy transitioning of existing data structures into WFE. Our experimental evaluations show that WFE's performance is close to epoch-based reclamation and almost matches the original Hazard Eras scheme, while providing the stronger wait-free progress guarantee.

Submitted 11 January, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

Journal ref: 25th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2020)
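WFE extends Hazard Eras, whose core idea, as commonly described, is that each object records the era in which it was allocated and the era in which it was retired, and a reading thread reserves an era rather than a pointer. A retired object may be freed unless some reserved era falls inside its [birth, retire] interval. The sequential model below uses hypothetical names and shows only that safety check and why a stalled reader pins just the objects alive during its reserved era; it omits the atomics and the wait-free slow path that are WFE's actual contribution.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    birth_era: int
    retire_era: Optional[int] = None


class HazardErasModel:
    """Sequential model of the Hazard Eras safety check (no wait-free slow path)."""

    def __init__(self, nthreads: int) -> None:
        self.era = 1                                     # global era clock
        self.reserved: List[Optional[int]] = [None] * nthreads
        self.retired: List[Node] = []
        self.freed: List[Node] = []

    def alloc(self, name: str) -> Node:
        return Node(name, birth_era=self.era)            # stamp the birth era

    def protect(self, tid: int) -> None:
        self.reserved[tid] = self.era                    # reserve the current era

    def clear(self, tid: int) -> None:
        self.reserved[tid] = None

    def retire(self, node: Node) -> None:
        node.retire_era = self.era
        self.retired.append(node)
        self.era += 1                                    # later objects get newer eras

    def _can_free(self, node: Node) -> bool:
        # Unsafe only if some reserved era lies within the node's lifetime.
        return all(r is None or not (node.birth_era <= r <= node.retire_era)
                   for r in self.reserved)

    def reclaim(self) -> None:
        keep = []
        for node in self.retired:
            if self._can_free(node):
                self.freed.append(node)
            else:
                keep.append(node)
        self.retired = keep
```

Unlike the EBR case, an object allocated after the stalled thread's reserved era (node "B" in the test below) can still be freed, which is what bounds memory usage.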

arXiv:1908.04511 [pdf, other]

doi 10.4230/LIPIcs.DISC.2019.28

A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue

Authors: Ruslan Nikolaev

Abstract: We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performance queues, very memory efficient. Moreover, the design is ABA safe and does not require any external memory allocators or safe memory reclamation techniques, typically needed by other scalable designs. In fact, this queue itself can be leveraged for object allocation and reclamation, as in data pools. We use FAA (fetch-and-add), a specialized instruction that is more scalable than CAS (compare-and-set), on the most contended hot spots of the algorithm. However, unlike prior attempts with FAA, our queue is both lock-free and linearizable. We propose a general approach, SCQ, for bounded queues. This approach can easily be extended to support unbounded FIFO queues which can store an arbitrary number of elements. SCQ is portable across virtually all existing architectures and flexible enough for a wide variety of uses. We measure the performance of our algorithm on the x86-64 and PowerPC architectures. Our evaluation validates that our queue has exceptional memory efficiency compared to other algorithms and that its performance is often comparable to or better than that of state-of-the-art scalable algorithms.

Submitted 13 August, 2019; originally announced August 2019.

Journal ref: 33rd International Symposium on Distributed Computing (DISC 2019)
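The SCQ abstract's remark that the queue "can be leveraged for object allocation and reclamation, as in data pools" suggests the following arrangement, sketched here as an assumption: a preallocated array of values plus two bounded queues of array indices, one holding free slots and one holding filled slots in FIFO order. Because only small indices circulate through the queues, no external allocator is involved. In the sketch, Python's `collections.deque` stands in for the lock-free index queue, and the class name `IndexPoolQueue` is hypothetical.

```python
from collections import deque
from typing import Any, Deque, List, Optional


class IndexPoolQueue:
    """Bounded FIFO of arbitrary values built from two queues of slot indices.

    `deque` stands in for a lock-free ring buffer of indices; the data array is
    preallocated, so no allocator or memory reclamation scheme is needed.
    """

    def __init__(self, capacity: int) -> None:
        self.data: List[Any] = [None] * capacity
        self.fq: Deque[int] = deque(range(capacity))  # indices of free slots
        self.aq: Deque[int] = deque()                  # indices of filled slots, FIFO

    def enqueue(self, value: Any) -> bool:
        if not self.fq:
            return False                # full
        i = self.fq.popleft()           # "allocate" a slot
        self.data[i] = value
        self.aq.append(i)               # publish it in FIFO order
        return True

    def dequeue(self) -> Optional[Any]:
        if not self.aq:
            return None                 # empty (ambiguous if None values are stored)
        i = self.aq.popleft()
        value, self.data[i] = self.data[i], None
        self.fq.append(i)               # "reclaim" the slot for reuse
        return value
```

The index queues never hold more than `capacity` entries, so if each is itself a bounded queue like SCQ, the whole structure has a fixed memory footprint.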

arXiv:1905.07903 [pdf, other]

doi 10.1145/3453483.3454090

Snapshot-Free, Transparent, and Robust Memory Reclamation for Lock-Free Data Structures

Authors: Ruslan Nikolaev, Binoy Ravindran

Abstract: We present a family of safe memory reclamation schemes, Hyaline, which are fast, scalable, and transparent to the underlying lock-free data structures. Hyaline is based on reference counting, which was considered impractical for memory reclamation in the past due to high overheads. Hyaline uses reference counters only during reclamation, but not while accessing individual objects, which reduces overheads for object accesses. Since, with reference counters, an arbitrary thread ends up freeing memory, Hyaline's reclamation workload is (almost) balanced across all threads, unlike most prior reclamation schemes such as epoch-based reclamation (EBR) or hazard pointers (HP). Hyaline often yields (excellent) EBR-grade performance with (good) HP-grade memory efficiency, which is a challenging tradeoff with all existing schemes. Hyaline schemes offer: (i) high performance; (ii) good memory efficiency; (iii) robustness: bounding memory usage even in the presence of stalled threads, a well-known problem with EBR; (iv) transparency: supporting a virtually unbounded number of threads (or concurrent entities) that can be created and deleted dynamically and effortlessly join the existing workload; (v) autonomy: avoiding special OS mechanisms and being non-intrusive to runtime or compiler environments; (vi) simplicity: enabling easy integration into unmanaged C/C++ code; and (vii) generality: supporting many data structures. All existing schemes lack one or more of these properties. We have implemented and tested Hyaline on x86(-64), ARM32/64, PowerPC, and MIPS.
The general approach requires LL/SC or double-width CAS, while a specialized version also works with single-width CAS. Our evaluation reveals that Hyaline's throughput is very high: it steadily outperforms EBR by 10% in one test and yields 2x gains in oversubscribed scenarios. Hyaline's superior memory efficiency is especially evident in read-dominated workloads.

Submitted 1 May, 2021; v1 submitted 20 May, 2019; originally announced May 2019.

Comments: An extended version of the PLDI'21 paper (with Appendix)

Journal ref: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2021)
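The Hyaline abstract's central idea, that reference counters are touched only during reclamation and that whichever thread drops the last reference frees the memory, can be modeled sequentially: a batch of retired nodes is counted once per thread active at retirement, each such thread decrements the count when it leaves, and the thread that reaches zero frees the batch. This is a simplified illustration with hypothetical names (`HyalineLikeModel`, `retire_batch`), not Hyaline's algorithm, which performs these steps with atomic operations.

```python
from typing import List, Tuple


class Batch:
    def __init__(self, nodes: List[str]) -> None:
        self.nodes = list(nodes)
        self.refs = 0  # number of threads that may still reference the batch


class HyalineLikeModel:
    """Sequential model of reference counting applied only at reclamation time."""

    def __init__(self, nthreads: int) -> None:
        self.active = [False] * nthreads
        self.pending: List[List[Batch]] = [[] for _ in range(nthreads)]
        self.freed: List[Tuple[int, str]] = []  # (freeing thread, node)

    def enter(self, tid: int) -> None:
        self.active[tid] = True                  # no counter is touched here

    def retire_batch(self, tid: int, nodes: List[str]) -> None:
        batch = Batch(nodes)
        holders = [t for t, is_active in enumerate(self.active) if is_active]
        if not holders:                          # nobody can hold a reference
            self.freed.extend((tid, n) for n in batch.nodes)
            return
        batch.refs = len(holders)
        for t in holders:
            self.pending[t].append(batch)

    def leave(self, tid: int) -> None:
        self.active[tid] = False
        for batch in self.pending[tid]:
            batch.refs -= 1
            if batch.refs == 0:                  # last holder frees the batch
                self.freed.extend((tid, n) for n in batch.nodes)
        self.pending[tid].clear()
```

In the run exercised below, thread 2 retires the batch but thread 1, the last to leave, frees it; this is the sense in which the reclamation workload spreads across threads.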

arXiv:1408.1823 [pdf]

doi 10.1088/1748-0221/9/11/P11014

Experimental study of ionization yield of liquid xenon for electron recoils in the energy range 2.8 - 80 keV

Authors: D. Yu. Akimov, V. V. Afanasyev, I. S. Alexandrov, V. A. Belov, A. I. Bolozdynya, A. A. Burenkov, Yu. V. Efremenko, D. A. Egorov, A. V. Etenko, M. A. Gulin, S. V. Ivakhin, V. A. Kaplin, A. K. Karelin, A. V. Khromov, M. A. Kirsanov, S. G. Klimanov, A. S. Kobyakin, A. M. Konovalov, A. G. Kovalenko, A. V. Kuchenkov, A. V. Kumpan, Yu. A. Melikyan, R. I. Nikolaev, D. G. Rudik, V. V. Sosnovtsev, et al. (1 additional author not shown)

Abstract: We present the results of the first experimental study of the ionization yield of electron recoils with energies below 100 keV produced in liquid xenon by the isotopes 37Ar, 83mKr, 241Am, 129Xe, and 131Xe. A direct measurement with the 37Ar isotope (2.82 keV) confirms that, in the energy range below ~10 keV, the ionization yield increases as the energy decreases, in accordance with the NEST predictions. The scintillation decay time at 2.82 keV is measured to be 25 ± 3 ns at an electric field of 3.75 kV/cm.

Submitted 8 August, 2014; originally announced August 2014.

Comments: 16 pages, 8 figures

arXiv:cond-mat/0011528 [pdf]

doi 10.1103/PhysRevLett.86.5779

Exchange Field Induced Magnetoresistance in Colossal Magnetoresistance Manganites

Authors: I. N. Krivorotov, K. R. Nikolaev, A. Yu. Dobin, A. M. Goldman, E. D. Dahlberg

Abstract: The effect of an exchange field on electrical transport in thin films of metallic ferromagnetic manganites has been investigated. The exchange field was induced both by direct exchange coupling in a ferromagnet/antiferromagnet multilayer and by indirect exchange interaction in a ferromagnet/paramagnet superlattice. The electrical resistance of the manganite layers was found to be determined by the absolute value of the vector sum of the effective exchange field and the external magnetic field.

Submitted 30 November, 2000; originally announced November 2000.

Comments: 5 pages, 4 figures
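The finding that the resistance depends only on the magnitude of the vector sum of the two fields can be written out explicitly. Here $\mathbf{H}_{\mathrm{ex}}$ denotes the effective exchange field, $\mathbf{H}$ the external field, and $\theta$ the angle between them; this notation is introduced for illustration and is not taken from the paper.

```latex
R = R\big(\lvert \mathbf{H}_{\mathrm{ex}} + \mathbf{H} \rvert\big),
\qquad
\lvert \mathbf{H}_{\mathrm{ex}} + \mathbf{H} \rvert
  = \sqrt{H_{\mathrm{ex}}^{2} + H^{2} + 2\,H_{\mathrm{ex}} H \cos\theta}
```

For a field parallel to the exchange field ($\theta = 0$) the magnitude is $H_{\mathrm{ex}} + H$; for an antiparallel field ($\theta = \pi$) it is $\lvert H_{\mathrm{ex}} - H \rvert$, which vanishes at $H = H_{\mathrm{ex}}$.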

arXiv:cond-mat/0004230 [pdf, ps, other]

doi 10.1103/PhysRevLett.85.3728

Oscillatory Exchange Coupling and Positive Magnetoresistance in Epitaxial Oxide Heterostructures

Authors: K. R. Nikolaev, A. Yu. Dobin, I. N. Krivorotov, W. K. Cooley, A. Bhattacharya, A. L. Kobrinskii, L. I. Glazman, R. M. Wentzcovitch, E. Dan Dahlberg, A. M. Goldman

Abstract: Oscillations in the exchange coupling between ferromagnetic $\mathrm{La_{2/3}Ba_{1/3}MnO_3}$ layers as a function of the thickness of the paramagnetic $\mathrm{LaNiO_3}$ spacer layer have been observed in epitaxial heterostructures of the two oxides. This behavior is explained within the RKKY model, employing an ab initio calculated band structure of $\mathrm{LaNiO_3}$ and taking into account strong electron scattering in the spacer. Antiferromagnetically coupled superlattices exhibit a positive current-in-plane magnetoresistance.

Submitted 13 April, 2000; originally announced April 2000.

Comments: 4 pages (RevTeX), 5 figures (EPS)
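As background for the RKKY explanation, interlayer exchange coupling across a metallic spacer of thickness $d$ is often written in the asymptotic form below, where $q_{\alpha}$ is a critical spanning vector of the spacer's Fermi surface and $\phi_{\alpha}$ a phase. The exponential factor with an effective mean free path $\lambda$ is a phenomenological assumption added here to represent the strong scattering mentioned in the abstract; it is not the paper's expression.

```latex
J(d) \propto \frac{e^{-d/\lambda}}{d^{2}} \, \sin\!\left(q_{\alpha} d + \phi_{\alpha}\right),
\qquad
\Lambda = \frac{2\pi}{q_{\alpha}}
```

The sign of $J$ alternates with spacer thickness with period $\Lambda$, so antiferromagnetic coupling appears only in alternating thickness windows, which is consistent with observing it in some superlattices and not others.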

Showing 1–10 of 10 results for author: Nikolaev, R