-
LeanBin: Harnessing Lifting and Recompilation to Debloat Binaries
Authors:
Igor Wodiany,
Antoniu Pop,
Mikel Luján
Abstract:
To reduce the source of potential exploits, binary debloating or specialization tools are used to remove unnecessary code from binaries. This paper presents a new binary debloating and specialization tool, LeanBin, that harnesses lifting and recompilation based on observed execution traces. The dynamically recorded execution traces capture the required subset of instructions and control flow of the application binary for a given set of inputs. This initial control flow is subsequently augmented using heuristic-free static analysis to avoid overrestricting the input space. Further structuring of the control flow and translation of binary instructions into a subset of C enable lightweight generation of code that can be recompiled, obtaining LLVM IR and a new debloated binary. Unlike most debloating approaches, LeanBin debloats both the application binary and its shared libraries, while reusing the existing compiler infrastructure. Additionally, unlike existing binary lifters, it neither relies on the potentially unsound heuristics used by static lifters nor suffers from the long execution times that limit existing dynamic lifters. Instead, LeanBin combines heuristic-free static analysis with dynamic analysis. The run time during lifting and debloating of the SPEC CPU2006 INT benchmarks is on average 1.78$\times$, normalized to native execution, and the debloated binary runs with an average overhead of 1.21$\times$. The percentage of gadgets, compared to the original binary, has a geomean between 24.10% and 30.22%, depending on the debloating strategy; the code size can be as low as 53.59%. For the SQLite use-case, LeanBin debloats a binary together with its shared library, and generates a debloated binary that runs up to 1.24$\times$ faster with 3.65% gadgets.
Submitted 23 June, 2024;
originally announced June 2024.
-
Active search and coverage using point-cloud reinforcement learning
Authors:
Matthias Rosynski,
Alexandru Pop,
Lucian Busoniu
Abstract:
We consider a problem in which the trajectory of a mobile 3D sensor must be optimized so that certain objects are both found in the overall scene and covered by the point cloud, as fast as possible. This problem is called target search and coverage, and the paper provides an end-to-end deep reinforcement learning (RL) solution to it. The deep neural network combines four components: deep hierarchical feature learning occurs in the first stage, followed by multi-head transformers in the second, max-pooling and merging with bypassed information to preserve spatial relationships in the third, and a distributional dueling network in the last stage. To evaluate the method, a simulator is developed where cylinders must be found by a Kinect sensor. A network architecture study shows that deep hierarchical feature learning works for RL and that by using farthest point sampling (FPS) we can reduce the number of points and achieve not only a reduction of the network size but also better results. We also show that multi-head attention for point clouds helps the agent learn faster but converges to the same outcome. Finally, we compare RL using the best network with a greedy baseline that maximizes immediate rewards and requires for that purpose an oracle that predicts the next observation. We find that RL achieves significantly better and more robust results than the greedy strategy.
Submitted 18 December, 2023;
originally announced December 2023.
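The farthest point sampling (FPS) step named in the abstract above can be illustrated with a minimal sketch. This is the generic textbook form of FPS, not the authors' implementation; the function name `farthest_point_sampling`, the `start` parameter, and the sample points are assumptions made for illustration only.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points, each chosen to be farthest from those already picked.

    Generic FPS sketch (hypothetical helper, not taken from the paper).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # min_dist[i] holds the distance from point i to its nearest chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # The next sample is the point farthest from the current chosen set.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        # Tighten the nearest-chosen distances using the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


# Illustrative 3D points (made up for this example).
cloud = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0, 5, 0)]
print(farthest_point_sampling(cloud, 3))
```

Each iteration costs one pass over the cloud, so selecting k of n points takes O(nk) distance evaluations; the subsample keeps spatial spread while shrinking the input a network has to process.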
-
Does Bidirectional Traffic Do More Harm Than Good in LoRaWAN Based LPWA Networks?
Authors:
Alexandru-Ioan Pop,
Usman Raza,
Parag Kulkarni,
Mahesh Sooriyabandara
Abstract:
The need for low power, long range and low cost connectivity to meet the requirements of IoT applications has led to the emergence of Low Power Wide Area (LPWA) networking technologies. The promise of these technologies to wirelessly connect massive numbers of geographically dispersed devices at a low cost continues to attract a great deal of attention in the academic and commercial communities. Several rollouts are already underway even though the performance of these technologies is yet to be fully understood. In light of these developments, tools to carry out `what-if analyses' and pre-deployment studies are needed to understand the implications of choices that are made at design time. While there are several promising technologies in the LPWA space, this paper specifically focuses on the LoRa/LoRaWAN technology. In particular, we present LoRaWANSim, a simulator which extends the LoRaSim tool to add support for the LoRaWAN MAC protocol, which employs bidirectional communication. This is a salient feature not available in any other LoRa simulator. Subsequently, we provide vital insights into the performance of LoRaWAN based networks through extensive simulations. In particular, we show that the achievable network capacity reported in earlier studies is quite optimistic. The introduction of downlink traffic can have a significant impact on the uplink throughput. The number of transmit attempts recommended in the LoRaWAN specification may not always be the best choice. We also highlight the energy consumption versus reliability trade-offs associated with the choice of number of retransmission attempts.
Submitted 14 December, 2017; v1 submitted 13 April, 2017;
originally announced April 2017.
-
Project Beehive: A Hardware/Software Co-designed Stack for Runtime and Architectural Research
Authors:
Christos Kotselidis,
Andrey Rodchenko,
Colin Barrett,
Andy Nisbet,
John Mawer,
Will Toms,
James Clarkson,
Cosmin Gorgovan,
Amanieu d'Antras,
Yaman Cakmakci,
Thanos Stratikopoulos,
Sebastian Werner,
Jim Garside,
Javier Navaridas,
Antoniu Pop,
John Goodacre,
Mikel Lujan
Abstract:
The end of Dennard scaling combined with stagnation in architectural and compiler optimizations makes it challenging to achieve significant performance deltas. Solutions based solely in hardware or software are no longer sufficient to maintain the pace of improvements seen during the past few decades. In hardware, the end of single-core scaling resulted in the proliferation of multi-core system architectures; however, this has forced complex parallel programming techniques into the mainstream. To further exploit physical resources, systems are becoming increasingly heterogeneous with specialized computing elements and accelerators. Programming across a range of disparate architectures requires a new level of abstraction that programming languages will have to adapt to. In software, emerging complex applications, from domains such as Big Data and computer vision, run on multi-layered software stacks targeting hardware with a variety of constraints and resources. Hence, optimizing for the power-performance (and resiliency) space requires experimentation platforms that offer quick and easy prototyping of hardware/software co-designed techniques. To that end, we present Project Beehive: A Hardware/Software co-designed stack for runtime and architectural research. Project Beehive utilizes various state-of-the-art software and hardware components along with novel and extensible co-design techniques. The objective of Project Beehive is to provide a modern platform for experimentation on emerging applications, programming languages, compilers, runtimes, and low-power heterogeneous many-core architectures in a full-system co-designed manner.
Submitted 5 June, 2017; v1 submitted 14 September, 2015;
originally announced September 2015.
-
Automatic Detection of Performance Anomalies in Task-Parallel Programs
Authors:
Andi Drebes,
Karine Heydemann,
Antoniu Pop,
Albert Cohen,
Nathalie Drach
Abstract:
To efficiently exploit the resources of new many-core architectures, integrating dozens or even hundreds of cores per chip, parallel programming models have evolved to expose massive amounts of parallelism, often in the form of fine-grained tasks. Task-parallel languages, such as OpenStream, X10, Habanero Java and C or StarSs, simplify the development of applications for new architectures, but tuning task-parallel applications remains a major challenge. Performance bottlenecks can occur at any level of the implementation, from the algorithmic level (e.g., lack of parallelism or over-synchronization), to interactions with the operating and runtime systems (e.g., data placement on NUMA architectures), to inefficient use of the hardware (e.g., frequent cache misses or misaligned memory accesses); detecting such issues and determining the exact cause is a difficult task.
In previous work, we developed Aftermath, an interactive tool for trace-based performance analysis and debugging of task-parallel programs and run-time systems. In contrast to other trace-based analysis tools, such as Paraver or Vampir, Aftermath offers native support for tasks, i.e., visualization, statistics and analysis tools adapted for performance debugging at task granularity. However, the tool currently does not provide support for the automatic detection of performance bottlenecks and it is up to the user to investigate the relevant aspects of program execution by focusing the inspection on specific slices of a trace file. In this paper, we present ongoing work on two extensions that guide the user through this process.
Submitted 12 May, 2014;
originally announced May 2014.