-
CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
Authors:
Matan Rusanovsky,
Or Hirschorn,
Shai Avidan
Abstract:
Conventional 2D pose estimation models are constrained by their design to specific object categories. This limits their applicability to predefined objects. To overcome these limitations, category-agnostic pose estimation (CAPE) emerged as a solution. CAPE aims to facilitate keypoint localization for diverse object categories using a unified model, which can generalize from minimal annotated suppo…
▽ More
Conventional 2D pose estimation models are constrained by their design to specific object categories. This limits their applicability to predefined objects. To overcome these limitations, category-agnostic pose estimation (CAPE) emerged as a solution. CAPE aims to facilitate keypoint localization for diverse object categories using a unified model, which can generalize from minimal annotated support images. Recent CAPE works have produced object poses based on arbitrary keypoint definitions annotated on a user-provided support image. Our work departs from conventional CAPE methods, which require a support image, by adopting a text-based approach instead of the support image. Specifically, we use a pose-graph, where nodes represent keypoints that are described with text. This representation takes advantage of the abstraction of text descriptions and the structure imposed by the graph.
Our approach effectively breaks symmetry, preserves structure, and improves occlusion handling. We validate our novel approach using the MP-100 benchmark, a comprehensive dataset spanning over 100 categories and 18,000 images. Under a 1-shot setting, our solution achieves a notable performance boost of 1.07\%, establishing a new state-of-the-art for CAPE. Additionally, we enrich the dataset by providing text description annotations, further enhancing its utility for future research.
△ Less
Submitted 1 June, 2024;
originally announced June 2024.
-
ScalSALE: Scalable SALE Benchmark Framework for Supercomputers
Authors:
Re'em Harel,
Matan Rusanovsky,
Ron Wagner,
Harel Levin,
Gal Oren
Abstract:
Supercomputers worldwide provide the necessary infrastructure for groundbreaking research. However, most supercomputers are not designed equally due to different desired figure of merit, which is derived from the computational bounds of the targeted scientific applications' portfolio. In turn, the design of such computers becomes an optimization process that strives to achieve the best performance…
▽ More
Supercomputers worldwide provide the necessary infrastructure for groundbreaking research. However, most supercomputers are not designed equally due to different desired figure of merit, which is derived from the computational bounds of the targeted scientific applications' portfolio. In turn, the design of such computers becomes an optimization process that strives to achieve the best performances possible in a multi-parameters search space. Therefore, verifying and evaluating whether a supercomputer can achieve its desired goal becomes a tedious and complex task. For this purpose, many full, mini, proxy, and benchmark applications have been introduced in the attempt to represent scientific applications. Nevertheless, as these benchmarks are hard to expand, and most importantly, are over-simplified compared to scientific applications that tend to couple multiple scientific domains, they fail to represent the true scaling capabilities. We suggest a new physical scalable benchmark framework, namely ScalSALE, based on the well-known SALE scheme. ScalSALE's main goal is to provide a simple, flexible, scalable infrastructure that can be easily expanded to include multi-physical schemes while maintaining scalable and efficient execution times. By expanding ScalSALE, the gap between the over-simplified benchmarks and scientific applications can be bridged. To achieve this goal, ScalSALE is implemented in Modern Fortran with simple OOP design patterns and supported by transparent MPI-3 blocking and non-blocking communication that allows such a scalable framework. ScalSALE is compared to LULESH via simulating the Sedov-Taylor blast wave problem using strong and weak scaling tests. ScalSALE is executed and evaluated with both rezoning options - Lagrangian and Eulerian.
△ Less
Submitted 5 September, 2022;
originally announced September 2022.
-
Determining HEDP Foams' Quality with Multi-View Deep Learning Classification
Authors:
Nadav Schneider,
Matan Rusanovsky,
Raz Gvishi,
Gal Oren
Abstract:
High energy density physics (HEDP) experiments commonly involve a dynamic wave-front propagating inside a low-density foam. This effect affects its density and hence, its transparency. A common problem in foam production is the creation of defective foams. Accurate information on their dimension and homogeneity is required to classify the foams' quality. Therefore, those parameters are being chara…
▽ More
High energy density physics (HEDP) experiments commonly involve a dynamic wave-front propagating inside a low-density foam. This effect affects its density and hence, its transparency. A common problem in foam production is the creation of defective foams. Accurate information on their dimension and homogeneity is required to classify the foams' quality. Therefore, those parameters are being characterized using a 3D-measuring laser confocal microscope. For each foam, five images are taken: two 2D images representing the top and bottom surface foam planes and three images of side cross-sections from 3D scannings. An expert has to do the complicated, harsh, and exhausting work of manually classifying the foam's quality through the image set and only then determine whether the foam can be used in experiments or not. Currently, quality has two binary levels of normal vs. defective. At the same time, experts are commonly required to classify a sub-class of normal-defective, i.e., foams that are defective but might be sufficient for the needed experiment. This sub-class is problematic due to inconclusive judgment that is primarily intuitive. In this work, we present a novel state-of-the-art multi-view deep learning classification model that mimics the physicist's perspective by automatically determining the foams' quality classification and thus aids the expert. Our model achieved 86\% accuracy on upper and lower surface foam planes and 82\% on the entire set, suggesting interesting heuristics to the problem. A significant added value in this work is the ability to regress the foam quality instead of binary deduction and even explain the decision visually. The source code used in this work, as well as other relevant sources, are available at: https://github.com/Scientific-Computing-Lab-NRCN/Multi-View-Foams.git
△ Less
Submitted 10 August, 2022;
originally announced August 2022.
-
ChangeChip: A Reference-Based Unsupervised Change Detection for PCB Defect Detection
Authors:
Yehonatan Fridman,
Matan Rusanovsky,
Gal Oren
Abstract:
The usage of electronic devices increases, and becomes predominant in most aspects of life. Surface Mount Technology (SMT) is the most common industrial method for manufacturing electric devices in which electrical components are mounted directly onto the surface of a Printed Circuit Board (PCB). Although the expansion of electronic devices affects our lives in a productive way, failures or defect…
▽ More
The usage of electronic devices increases, and becomes predominant in most aspects of life. Surface Mount Technology (SMT) is the most common industrial method for manufacturing electric devices in which electrical components are mounted directly onto the surface of a Printed Circuit Board (PCB). Although the expansion of electronic devices affects our lives in a productive way, failures or defects in the manufacturing procedure of those devices might also be counterproductive and even harmful in some cases. It is therefore desired and sometimes crucial to ensure zero-defect quality in electronic devices and their production. While traditional Image Processing (IP) techniques are not sufficient to produce a complete solution, other promising methods like Deep Learning (DL) might also be challenging for PCB inspection, mainly because such methods require big adequate datasets which are missing, not available or not updated in the rapidly growing field of PCBs. Thus, PCB inspection is conventionally performed manually by human experts. Unsupervised Learning (UL) methods may potentially be suitable for PCB inspection, having learning capabilities on the one hand, while not relying on large datasets on the other. In this paper, we introduce ChangeChip, an automated and integrated change detection system for defect detection in PCBs, from soldering defects to missing or misaligned electronic elements, based on Computer Vision (CV) and UL. We achieve good quality defect detection by applying an unsupervised change detection between images of a golden PCB (reference) and the inspected PCB under various setting. In this work, we also present CD-PCB, a synthesized labeled dataset of 20 pairs of PCB images for evaluation of defect detection algorithms.
△ Less
Submitted 13 September, 2021;
originally announced September 2021.
-
Assessing the Use Cases of Persistent Memory in High-Performance Scientific Computing
Authors:
Yehonatan Fridman,
Yaniv Snir,
Matan Rusanovsky,
Kfir Zvi,
Harel Levin,
Danny Hendler,
Hagit Attiya,
Gal Oren
Abstract:
As the High Performance Computing world moves towards the Exa-Scale era, huge amounts of data should be analyzed, manipulated and stored. In the traditional storage/memory hierarchy, each compute node retains its data objects in its local volatile DRAM. Whenever the DRAM's capacity becomes insufficient for storing this data, the computation should either be distributed between several compute node…
▽ More
As the High Performance Computing world moves towards the Exa-Scale era, huge amounts of data should be analyzed, manipulated and stored. In the traditional storage/memory hierarchy, each compute node retains its data objects in its local volatile DRAM. Whenever the DRAM's capacity becomes insufficient for storing this data, the computation should either be distributed between several compute nodes, or some portion of these data objects must be stored in a non-volatile block device such as a hard disk drive or an SSD storage device. Optane DataCenter Persistent Memory Module (DCPMM), a new technology introduced by Intel, provides non-volatile memory that can be plugged into standard memory bus slots and therefore be accessed much faster than standard storage devices. In this work, we present and analyze the results of a comprehensive performance assessment of several ways in which DCPMM can 1) replace standard storage devices, and 2) replace or augment DRAM for improving the performance of HPC scientific computations. To achieve this goal, we have configured an HPC system such that DCPMM can service I/O operations of scientific applications, replace standard storage devices and file systems (specifically for diagnostics and checkpoint-restarting), and serve for expanding applications' main memory. We focus on kee** the scientific codes with as few changes as possible, while allowing them to access the NVM transparently as if they access persistent storage. Our results show that DCPMM allows scientific applications to fully utilize nodes' locality by providing them with sufficiently-large main memory. Moreover, it can be used for providing a high-performance replacement for persistent storage. Thus, the usage of DCPMM has the potential of replacing standard HDD and SSD storage devices in HPC architectures and enabling a more efficient platform for modern supercomputing applications.
△ Less
Submitted 5 September, 2021;
originally announced September 2021.
-
Execution of NVRAM Programs with Persistent Stack
Authors:
Vitaly Aksenov,
Ohad Ben-Baruch,
Danny Hendler,
Ilya Kokorin,
Matan Rusanovsky
Abstract:
Non-Volatile Random Access Memory (NVRAM) is a novel type of hardware that combines the benefits of traditional persistent memory (persistency of data over hardware failures) and DRAM (fast random access). In this work, we describe an algorithm that can be used to execute NVRAM programs and recover the system after a hardware failure while taking the architecture of real-world NVRAM systems into a…
▽ More
Non-Volatile Random Access Memory (NVRAM) is a novel type of hardware that combines the benefits of traditional persistent memory (persistency of data over hardware failures) and DRAM (fast random access). In this work, we describe an algorithm that can be used to execute NVRAM programs and recover the system after a hardware failure while taking the architecture of real-world NVRAM systems into account. Moreover, the algorithm can be used to execute NVRAM-destined programs on commodity persistent hardware, such as hard drives. That allows us to test NVRAM algorithms using only cheap hardware, without having access to the NVRAM. We report the usage of our algorithm to implement and test NVRAM CAS algorithm.
△ Less
Submitted 25 May, 2021;
originally announced May 2021.
-
An End-to-End Computer Vision Methodology for Quantitative Metallography
Authors:
Matan Rusanovsky,
Ofer Beeri,
Gal Oren
Abstract:
Metallography is crucial for a proper assessment of material's properties. It involves mainly the investigation of spatial distribution of grains and the occurrence and characteristics of inclusions or precipitates. This work presents an holistic artificial intelligence model for Anomaly Detection that automatically quantifies the degree of anomaly of impurities in alloys. We suggest the following…
▽ More
Metallography is crucial for a proper assessment of material's properties. It involves mainly the investigation of spatial distribution of grains and the occurrence and characteristics of inclusions or precipitates. This work presents an holistic artificial intelligence model for Anomaly Detection that automatically quantifies the degree of anomaly of impurities in alloys. We suggest the following examination process: (1) Deep semantic segmentation is performed on the inclusions (based on a suitable metallographic database of alloys and corresponding tags of inclusions), producing inclusions masks that are saved into a separated database. (2) Deep image inpainting is performed to fill the removed inclusions parts, resulting in 'clean' metallographic images, which contain the background of grains. (3) Grains' boundaries are marked using deep semantic segmentation (based on another metallographic database of alloys), producing boundaries that are ready for further inspection on the distribution of grains' size. (4) Deep anomaly detection and pattern recognition is performed on the inclusions masks to determine spatial, shape and area anomaly detection of the inclusions. Finally, the system recommends to an expert on areas of interests for further examination. The performance of the model is presented and analyzed based on few representative cases. Although the models presented here were developed for metallography analysis, most of them can be generalized to a wider set of problems in which anomaly detection of geometrical objects is desired. All models as well as the data-sets that were created for this work, are publicly available at https://github.com/Scientific-Computing-Lab-NRCN/MLography.
△ Less
Submitted 1 March, 2022; v1 submitted 22 April, 2021;
originally announced April 2021.
-
Flat-Combining-Based Persistent Data Structures for Non-Volatile Memory
Authors:
Matan Rusanovsky,
Hagit Attiya,
Ohad Ben-Baruch,
Tom Gerby,
Danny Hendler,
Pedro Ramalhete
Abstract:
Flat combining (FC) is a synchronization paradigm in which a single thread, holding a global lock, collects requests by multiple threads for accessing a concurrent data structure and applies their combined requests to it. Although FC is sequential, it significantly reduces synchronization overheads and cache invalidations and thus often provides better performance than that of lock-free implementa…
▽ More
Flat combining (FC) is a synchronization paradigm in which a single thread, holding a global lock, collects requests by multiple threads for accessing a concurrent data structure and applies their combined requests to it. Although FC is sequential, it significantly reduces synchronization overheads and cache invalidations and thus often provides better performance than that of lock-free implementations. The recent emergence of non-volatile memory (NVM) technologies increases the interest in the development of persistent concurrent objects. These are objects that are able to recover from system failures and ensure consistency by retaining their state in NVM and fixing it, if required, upon recovery. Of particular interest are detectable objects that, in addition to ensuring consistency, allow recovery code to infer if a failed operation took effect before the crash and, if it did, obtain its response. In this work, we present the first FC-based persistent object implementations. Specifically, we introduce a detectable FC-based implementation of a concurrent LIFO stack, a concurrent FIFO queue, and a double-ended queue. Our empirical evaluation establishes that due to flat combining, the novel implementations require a much smaller number of costly persistence instructions than competing algorithms and are therefore able to significantly outperform them.
△ Less
Submitted 8 December, 2021; v1 submitted 23 December, 2020;
originally announced December 2020.
-
Complete CVDL Methodology for Investigating Hydrodynamic Instabilities
Authors:
Re'em Harel,
Matan Rusanovsky,
Yehonatan Fridman,
Assaf Shimony,
Gal Oren
Abstract:
In fluid dynamics, one of the most important research fields is hydrodynamic instabilities and their evolution in different flow regimes. The investigation of said instabilities is concerned with the highly non-linear dynamics. Currently, three main methods are used for understanding of such phenomenon - namely analytical models, experiments and simulations - and all of them are primarily investig…
▽ More
In fluid dynamics, one of the most important research fields is hydrodynamic instabilities and their evolution in different flow regimes. The investigation of said instabilities is concerned with the highly non-linear dynamics. Currently, three main methods are used for understanding of such phenomenon - namely analytical models, experiments and simulations - and all of them are primarily investigated and correlated using human expertise. In this work we claim and demonstrate that a major portion of this research effort could and should be analysed using recent breakthrough advancements in the field of Computer Vision with Deep Learning (CVDL, or Deep Computer-Vision). Specifically, we target and evaluate specific state-of-the-art techniques - such as Image Retrieval, Template Matching, Parameters Regression and Spatiotemporal Prediction - for the quantitative and qualitative benefits they provide. In order to do so we focus in this research on one of the most representative instabilities, the Rayleigh-Taylor one, simulate its behaviour and create an open-sourced state-of-the-art annotated database (RayleAI). Finally, we use adjusted experimental results and novel physical loss methodologies to validate the correspondence of the predicted results to actual physical reality to prove the models efficiency. The techniques which were developed and proved in this work can be served as essential tools for physicists in the field of hydrodynamics for investigating a variety of physical systems, and also could be used via Transfer Learning to other instabilities research. A part of the techniques can be easily applied on already exist simulation results. All models as well as the data-set that was created for this work, are publicly available at: https://github.com/scientific-computing-nrcn/SimulAI.
△ Less
Submitted 26 April, 2020; v1 submitted 3 April, 2020;
originally announced April 2020.
-
MLography: An Automated Quantitative Metallography Model for Impurities Anomaly Detection using Novel Data Mining and Deep Learning Approach
Authors:
Matan Rusanovsky,
Gal Oren,
Sigalit Ifergane,
Ofer Beeri
Abstract:
The micro-structure of most of the engineering alloys contains some inclusions and precipitates, which may affect their properties, therefore it is crucial to characterize them. In this work we focus on the development of a state-of-the-art artificial intelligence model for Anomaly Detection named MLography to automatically quantify the degree of anomaly of impurities in alloys. For this purpose,…
▽ More
The micro-structure of most of the engineering alloys contains some inclusions and precipitates, which may affect their properties, therefore it is crucial to characterize them. In this work we focus on the development of a state-of-the-art artificial intelligence model for Anomaly Detection named MLography to automatically quantify the degree of anomaly of impurities in alloys. For this purpose, we introduce several anomaly detection measures: Spatial, Shape and Area anomaly, that successfully detect the most anomalous objects based on their objective, given that the impurities were already labeled. The first two measures quantify the degree of anomaly of each object by how each object is distant and big compared to its neighborhood, and by the abnormally of its own shape respectively. The last measure, combines the former two and highlights the most anomalous regions among all input images, for later (physical) examination. The performance of the model is presented and analyzed based on few representative cases. We stress that although the models presented here were developed for metallography analysis, most of them can be generalized to a wider set of problems in which anomaly detection of geometrical objects is desired. All models as well as the data-set that was created for this work, are publicly available at: https://github.com/matanr/MLography.
△ Less
Submitted 27 February, 2020;
originally announced March 2020.
-
Upper and Lower Bounds on the Space Complexity of Detectable Object
Authors:
Ohad Ben-Baruch,
Danny Hendler,
Matan Rusanovsky
Abstract:
The emergence of systems with non-volatile main memory (NVM) increases the interest in the design of \emph{recoverable concurrent objects} that are robust to crash-failures, since their operations are able to recover from such failures by using state retained in NVM. Of particular interest are recoverable algorithms that, in addition to ensuring object consistency, also provide \emph{detectability…
▽ More
The emergence of systems with non-volatile main memory (NVM) increases the interest in the design of \emph{recoverable concurrent objects} that are robust to crash-failures, since their operations are able to recover from such failures by using state retained in NVM. Of particular interest are recoverable algorithms that, in addition to ensuring object consistency, also provide \emph{detectability}, a correctness condition requiring that the recovery code can infer if the failed operation was linearized or not and, in the former case, obtain its response.
In this work, we investigate the space complexity of detectable algorithms and the external support they require. We make the following three contributions. First, we present the first wait-free bounded-space detectable read/write and CAS object implementations. Second, we prove that the bit complexity of every $N$-process obstruction-free detectable CAS implementation, assuming values from a domain of size at least $N$, is $Ω(N)$. Finally, we prove that the following holds for obstruction-free detectable implementations of a large class of objects: their recoverable operations must be provided with \emph{auxiliary state} -- state that is not required by the non-recoverable counterpart implementation -- whose value must be provided from outside the operation, either by the system or by the caller of the operation. In contrast, this external support is, in general, not required if the recoverable algorithm is not detectable.
△ Less
Submitted 26 February, 2020;
originally announced February 2020.
-
BACKUS: Comprehensive High-Performance Research Software Engineering Approach for Simulations in Supercomputing Systems
Authors:
Matan Rusanovsky,
Re'em Harel,
Lee-or Alon,
Idan Mosseri,
Harel Levin,
Gal Oren
Abstract:
High-Performance Computing (HPC) platforms enable scientific software to achieve breakthroughs in many research fields such as physics, biology, and chemistry, by employing Research Software Engineering (RSE) techniques. These include 1) novel parallelism paradigms such as Shared Memory Parallelism (with e.g. OpenMP 4.5); Distributed Memory Parallelism (with e.g. MPI 4); Hybrid Parallelism which c…
▽ More
High-Performance Computing (HPC) platforms enable scientific software to achieve breakthroughs in many research fields such as physics, biology, and chemistry, by employing Research Software Engineering (RSE) techniques. These include 1) novel parallelism paradigms such as Shared Memory Parallelism (with e.g. OpenMP 4.5); Distributed Memory Parallelism (with e.g. MPI 4); Hybrid Parallelism which combines them; and Heterogeneous Parallelism (for CPUs, co-processors and accelerators), 2) introducing advanced Software Engineering concepts such as Object Oriented Parallel Programming (OOPP); Parallel Unit testing; Parallel I/O Formats; Hybrid Parallel Visualization; and 3) Selecting the Best Practices in other necessary areas such as User Interface; Automatic Documentation; Version Control and Project Management. In this work we present BACKUS: Comprehensive High-Performance Research Software Engineering Approach for Simulations in Supercomputing Systems, which we found to fit best for long-lived parallel scientific codes.
△ Less
Submitted 14 October, 2019;
originally announced October 2019.