Search | arXiv e-print repository

A Mess of Memory System Benchmarking, Simulation and Application Profiling

Authors: Pouya Esmaili-Dokht, Francesco Sgherzi, Valeria Soldera Girelli, Isaac Boixaderas, Mariana Carmin, Alireza Momeni, Adria Armejach, Estanislao Mercadal, German Llort, Petar Radojkovic, Miquel Moreto, Judit Gimenez, Xavier Martorell, Eduard Ayguade, Jesus Labarta, Emanuele Confalonieri, Rishabh Dubey, Jason Adlard

Abstract: The Memory stress (Mess) framework provides a unified view of the memory system benchmarking, simulation and application profiling. The Mess benchmark provides a holistic and detailed memory system characterization. It is based on hundreds of measurements that are represented as a family of bandwidth-latency curves. The benchmark increases the coverage of all the previous tools and leads to new findings in the behavior of the actual and simulated memory systems. We deploy the Mess benchmark to characterize Intel, AMD, IBM, Fujitsu, Amazon and NVIDIA servers with DDR4, DDR5, HBM2 and HBM2E memory. The Mess memory simulator uses the bandwidth-latency concept for the memory performance simulation. We integrate Mess with widely used CPU simulators, enabling modeling of all high-end memory technologies. The Mess simulator is fast, easy to integrate and it closely matches the actual system performance. By design, it enables a quick adoption of new memory technologies in hardware simulators. Finally, the Mess application profiling positions the application in the bandwidth-latency space of the target memory system. This information can be correlated with other application runtime activities and the source code, leading to a better overall understanding of the application's behavior. The current Mess benchmark release covers all major CPU and GPU ISAs: x86, ARM, Power, RISC-V, and NVIDIA's PTX. We also release as open source the ZSim, gem5 and OpenPiton Metro-MPI integrated with the Mess memory simulator for DDR4, DDR5, Optane, HBM2, HBM2E and CXL memory expanders. The Mess application profiling is already integrated into a suite of production HPC performance analysis tools.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 17 pages

arXiv:2205.05214 [pdf, other]

A Unified f-divergence Framework Generalizing VAE and GAN

Authors: Jaime Roquero Gimenez, James Zou

Abstract: Developing deep generative models that flexibly incorporate diverse measures of probability distance is an important area of research. Here we develop a unified mathematical framework of f-divergence generative model, f-GM, that incorporates both VAE and f-GAN, and enables tractable learning with general f-divergences. f-GM allows the experimenter to flexibly design the f-divergence function without changing the structure of the networks or the learning procedure. f-GM jointly models three components: a generator, an inference network and a density estimator. Therefore it simultaneously enables sampling, posterior inference of the latent variable as well as evaluation of the likelihood of an arbitrary datum. f-GM belongs to the class of encoder-decoder GANs: our density estimator can be interpreted as playing the role of a discriminator between samples in the joint space of latent code and observed space. We prove that f-GM naturally simplifies to the standard VAE and to f-GAN as special cases, and illustrate the connections between different encoder-decoder GAN architectures. f-GM is compatible with general network architecture and optimizer. We leverage it to experimentally explore the effects (e.g. mode collapse and image sharpness) of different choices of f-divergence.

Submitted 10 May, 2022; originally announced May 2022.

arXiv:2005.05872 [pdf, other]

doi 10.1016/j.parco.2018.06.007

Understanding Memory Access Patterns Using the BSC Performance Tools

Authors: Harald Servat, Jesús Labarta, Hans-Christian Hoppe, Judit Giménez, Antonio J. Peña

Abstract: The growing gap between processor and memory speeds results in complex memory hierarchies as processors evolve to mitigate such divergence by taking advantage of the locality of reference. In this direction, the BSC performance analysis tools have been recently extended to provide insight relative to the application memory accesses depicting their temporal and spatial characteristics, correlating with the source-code and the achieved performance simultaneously. These extensions rely on the Precise Event-Based Sampling (PEBS) mechanism available in recent Intel processors to capture information regarding the application memory accesses. The sampled information is later combined with the Folding technique to represent a detailed temporal evolution of the memory accesses and in conjunction with the achieved performance and the source-code counterpart. The results obtained from the combination of these tools help not only application developers but also processor architects to understand better how the application behaves and how the system performs. In this paper, we describe a tighter integration of the sampling mechanism into the monitoring package. We also demonstrate the value of the complete workflow by exploring already optimized state-of-the-art benchmarks, providing detailed insight of their memory access behavior. We have taken advantage of this insight to apply small modifications that improve the applications' performance.

Submitted 28 May, 2020; v1 submitted 12 May, 2020; originally announced May 2020.

Journal ref: H. Servat, J. Labarta, H. C. Hoppe, J. Giménez, and A. J. Peña, "Understanding memory access patterns using the BSC performance tools", Parallel Computing, Elsevier, vol. 78, pp. 1-14, Oct. 2018

arXiv:2004.03261 [pdf]

doi 10.1109/TBC.2020.2985906

5G Radio Access Network Architecture for Terrestrial Broadcast Services

Authors: Mikko Säily, Carlos Barjau Estevan, Jordi Joan Gimenez, Fasil Tesema, Wei Guo, David Gomez-Barquero, De Mi

Abstract: The 3rd Generation Partnership Project (3GPP) has defined, based on the Long Term Evolution (LTE) enhanced Multicast Broadcast Multimedia Service (eMBMS), a set of new features to support the distribution of Terrestrial Broadcast services in Release 14. On the other hand, a new 5th Generation (5G) system architecture and radio access technology, 5G New Radio (NR), are being standardised from Release 15 onwards, which so far have only focused on unicast connectivity. This may change in Release 17 given a new Work Item set to specify basic Radio Access Network (RAN) functionalities for the provision of multicast/broadcast communications for NR. This work initially excludes some of the functionalities originally supported for Terrestrial Broadcast services under LTE, e.g. free to air, receive-only mode, large-area single frequency networks, etc. This paper proposes an enhanced Next Generation RAN architecture based on 3GPP Release 15 with a series of architectural and functional enhancements, to support an efficient, flexible and dynamic selection between unicast and multicast/broadcast transmission modes and also the delivery of Terrestrial Broadcast services. The paper elaborates on the Cloud-RAN based architecture and proposes new concepts such as the RAN Broadcast/Multicast Areas that allow a more flexible deployment in comparison to eMBMS. High-level assessment methodologies including complexity analysis and inspection are used to evaluate the feasibility of the proposed architecture design and compare it with the 3GPP architectural requirements.

Submitted 7 April, 2020; originally announced April 2020.

Comments: 12 pages, 10 figures, 2 tables, IEEE Trans. Broadcasting

arXiv:2004.00369 [pdf]

doi 10.1109/TBC.2020.2977546

Demonstrating Immersive Media Delivery on 5G Broadcast and Multicast Testing Networks

Authors: De Mi, Joe Eyles, Tero Jokela, Swen Petersen, Roman Odarchenko, Ece Ozturk, Duy-Kha Chau, Tuan Tran, Rory Turnbull, Heikki Kokkinen, Baruch Altman, Menno Bot, Darko Ratkaj, Olaf Renner, David Gomez-Barquero, Jordi Joan Gimenez

Abstract: This work presents eight demonstrators and one showcase developed within the 5G-Xcast project. They experimentally demonstrate and validate key technical enablers for the future of media delivery, associated with multicast and broadcast communication capabilities in 5th Generation (5G). In 5G-Xcast, three existing testbeds: IRT in Munich (Germany), 5GIC in Surrey (UK), and TUAS in Turku (Finland), have been developed into 5G broadcast and multicast testing networks, which enables us to demonstrate our vision of a converged 5G infrastructure with fixed and mobile accesses and terrestrial broadcast, delivering immersive audio-visual media content. Built upon the improved testing networks, the demonstrators and showcase developed in 5G-Xcast show the impact of the technology developed in the project. Our demonstrations predominantly cover use cases belonging to two verticals: Media & Entertainment and Public Warning, which are future 5G scenarios relevant to multicast and broadcast delivery. In this paper, we present the development of these demonstrators, the showcase, and the testbeds. We also provide key findings from the experiments and demonstrations, which not only validate the technical solutions developed in the project, but also illustrate the potential technical impact of these solutions for broadcasters, content providers, operators, and other industries interested in the future immersive media delivery.

Submitted 1 March, 2020; originally announced April 2020.

Comments: 16 pages, 22 figures, IEEE Trans. Broadcasting

arXiv:1905.12177 [pdf, other]

Discovering Conditionally Salient Features with Statistical Guarantees

Authors: Jaime Roquero Gimenez, James Zou

Abstract: The goal of feature selection is to identify important features that are relevant to explain an outcome variable. Most of the work in this domain has focused on identifying globally relevant features, which are features that are related to the outcome using evidence across the entire dataset. We study a more fine-grained statistical problem: conditional feature selection, where a feature may be relevant depending on the values of the other features. For example, in genetic association studies, variant $A$ could be associated with the phenotype in the entire dataset, but conditioned on variant $B$ being present it might be independent of the phenotype. In this sense, variant $A$ is globally relevant, but conditioned on $B$ it is no longer locally relevant in that region of the feature space. We present a generalization of the knockoff procedure that performs conditional feature selection while controlling a generalization of the false discovery rate (FDR) to the conditional setting. By exploiting the feature/response model-free framework of the knockoffs, the quality of the statistical FDR guarantee is not degraded even when we perform conditional feature selections. We implement this method and present an algorithm that automatically partitions the feature space such that it enhances the differences between selected sets in different regions, and validate the statistical theoretical results with experiments.

Submitted 28 May, 2019; originally announced May 2019.

Comments: Accepted at ICML 2019

arXiv:1810.11378 [pdf, other]

Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization

Authors: Jaime Roquero Gimenez, James Zou

Abstract: The Model-X knockoff procedure has recently emerged as a powerful approach for feature selection with statistical guarantees. The advantage of knockoffs is that if we have a good model of the features X, then we can identify salient features without knowing anything about how the outcome Y depends on X. An important drawback of knockoffs is their instability: running the procedure twice can result in very different selected features, potentially leading to different conclusions. Addressing this instability is critical for obtaining reproducible and robust results. Here we present a generalization of the knockoff procedure that we call simultaneous multi-knockoffs. We show that multi-knockoff guarantees false discovery rate (FDR) control, and is substantially more stable and powerful compared to the standard (single) knockoff. Moreover, we propose a new algorithm based on entropy maximization for generating Gaussian multi-knockoffs. We validate the improved stability and power of multi-knockoffs in systematic experiments. We also illustrate how multi-knockoffs can improve the accuracy of detecting genetic mutations that are causally linked to phenotypes.

Submitted 28 May, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

Comments: Accepted at AISTATS 2019

arXiv:1807.06214 [pdf, other]

Knockoffs for the mass: new feature importance statistics with false discovery guarantees

Authors: Jaime Roquero Gimenez, Amirata Ghorbani, James Zou

Abstract: An important problem in machine learning and statistics is to identify features that causally affect the outcome. This is often impossible to do from purely observational data, and a natural relaxation is to identify features that are correlated with the outcome even conditioned on all other observed features. For example, we want to identify that smoking really is correlated with cancer conditioned on demographics. The knockoff procedure is a recent breakthrough in statistics that, in theory, can identify truly correlated features while guaranteeing that the false discovery is limited. The idea is to create synthetic data (knockoffs) that captures correlations amongst the features. However, there are substantial computational and practical challenges to generating and using knockoffs. This paper makes several key advances that enable knockoff application to be more efficient and powerful. We develop an efficient algorithm to generate valid knockoffs from Bayesian Networks. Then we systematically evaluate knockoff test statistics and develop new statistics with improved power. The paper combines new mathematical guarantees with systematic experiments on real and synthetic data.

Submitted 28 May, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

Comments: Accepted at AISTATS 2019

arXiv:1511.05807 [pdf]

Development of Wireless Techniques in Data and Power Transmission - Application for Particle Physics Detectors

Authors: R. Brenner, S. Ceuterickx, C. Dehos, P. De Lurgio, Z. Djurcic, G. Drake, J. L. Gonzalez Gimenez, L. Gustafsson, D. W. Kim, E. Locci, D. Roehrich, A. Schoening, A. Siligaris, H. K. Soltveit, K. Ullaland, P. Vincent, D. Wiednert, S. Yang

Abstract: Wireless techniques have developed extremely fast over the last decade and using them for data and power transmission in particle physics detectors is not science fiction any more. During the last years several research groups have independently thought of making it a reality. Wireless techniques became a mature field for research and new developments might have impact on future particle physics experiments. The Instrumentation Frontier was set up as a part of the SnowMass 2013 Community Summer Study [1] to examine the instrumentation R&D for the particle physics research over the coming decades: « To succeed we need to make technical and scientific innovation a priority in the field ». Wireless data transmission was identified as one of the innovations that could revolutionize the transmission of data out of the detector. Power delivery was another challenge mentioned in the same report. We propose a collaboration to identify the specific needs of different projects that might benefit from wireless techniques. The objective is to provide a common platform for research and development in order to optimize effectiveness and cost, with the aim of designing and testing wireless demonstrators for large instrumentation systems.

Submitted 18 November, 2015; originally announced November 2015.

arXiv:1307.2971 [pdf, other]

Accuracy of MAP segmentation with hidden Potts and Markov mesh prior models via Path Constrained Viterbi Training, Iterated Conditional Modes and Graph Cut based algorithms

Authors: Ana Georgina Flesia, Josef Baumgartner, Javier Gimenez, Jorge Martinez

Abstract: In this paper, we study statistical classification accuracy of two different Markov field environments for pixelwise image segmentation, considering the labels of the image as hidden states and solving the estimation of such labels as a solution of the MAP equation. The emission distribution is assumed the same in all models, and the difference lies in the Markovian prior hypothesis made over the labeling random field. The a priori labeling knowledge will be modeled with a) a second order anisotropic Markov Mesh and b) a classical isotropic Potts model. Under such models, we will consider three different segmentation procedures: 2D Path Constrained Viterbi training for the Hidden Markov Mesh, a Graph Cut based segmentation for the first order isotropic Potts model, and ICM (Iterated Conditional Modes) for the second order isotropic Potts model. We provide a unified view of all three methods, and investigate goodness of fit for classification, studying the influence of parameter estimation, computational gain, and extent of automation in the statistical measures Overall Accuracy, Relative Improvement and Kappa coefficient, allowing robust and accurate statistical analysis on synthetic and real-life experimental data coming from the field of Dental Diagnostic Radiography. All algorithms, using the learned parameters, generate good segmentations with little interaction when the images have a clear multimodal histogram. Suboptimal learning proves to be frail in the case of non-distinctive modes, which limits the complexity of usable models, and hence the achievable error rate as well. All Matlab code written is provided in a toolbox available for download from our website, following the Reproducible Research Paradigm.

Submitted 11 July, 2013; originally announced July 2013.

arXiv:1304.7713 [pdf, other]

Markovian models for one dimensional structure estimation on heavily noisy imagery

Authors: Ana Georgina Flesia, Javier Gimenez, Elena Rufeil Fiori

Abstract: Radar (SAR) images often exhibit profound appearance variations due to a variety of factors including clutter noise produced by the coherent nature of the illumination. Ultrasound images and infrared images have a similar cluttered appearance, which makes one-dimensional structures, such as edges and object boundaries, difficult to locate. Structure information is usually extracted in two steps: first, building an edge strength mask classifying pixels as edge points by hypothesis testing, and secondly estimating, from that mask, pixel-wide connected edges. With constant false alarm rate (CFAR) edge strength detectors for speckle clutter, the image needs to be scanned by a sliding window composed of several differently oriented splitting sub-windows. The accuracy of edge location for these ratio detectors depends strongly on the orientation of the sub-windows. In this work we propose to transform the edge strength detection problem into a binary segmentation problem in the undecimated wavelet domain, solvable using parallel 1D Hidden Markov Models. For general dependency models, exact estimation of the state map becomes computationally complex, but in our model, exact MAP is feasible. The effectiveness of our approach is demonstrated on simulated noisy real-life natural images with available ground truth, while the strength of our output edge map is measured with Pratt's, Baddeley and Kappa proficiency measures. Finally, analysis and experiments on three different types of SAR images, with different polarizations, resolutions and textures, illustrate that the proposed method can detect structure on SAR images effectively, providing a very good starting point for active contour methods.

Submitted 29 April, 2013; originally announced April 2013.

arXiv:1302.5186 [pdf, ps, other]

Unsupervised edge map scoring: a statistical complexity approach

Authors: Javier Gimenez, Jorge Martinez, Ana Georgina Flesia

Abstract: We propose a new Statistical Complexity Measure (SCM) to qualify edge maps without Ground Truth (GT) knowledge. The measure is the product of two indices, an Equilibrium index $\mathcal{E}$ obtained by projecting the edge map into a family of edge patterns, and an Entropy index $\mathcal{H}$, defined as a function of the Kolmogorov-Smirnov (KS) statistic. This new measure can be used for performance characterization which includes: (i) the specific evaluation of an algorithm (intra-technique process) in order to identify its best parameters, and (ii) the comparison of different algorithms (inter-technique process) in order to classify them according to their quality. Results made over images of the South Florida and Berkeley databases show that our approach significantly improves over Pratt's Figure of Merit (PFoM) which is the objective reference-based edge map evaluation standard, as it takes into account more features in its evaluation.

Submitted 10 February, 2014; v1 submitted 21 February, 2013; originally announced February 2013.

Showing 1–12 of 12 results for author: Giménez, J