-
Noise-Aware Training of Layout-Aware Language Models
Authors:
Ritesh Sarkhel,
Xiaoqi Ren,
Lauro Beltrao Costa,
Guolong Su,
Vincent Perot,
Yanan Xie,
Emmanouil Koukoumidis,
Arnab Nandi
Abstract:
A visually rich document (VRD) utilizes visual features along with linguistic cues to disseminate information. Training a custom extractor that identifies named entities from a document requires a large number of instances of the target document type annotated in both textual and visual modalities. This is an expensive bottleneck in enterprise scenarios, where we want to train custom extractors for thousands of different document types in a scalable way. Pre-training an extractor model on unlabeled instances of the target document type, followed by a fine-tuning step on human-labeled instances, does not work in these scenarios, as it exceeds the maximum allowable training time allocated for the extractor. We address this scenario by proposing a Noise-Aware Training method, or NAT, in this paper. Instead of acquiring expensive human-labeled documents, NAT utilizes weakly labeled documents to train an extractor in a scalable way. To avoid degradation in the model's quality due to noisy, weakly labeled samples, NAT estimates the confidence of each training sample and incorporates it as an uncertainty measure during training. We train multiple state-of-the-art extractor models using NAT. Experiments on a number of publicly available and in-house datasets show that NAT-trained models are not only robust in performance, outperforming a transfer-learning baseline by up to 6% in terms of macro-F1 score, but also more label-efficient, reducing the amount of human effort required to obtain comparable performance by up to 73%.
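The abstract does not say how the per-sample confidence enters training. As a minimal sketch of one common possibility, assuming the confidence is used as a per-sample weight on a cross-entropy loss (this weighting scheme and the function name `confidence_weighted_loss` are illustrative assumptions, not the paper's method):

```python
import math


def confidence_weighted_loss(probs, labels, confidences):
    """Confidence-weighted mean negative log-likelihood.

    probs:       list of per-class probability lists, one per sample
    labels:      weak (possibly noisy) label index for each sample
    confidences: estimated confidence in [0, 1] for each weak label

    Assumption: low-confidence samples are simply down-weighted.
    """
    total_weight = sum(confidences)
    if total_weight == 0:
        return 0.0
    loss = 0.0
    for p, y, c in zip(probs, labels, confidences):
        # Clamp the probability to avoid log(0).
        loss += c * -math.log(max(p[y], 1e-12))
    return loss / total_weight
```

With this weighting, a sample whose weak label has confidence 0 contributes nothing to the loss, so a confidently wrong weak label cannot dominate the update.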
Submitted 30 March, 2024;
originally announced April 2024.
-
vSDNEmul: A Software-Defined Network Emulator Based on Container Virtualization
Authors:
Fernando N. N. Farias,
Antônio de O. Junior,
Leonardo B. da Costa,
Billy A. Pinheiro,
Antônio J. G. Abelém
Abstract:
The main issue related to Software-Defined Network emulators is how to replicate real behavior in experiments. Mininet and other SDN emulators have an architecture that limits both the scope of experiments and the fidelity of networking tests. Consequently, the serialization, contention, and load of background processes may produce delays that compromise the operation of events such as transmitting a packet or completing a computation, possibly invalidating the performance evaluation of a network emulation. To address these problems, this paper presents vSDNEmul, a network emulator based on Docker container virtualization. Unlike Mininet, vSDNEmul isolates each node in a container and interconnects the nodes through virtual or tunnel links. By using containers, vSDNEmul allows autonomous and flexible creation of independent network elements, resulting in more realistic emulations. This paper reports performance evaluations comparing vSDNEmul and Mininet. The results obtained with the vSDNEmul emulator are more realistic and present higher accuracy.
Submitted 28 August, 2019;
originally announced August 2019.
-
Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems
Authors:
Abdullah Gharaibeh,
Tahsin Reza,
Elizeu Santos-Neto,
Lauro Beltrao Costa,
Scott Sallinen,
Matei Ripeanu
Abstract:
The increasing scale and wealth of inter-connected data, such as those accrued by social network applications, demand the design of new techniques and platforms to efficiently derive actionable knowledge from large-scale graphs. However, real-world graphs are famously difficult to process efficiently. Not only do they have a large memory footprint, but most graph algorithms also entail memory access patterns with poor locality, data-dependent parallelism and a low compute-to-memory access ratio. Moreover, most real-world graphs have a highly heterogeneous node degree distribution, hence partitioning these graphs for parallel processing and simultaneously achieving access locality and load-balancing is difficult.
This work starts from the hypothesis that hybrid platforms (e.g., GPU-accelerated systems) have both the potential to cope with the heterogeneous structure of real graphs and to offer a cost-effective platform for high-performance graph processing. This work assesses this hypothesis and presents an extensive exploration of the opportunity to harness hybrid systems to process large-scale graphs efficiently. In particular, (i) we present a performance model that estimates the achievable performance on hybrid platforms; (ii) informed by the performance model, we design and develop TOTEM, a processing engine that provides a convenient environment to implement graph algorithms on hybrid platforms; (iii) we show that further performance gains can be extracted using partitioning strategies that aim to produce partitions that each match the strengths of the processing element they are allocated to; and finally, (iv) we demonstrate the performance advantages of the hybrid system through a comprehensive evaluation that uses real and synthetic workloads (as large as 16 billion edges), multiple graph algorithms that stress the system in various ways, and a variety of hardware configurations.
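The abstract does not specify the partitioning strategies in point (iii). A minimal sketch of one degree-aware split, assuming that the highest-degree vertices go to one processing element until it holds a chosen share of the edges and the rest go to the other (the assignment direction, the function name `partition_by_degree`, and the `cpu_edge_share` parameter are all illustrative assumptions, not TOTEM's API):

```python
def partition_by_degree(adjacency, cpu_edge_share=0.5):
    """Split vertices into two partitions by descending degree.

    adjacency:      dict mapping each vertex to its list of neighbors
    cpu_edge_share: fraction of all edges the first partition may hold

    Assumption: the few high-degree vertices are placed on the CPU
    and the many low-degree vertices on the GPU.
    """
    total_edges = sum(len(nbrs) for nbrs in adjacency.values())
    budget = cpu_edge_share * total_edges
    cpu, gpu = [], []
    used = 0
    # Visit vertices from highest to lowest degree.
    for v in sorted(adjacency, key=lambda u: len(adjacency[u]), reverse=True):
        if used < budget:
            cpu.append(v)
            used += len(adjacency[v])
        else:
            gpu.append(v)
    return cpu, gpu
```

On a star graph, for example, the hub alone can account for half of the edge endpoints, so it ends up in one partition while all the leaves go to the other.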
Submitted 5 December, 2014; v1 submitted 10 December, 2013;
originally announced December 2013.
-
Predicting Intermediate Storage Performance for Workflow Applications
Authors:
Lauro Beltrão Costa,
Abmar Barros,
Samer Al-Kiswany,
Hao Yang,
Emalayan Vairavanathan,
Matei Ripeanu
Abstract:
Configuring a storage system to better serve an application is a challenging task complicated by a multidimensional, discrete configuration space and the high cost of space exploration (e.g., by running the application with different storage configurations). To enable selecting the best configuration in a reasonable time, we design an end-to-end performance prediction mechanism that estimates the turn-around time of an application using a storage system under a given configuration. This approach focuses on a generic object-based storage system design, supports exploring the impact of optimizations targeting workflow applications (e.g., various data placement schemes) in addition to other, more traditional, configuration knobs (e.g., stripe size or replication level), and models the system operation at data-chunk and control message level.
This paper presents our experience to date with designing and using this prediction mechanism. We evaluate this mechanism using micro- as well as synthetic benchmarks mimicking real workflow applications, and a real application. A preliminary evaluation shows that we are on track to meet our objectives: it can scale to model a workflow application run on an entire cluster while offering an over 200x speedup factor (normalized by resource) compared to running the actual application, and can achieve, in the limited number of scenarios we study, a prediction accuracy that enables identifying the best storage system configuration.
Submitted 10 June, 2013; v1 submitted 19 February, 2013;
originally announced February 2013.
-
The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System
Authors:
Samer Al-Kiswany,
Emalayan Vairavanathan,
Lauro B. Costa,
Hao Yang,
Matei Ripeanu
Abstract:
This paper proposes using file system custom metadata as a bidirectional communication channel between applications and the storage system. This channel can be used to pass hints that enable cross-layer optimizations, an option hindered today by the ossified file-system interface. We study this approach in the context of storage system support for large-scale workflow execution systems: our workflow-optimized storage system (WOSS) exploits application hints to provide per-file optimized operations, and exposes data location to enable location-aware scheduling.
This paper argues that an incremental path for adopting cross-layer optimizations in storage systems exists, presents the system architecture for a workflow-optimized storage system and its integration with a workflow runtime engine, and evaluates the proposed approach using synthetic as well as real application workloads.
Submitted 25 January, 2013;
originally announced January 2013.