
REDUCR: Robust Data Downsampling Using Class Priority Reweighting

Authors: William Bankes, George Hughes, Ilija Bogunovic, Zi Wang

Abstract: Modern machine learning models are becoming increasingly expensive to train for real-world image and text classification tasks, where massive web-scale data is collected in a streaming fashion. To reduce the training cost, online batch selection techniques have been developed to choose the most informative datapoints. However, these techniques can suffer from poor worst-class generalization performance due to class imbalance and distributional shifts. This work introduces REDUCR, a robust and efficient data downsampling method that uses class priority reweighting. REDUCR reduces the training data while preserving worst-class generalization performance. REDUCR assigns priority weights to datapoints in a class-aware manner using an online learning algorithm. We demonstrate the data efficiency and robust performance of REDUCR on vision and text classification tasks. On web-scraped datasets with imbalanced class distributions, REDUCR significantly improves worst-class test accuracy (and average accuracy), surpassing state-of-the-art methods by around 15%.

Submitted 1 December, 2023; originally announced December 2023.

Comments: Preprint

arXiv:2310.00091 [pdf, other]

Towards Automated Accessibility Report Generation for Mobile Apps

Authors: Amanda Swearngin, Jason Wu, Xiaoyi Zhang, Esteban Gomez, Jen Coughenour, Rachel Stukenborg, Bhavya Garg, Greg Hughes, Adriana Hilliard, Jeffrey P. Bigham, Jeffrey Nichols

Abstract: Many apps have basic accessibility issues, like missing labels or low contrast. Automated tools can help app developers catch basic issues, but can be laborious or require writing dedicated tests. We propose a system, motivated by a collaborative process with accessibility stakeholders at a large technology company, to generate whole app accessibility reports by combining varied data collection methods (e.g., app crawling, manual recording) with an existing accessibility scanner. Many such scanners are based on single-screen scanning, and a key problem in whole app accessibility reporting is to effectively de-duplicate and summarize issues collected across an app. To this end, we developed a screen grouping model with 96.9% accuracy (88.8% F1-score) and UI element matching heuristics with 97% accuracy (98.2% F1-score). We combine these technologies in a system to report and summarize unique issues across an app, and enable a unique pixel-based ignore feature to help engineers and testers better manage reported issues across their app's lifetime. We conducted a qualitative evaluation with 18 accessibility-focused engineers and testers which showed this system can enhance their existing accessibility testing toolkit and address key limitations in current accessibility scanning tools.

Submitted 16 October, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

Comments: 24 pages, 8 figures

arXiv:1905.03702 [pdf, other]

OpenEDS: Open Eye Dataset

Authors: Stephan J. Garbin, Yiru Shen, Immo Schuetz, Robert Cavin, Gregory Hughes, Sachin S. Talathi

Abstract: We present a large scale data set, OpenEDS: Open Eye Dataset, of eye-images captured using a virtual-reality (VR) head mounted display mounted with two synchronized eye-facing cameras at a frame rate of 200 Hz under controlled illumination. This dataset is compiled from video capture of the eye-region collected from 152 individual participants and is divided into four subsets: (i) 12,759 images with pixel-level annotations for key eye-regions: iris, pupil and sclera, (ii) 252,690 unlabelled eye-images, (iii) 91,200 frames from randomly selected video sequences of 1.5 seconds in duration and (iv) 143 pairs of left and right point cloud data compiled from corneal topography of eye regions collected from a subset, 143 out of 152, of the participants in the study. A baseline experiment has been evaluated on OpenEDS for the task of semantic segmentation of pupil, iris, sclera and background, with a mean intersection-over-union (mIoU) of 98.3%. We anticipate that OpenEDS will create opportunities for researchers in the eye tracking community and the broader machine learning and computer vision community to advance the state of eye-tracking for VR applications. The dataset is available for download upon request at https://research.fb.com/programs/openeds-challenge

Submitted 17 May, 2019; v1 submitted 30 April, 2019; originally announced May 2019.

Comments: 11 pages; 12 figures

arXiv:1603.08978 [pdf, ps, other]

doi 10.1145/2940157.2940162

RemIX: A Distributed Internet Exchange for Remote and Rural Networks

Authors: William Waites, James Sweet, Roger Baig, Peter Buneman, Marwan Fayed, Gordon Hughes, Michael Fourman, Richard Simmons

Abstract: The concept of the IXP, an Ethernet fabric central to the structure of the global Internet, is largely absent from the development of community-driven collaborative network infrastructure. The reasons for this are two-fold. IXPs exist in central, typically urban, environments where strong network infrastructure ensures high levels of connectivity. Between rural and remote regions, where networks are separated by distance and terrain, no such infrastructure exists. In this paper we present RemIX, a distributed IXP architecture designed for the community network environment. We examine this praxis using an implementation in Scotland, with suggestions for future development and research.

Submitted 29 March, 2016; originally announced March 2016.

arXiv:1303.5416 [pdf]

Representing Heuristic Knowledge in D-S Theory

Authors: Weiru Liu, John G. Hughes, Michael F. McTear

Abstract: The Dempster-Shafer theory of evidence has been used intensively to deal with uncertainty in knowledge-based systems. However the representation of uncertain relationships between evidence and hypothesis groups (heuristic knowledge) is still a major research problem. This paper presents an approach to representing such heuristic knowledge by evidential mappings which are defined on the basis of mass functions. The relationships between evidential mappings and multi-valued mappings, as well as between evidential mappings and Bayesian multi-valued causal link models in Bayesian theory are discussed. Following this the detailed procedures for constructing evidential mappings for any set of heuristic rules are introduced. Several situations of belief propagation are discussed.

Submitted 13 March, 2013; originally announced March 2013.

Comments: Appears in Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence (UAI1992)

Report number: UAI-P-1992-PG-182-190

arXiv:1303.5391 [pdf]

RES - a Relative Method for Evidential Reasoning

Authors: Zhi An, David A. Bell, John G. Hughes

Abstract: In this paper we describe a novel method for evidential reasoning [1]. It involves modelling the process of evidential reasoning in three steps, namely, evidence structure construction, evidence accumulation, and decision making. The proposed method, called RES, is novel in that evidence strength is associated with an evidential support relationship (an argument) between a pair of statements and such strength is carried by comparison between arguments. This is in contrast to the conventional approaches, where evidence strength is represented numerically and is associated with a statement.

Submitted 13 March, 2013; originally announced March 2013.

Comments: Appears in Proceedings of the Eighth Conference on Uncertainty in Artificial Intelligence (UAI1992)

Report number: UAI-P-1992-PG-1-8

arXiv:1302.1523 [pdf]

Corporate Evidential Decision Making in Performance Prediction Domains

Authors: Alex G. Buchner, Werner Dubitzky, Alfons Schuster, Philippe Lopes, Peter G. O'Donoghue, John G. Hughes, David A. Bell, Kenny Adamson, John A. White, John M. C. C. Anderson, Maurice D. Mulvenna

Abstract: Performance prediction or forecasting sporting outcomes involves a great deal of insight into the particular area one is dealing with, and a considerable amount of intuition about the factors that bear on such outcomes and performances. The mathematical Theory of Evidence offers representation formalisms which grant experts a high degree of freedom when expressing their subjective beliefs in the context of decision-making situations like performance prediction. Furthermore, this reasoning framework incorporates a powerful mechanism to systematically pool the decisions made by individual subject matter experts. The idea behind such a combination of knowledge is to improve the competence (quality) of the overall decision-making process. This paper reports on a performance prediction experiment carried out during the European Football Championship in 1996. Relying on the knowledge of four predictors, Evidence Theory was used to forecast the final scores of all 31 matches. The results of this empirical study are very encouraging.

Submitted 6 February, 2013; originally announced February 2013.

Comments: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI1997)

Report number: UAI-P-1997-PG-38-45
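
The pooling mechanism this abstract refers to is, in the Theory of Evidence, commonly Dempster's rule of combination; the listing does not say exactly how the paper applies it, so the following is only a minimal sketch of the standard rule. The function name `combine`, the outcome labels, and the two experts' mass values are hypothetical and chosen purely for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of outcomes (a focal element)
    to a mass in [0, 1]; the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (s1, w1), (s2, w2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            # Disjoint focal elements contribute to the conflict mass.
            conflict += w1 * w2
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise so the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical beliefs of two predictors about one match outcome.
expert_a = {frozenset({"home"}): 0.6, frozenset({"home", "draw"}): 0.4}
expert_b = {frozenset({"draw"}): 0.5, frozenset({"home", "draw"}): 0.5}
pooled = combine(expert_a, expert_b)
```

Because the rule is associative and commutative, more than two predictors can be pooled by applying `combine` repeatedly, in any order.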

arXiv:1210.5975 [pdf, other]

Solid State Disk Object-Based Storage with Trim Commands

Authors: Tasha Frankie, Gordon Hughes, Ken Kreutz-Delgado

Abstract: This paper presents a model of NAND flash SSD utilization and write amplification when the ATA/ATAPI SSD Trim command is incorporated into object-based storage under a variety of user workloads, including a uniform random workload with objects of fixed size and a uniform random workload with objects of varying sizes. We first summarize the existing models for write amplification in SSDs for workloads with and without the Trim command, then propose an alteration of the models that utilizes a framework of object-based storage. The utilization of objects and pages in the SSD is derived, with the analytic results compared to simulation. Finally, the effect of objects on write amplification and its computation is discussed along with a potential application to optimization of SSD usage through object storage metadata servers that allocate object classes of distinct object size.

Submitted 10 October, 2012; originally announced October 2012.

arXiv:1208.1794 [pdf, ps, other]

Analysis of Trim Commands on Overprovisioning and Write Amplification in Solid State Drives

Authors: Tasha Frankie, Gordon Hughes, Ken Kreutz-Delgado

Abstract: This paper presents a performance model of the ATA/ATAPI SSD Trim command under various types of user workloads, including a uniform random workload, a workload with hot and cold data, and a workload with N temperatures of data. We first examine the Trim-modified uniform random workload to predict utilization, then use this result to compute the resultant level of effective overprovisioning. This allows modification of models previously suggested to predict write amplification of a non-Trim uniform random workload under greedy garbage collection. Finally, we expand the theory to cover a workload consisting of hot and cold data (and also N temperatures of data), providing formulas to predict write amplification in these scenarios.

Submitted 8 August, 2012; originally announced August 2012.

arXiv:1106.3787 [pdf, ps, other]

Calculating ellipse overlap areas

Authors: Gary B. Hughes, Mohcine Chraibi

Abstract: We present a general algorithm for finding the overlap area between two ellipses. The algorithm is based on finding a segment area (the area between an ellipse and a secant line) given two points on the ellipse. The Gauss-Green formula is used to determine the ellipse sector area between two points, and a triangular area is added or subtracted to give the segment area. For two ellipses, overlap area is calculated by adding the areas of appropriate sectors and polygons. Intersection points for two general ellipses are found using Ferrari's quartic formula to solve the polynomial that results from combining the two ellipse equations. All cases for the number of intersection points (0, 1, 2, 3, 4) are handled. The algorithm is implemented in C, and has been tested with a range of input ellipses. The code is efficient enough for use in simulations that require many overlap area calculations.

Submitted 19 June, 2011; originally announced June 2011.

Comments: 85 pages, 10 figures, code in c
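
The segment-area step described in this abstract can be illustrated for the simplest case. The sketch below is not the authors' C implementation: it assumes an axis-aligned ellipse centred at the origin, with the two boundary points given by their parametric angles `t1` and `t2` (so a point is `(a*cos(t), b*sin(t))`), and the function name `ellipse_segment_area` is hypothetical. Under those assumptions the Gauss-Green line integral gives a sector area of `a*b*(t2 - t1)/2`, and the signed triangle area is `a*b*sin(t2 - t1)/2`.

```python
import math


def ellipse_segment_area(a, b, t1, t2):
    """Area between the ellipse arc from t1 to t2 (counter-clockwise)
    and the secant line joining the two end points.

    a, b   : semi-axis lengths of an origin-centred, axis-aligned ellipse
    t1, t2 : parametric angles of the two points, in radians
    """
    # Counter-clockwise sweep from t1 to t2, reduced to [0, 2*pi).
    dt = (t2 - t1) % (2.0 * math.pi)
    # Sector swept from the centre: (1/2) * integral of (x dy - y dx).
    sector = 0.5 * a * b * dt
    # Signed area of the triangle (centre, point 1, point 2). It is
    # positive for dt < pi (subtracted) and negative for dt > pi (added).
    triangle = 0.5 * a * b * math.sin(dt)
    return sector - triangle


# Half of an ellipse with semi-axes 2 and 1: expected area is pi*a*b/2.
half = ellipse_segment_area(2.0, 1.0, 0.0, math.pi)
```

A rotated or translated ellipse could be handled by transforming the two points into the ellipse's own frame first, since rotation and translation preserve area.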

Showing 1–10 of 10 results for author: Hughes, G