AI Cards: Towards an Applied Framework for Machine-Readable AI and Risk Documentation Inspired by the EU AI Act

Authors: Delaram Golpayegani, Isabelle Hupont, Cecilia Panigutti, Harshvardhan J. Pandit, Sven Schade, Declan O'Sullivan, Dave Lewis

Abstract: With the upcoming enforcement of the EU AI Act, documentation of high-risk AI systems and their risk management information will become a legal requirement playing a pivotal role in demonstration of compliance. Despite its importance, there is a lack of standards and guidelines to assist with drawing up AI and risk documentation aligned with the AI Act. This paper aims to address this gap by providing an in-depth analysis of the AI Act's provisions regarding technical documentation, wherein we particularly focus on AI risk management. On the basis of this analysis, we propose AI Cards as a novel holistic framework for representing a given intended use of an AI system by encompassing information regarding technical specifications, context of use, and risk management, both in human- and machine-readable formats. While the human-readable representation of AI Cards provides AI stakeholders with a transparent and comprehensible overview of the AI use case, its machine-readable specification leverages state-of-the-art Semantic Web technologies to embody the interoperability needed for exchanging documentation within the AI value chain. This brings the flexibility required for reflecting changes applied to the AI system and its context, provides the scalability needed to accommodate potential amendments to legal requirements, and enables development of automated tools to assist with legal compliance and conformity assessment tasks. To solidify the benefits, we provide an exemplar AI Card for an AI-based student proctoring system and further discuss its potential applications within and beyond the context of the AI Act.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2202.11827 [pdf, other]

doi 10.1145/3477495.3531663

TARexp: A Python Framework for Technology-Assisted Review Experiments

Authors: Eugene Yang, David D. Lewis

Abstract: Technology-assisted review (TAR) is an important industrial application of information retrieval (IR) and machine learning (ML). While a small TAR research community exists, the complexity of TAR software and workflows is a major barrier to entry. Drawing on past open source TAR efforts, as well as design patterns from the IR and ML open source software, we present an open source Python framework for conducting experiments on TAR algorithms. Key characteristics of this framework are declarative representations of workflows and experiment plans, the ability for components to play variable numbers of workflow roles, and state maintenance and restart capabilities. Users can draw on reference implementations of standard TAR algorithms while incorporating novel components to explore their research interests. The framework is available at https://github.com/eugene-yang/tarexp.

Submitted 24 April, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: 6 pages, 4 figures, accepted as a SIGIR 2022 demo paper

arXiv:2202.11149 [pdf, other]

Incorporating social norms into a configurable agent-based model of the decision to perform commuting behaviour

Authors: Robert Greener, Daniel Lewis, Jon Reades, Simon Miles, Steven Cummins

Abstract: Interventions to increase active commuting have been recommended as a method to increase population physical activity, but evidence is mixed. Social norms related to travel behaviour may influence the uptake of active commuting interventions but are rarely considered in their design and evaluation. In this study we develop an agent-based model that incorporates social norms related to travel behaviour and demonstrate the utility of this through implementing car-free Wednesdays. A synthetic population of Waltham Forest, London, UK was generated using a microsimulation approach with data from the UK Census 2011 and UK HLS datasets. An agent-based model was created using this synthetic population which modelled how the actions of peers and neighbours, subculture, habit, weather, bicycle ownership, car ownership, environmental supportiveness, and congestion affect the decision to travel. The developed model (MOTIVATE) is a configurable agent-based model where social norms related to travel behaviour are used to provide a more realistic representation of the socio-ecological systems in which active commuting interventions may be deployed. The utility of this model is demonstrated using car-free days as a hypothetical intervention. In the control scenario, the odds of active travel were plausible at 0.091 (89% HPDI: [0.091, 0.091]). Compared to the control scenario, the odds of active travel were increased by 70.3% (89% HPDI: [70.3%, 70.3%]), in the intervention scenario, on non-car-free days; the effect is sustained to non-car-free days. The model is a useful tool for investigating the effect of how social networks and social norms influence the effectiveness of various interventions. If configured using real-world built environment data, it may be useful for investigating how social norms interact with the built environment to cause the emergence of commuting conventions.

Submitted 10 August, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: 18 pages, 2 figures, 4 tables. Published version in ATT'22 Workshop Agents in Traffic and Transportation, July 25, 2022, Vienna, Austria, http://ceur-ws.org/Vol-3173/12.pdf

ACM Class: J.3; J.4
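
The odds reported in the abstract above can be turned into probabilities with a short calculation. The sketch below assumes that "increased by 70.3%" means the control odds are multiplied by 1.703, which is the usual reading but is not stated explicitly in the abstract; the function names are illustrative and do not come from the MOTIVATE model.

```python
def odds_to_probability(odds):
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)


def scale_odds(odds, percent_increase):
    """Apply a relative increase (in percent) to an odds value.

    Assumption: a "70.3% increase in odds" multiplies the odds by 1.703.
    """
    return odds * (1.0 + percent_increase / 100.0)


control_odds = 0.091  # control scenario, from the abstract
intervention_odds = scale_odds(control_odds, 70.3)

print(round(odds_to_probability(control_odds), 3))       # 0.083
print(round(odds_to_probability(intervention_odds), 3))  # 0.134
```

Under that assumption, the share of active-travel decisions rises from roughly 8.3% to roughly 13.4%.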

arXiv:2108.12752 [pdf, other]

TAR on Social Media: A Framework for Online Content Moderation

Authors: Eugene Yang, David D. Lewis, Ophir Frieder

Abstract: Content moderation (removing or limiting the distribution of posts based on their contents) is one tool social networks use to fight problems such as harassment and disinformation. Manually screening all content is usually impractical given the scale of social media data, and the need for nuanced human interpretations makes fully automated approaches infeasible. We consider content moderation from the perspective of technology-assisted review (TAR): a human-in-the-loop active learning approach developed for high recall retrieval problems in civil litigation and other fields. We show how TAR workflows, and a TAR cost model, can be adapted to the content moderation problem. We then demonstrate on two publicly available content moderation data sets that a TAR workflow can reduce moderation costs by 20% to 55% across a variety of conditions.

Submitted 29 August, 2021; originally announced August 2021.

Comments: 9 pages, 2 figures, accepted at DESIRES 2021

arXiv:2108.12746 [pdf, other]

doi 10.1145/3459637.3482415

Certifying One-Phase Technology-Assisted Reviews

Authors: David D. Lewis, Eugene Yang, Ophir Frieder

Abstract: Technology-assisted review (TAR) workflows based on iterative active learning are widely used in document review applications. Most stopping rules for one-phase TAR workflows lack valid statistical guarantees, which has discouraged their use in some legal contexts. Drawing on the theory of quantile estimation, we provide the first broadly applicable and statistically valid sample-based stopping rules for one-phase TAR. We further show theoretically and empirically that overshooting a recall target, which has been treated as innocuous or desirable in past evaluations of stopping rules, is a major source of excess cost in one-phase TAR workflows. Counterintuitively, incurring a larger sampling cost to reduce excess recall leads to lower total cost in almost all scenarios.

Submitted 29 August, 2021; originally announced August 2021.

Comments: 10 pages, 4 figures, accepted at CIKM 2021

arXiv:2108.09959 [pdf]

Artificial Intelligence Ethics: An Inclusive Global Discourse?

Authors: Cathy Roche, Dave Lewis, P. J. Wall

Abstract: It is widely accepted that technology is ubiquitous across the planet and has the potential to solve many of the problems existing in the Global South. Moreover, the rapid advancement of artificial intelligence (AI) brings with it the potential to address many of the challenges outlined in the Sustainable Development Goals (SDGs) in ways which were never before possible. However, there are many questions about how such advanced technologies should be managed and governed, and whether or not the emerging ethical frameworks and standards for AI are dominated by the Global North. This research examines the growing body of documentation on AI ethics to examine whether or not there is equality of participation in the ongoing global discourse. Specifically, it seeks to discover if both countries in the Global South and women are underrepresented in this discourse. Findings indicate a dearth of references to both of these themes in the AI ethics documents, suggesting that the associated ethical implications and risks are being neglected. Without adequate input from both countries in the Global South and from women, such ethical frameworks and standards may be discriminatory with the potential to reinforce marginalisation.

Submitted 23 August, 2021; originally announced August 2021.

Comments: In proceedings of the 1st Virtual Conference on Implications of Information and Digital Technologies for Development, 2021

arXiv:2106.09871 [pdf, other]

doi 10.1145/3469096.3469873

Heuristic Stopping Rules For Technology-Assisted Review

Authors: Eugene Yang, David D. Lewis, Ophir Frieder

Abstract: Technology-assisted review (TAR) refers to human-in-the-loop active learning workflows for finding relevant documents in large collections. These workflows often must meet a target for the proportion of relevant documents found (i.e. recall) while also holding down costs. A variety of heuristic stopping rules have been suggested for striking this tradeoff in particular settings, but none have been tested against a range of recall targets and tasks. We propose two new heuristic stopping rules, Quant and QuantCI, based on model-based estimation techniques from survey research. We compare them against a range of proposed heuristics and find they are accurate at hitting a range of recall targets while substantially reducing review costs.

Submitted 17 June, 2021; originally announced June 2021.

Comments: 10 pages, 2 figures. Accepted at DocEng 21

arXiv:2106.09866 [pdf, other]

doi 10.1145/3469096.3469872

On Minimizing Cost in Legal Document Review Workflows

Authors: Eugene Yang, David D. Lewis, Ophir Frieder

Abstract: Technology-assisted review (TAR) refers to human-in-the-loop machine learning workflows for document review in legal discovery and other high recall review tasks. Attorneys and legal technologists have debated whether review should be a single iterative process (one-phase TAR workflows) or whether model training and review should be separate (two-phase TAR workflows), with implications for the choice of active learning algorithm. The relative cost of manual labeling for different purposes (training vs. review) and of different documents (positive vs. negative examples) is a key and neglected factor in this debate. Using a novel cost dynamics analysis, we show analytically and empirically that these relative costs strongly impact whether a one-phase or two-phase workflow minimizes cost. We also show how category prevalence, classification task difficulty, and collection size impact the optimal choice not only of workflow type, but of active learning method and stopping point.

Submitted 17 June, 2021; originally announced June 2021.

Comments: 10 pages, 3 figures. Accepted at DocEng 21

arXiv:2105.01044 [pdf, other]

Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review

Authors: Eugene Yang, Sean MacAvaney, David D. Lewis, Ophir Frieder

Abstract: Technology-assisted review (TAR) refers to iterative active learning workflows for document review in high recall retrieval (HRR) tasks. TAR research and most commercial TAR software have applied linear models such as logistic regression to lexical features. Transformer-based models with supervised tuning are known to improve effectiveness on many text classification tasks, suggesting their use in TAR. We indeed find that the pre-trained BERT model reduces review cost by 10% to 15% in TAR workflows simulated on the RCV1-v2 newswire collection. In contrast, we find that linear models outperform BERT for simulated legal discovery topics on the Jeb Bush e-mail collection. This suggests the match between transformer pre-training corpora and the task domain is of greater significance than generally appreciated. Additionally, we show that just-right language model fine-tuning on the task collection before starting active learning is critical. Too little or too much fine-tuning hinders performance, worse than that of linear models, even for a favorable corpus such as RCV1-v2.

Submitted 19 January, 2022; v1 submitted 3 May, 2021; originally announced May 2021.

Comments: 6 pages, 1 figure, accepted at ECIR 2022

arXiv:2007.13417 [pdf, other]

doi 10.1063/5.0013720

Image-driven discriminative and generative machine learning algorithms for establishing microstructure-processing relationships

Authors: Wufei Ma, Elizabeth Kautz, Arun Baskaran, Aritra Chowdhury, Vineet Joshi, Bülent Yener, Daniel Lewis

Abstract: We investigate methods of microstructure representation for the purpose of predicting processing condition from microstructure image data. A binary alloy (uranium-molybdenum) that is currently under development as a nuclear fuel was studied for the purpose of developing an improved machine learning approach to image recognition, characterization, and building predictive capabilities linking microstructure to processing conditions. Here, we test different microstructure representations and evaluate model performance based on the F1 score. An F1 score of 95.1% was achieved for distinguishing between micrographs corresponding to ten different thermo-mechanical material processing conditions. We find that our newly developed microstructure representation describes image data well, and the traditional approach of utilizing area fractions of different phases is insufficient for distinguishing between multiple classes using a relatively small, imbalanced original data set of 272 images. To explore the applicability of generative methods for supplementing such limited data sets, generative adversarial networks were trained to generate artificial microstructure images. Two different generative networks were trained and tested to assess performance. Challenges and best practices associated with applying machine learning to limited microstructure image data sets are also discussed. Our work has implications for quantitative microstructure analysis, and development of microstructure-processing relationships in limited data sets typical of metallurgical process design studies.

Submitted 27 July, 2020; originally announced July 2020.

Comments: 14 pages, 15 figures

arXiv:2005.00986 [pdf]

Using Artificial Intelligence to Analyze Fashion Trends

Authors: Mengyun Shi, Van Dyk Lewis

Abstract: Analyzing fashion trends is essential in the fashion industry. Current fashion forecasting firms, such as WGSN, utilize the visual information from around the world to analyze and predict fashion trends. However, analyzing fashion trends is time-consuming and extremely labor intensive, requiring individual employees' manual editing and classification. To improve the efficiency of data analysis of such image-based information and lower the cost of analyzing fashion images, this study proposes a data-driven quantitative abstracting approach using an artificial intelligence (A.I.) algorithm. Specifically, an A.I. model was trained on fashion images from a large-scale dataset under different scenarios, for example in online stores and street snapshots. This model was used to detect garments and classify clothing attributes such as textures, garment style, and details for runway photos and videos. It was found that the A.I. model can generate rich attribute descriptions of detected regions and accurately bind the garments in the images. Adoption of the A.I. algorithm demonstrated promising results and the potential to classify garment types and details automatically, which can make the process of trend forecasting more cost-effective and faster.

Submitted 3 May, 2020; originally announced May 2020.

arXiv:2001.06367 [pdf, other]

On Covering Numbers, Young Diagrams, and the Local Dimension of Posets

Authors: Gábor Damásdi, Stefan Felsner, António Girão, Balázs Keszegh, David Lewis, Dániel T. Nagy, Torsten Ueckerdt

Abstract: We study covering numbers and local covering numbers with respect to difference graphs and complete bipartite graphs. In particular we show that in every cover of a Young diagram with $\binom{2k}{k}$ steps with generalized rectangles there is a row or a column in the diagram that is used by at least $k+1$ rectangles, and prove that this is best-possible. This answers two questions by Kim, Martin, Masařík, Shull, Smith, Uzzell, and Wang (Europ. J. Comb. 2020), namely: - What is the local complete bipartite cover number of a difference graph? - Is there a sequence of graphs with constant local difference graph cover number and unbounded local complete bipartite cover number? We add to the study of these local covering numbers with a lower bound construction and some examples. Following Kim et al., we use the results on local covering numbers to provide lower and upper bounds for the local dimension of partially ordered sets of height 2. We discuss the local dimension of some posets related to Boolean lattices and show that the poset induced by the first two layers of the Boolean lattice has local dimension $(1 + o(1))\log_2\log_2 n$. We conclude with some remarks on covering numbers for digraphs and Ferrers dimension.

Submitted 17 January, 2020; originally announced January 2020.

Comments: Parts of this paper have previously been reported in arXiv submission arXiv:1902.08223

arXiv:1909.05660 [pdf, other]

Predicting intelligence based on cortical WM/GM contrast, cortical thickness and volumetry

Authors: Juan Miguel Valverde, Vandad Imani, John D. Lewis, Jussi Tohka

Abstract: We propose a four-layer fully-connected neural network (FNN) for predicting fluid intelligence scores from T1-weighted MR images for the ABCD-challenge. In addition to the volumes of brain structures, the FNN uses cortical WM/GM contrast and cortical thickness at 78 cortical regions. These last two measurements were derived from the T1-weighted MR images using cortical surfaces produced by the CIVET pipeline. The age and gender of the subjects and the scanner manufacturer are also used as features for the learning algorithm. This yielded 283 features provided to the FNN with two hidden layers of 20 and 15 nodes. The method was applied to the data from the ABCD study. Trained with a training set of 3736 subjects, the proposed method achieved an MSE of 71.596 and a correlation of 0.151 in the validation set of 415 subjects. For the final submission, the model was trained with 3568 subjects and it achieved an MSE of 94.0270 in the test set comprised of 4383 subjects.

Submitted 9 September, 2019; originally announced September 2019.

Comments: Submission to the ABCD Neurocognitive Prediction Challenge at MICCAI 2019
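
The layer sizes given in the abstract above (283 input features, hidden layers of 20 and 15 nodes, one predicted score) can be sketched as a plain forward pass. This is only an illustration of that shape: the ReLU activation, the Gaussian weight initialisation, and every function name are assumptions, since the abstract does not specify them, and no training step is shown.

```python
import random

# Input features, two hidden layers, scalar output (sizes from the abstract).
LAYER_SIZES = [283, 20, 15, 1]


def init_params(sizes, seed=0):
    """Create (weights, biases) per layer; the initialisation is an assumption."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, 0.05) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def forward(params, features):
    """Return the scalar prediction for one subject's feature vector."""
    activations = features
    last = len(params) - 1
    for index, (weights, biases) in enumerate(params):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        # ReLU on hidden layers (assumed); the output layer stays linear.
        activations = pre if index == last else [max(0.0, v) for v in pre]
    return activations[0]


def mse(predictions, targets):
    """Mean squared error, the metric reported in the abstract."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


params = init_params(LAYER_SIZES)
prediction = forward(params, [0.0] * 283)
```

A linear output layer is used here because the target is a continuous score evaluated by MSE.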

arXiv:1906.05496 [pdf, other]

An image-driven machine learning approach to kinetic modeling of a discontinuous precipitation reaction

Authors: Elizabeth Kautz, Wufei Ma, Saumyadeep Jana, Arun Devaraj, Vineet Joshi, Bülent Yener, Daniel Lewis

Abstract: Micrograph quantification is an essential component of several materials science studies. Machine learning methods, in particular convolutional neural networks, have previously demonstrated performance in image recognition tasks across several disciplines (e.g. materials science, medical imaging, facial recognition). Here, we apply these well-established methods to develop an approach to microstructure quantification for kinetic modeling of a discontinuous precipitation reaction in a case study on the uranium-molybdenum system. Prediction of material processing history based on image data (classification), calculation of area fraction of phases present in the micrographs (segmentation), and kinetic modeling from segmentation results were performed. Results indicate that convolutional neural networks represent microstructure image data well, and segmentation using the k-means clustering algorithm yields results that agree well with manually annotated images. Classification accuracies of original and segmented images are both 94% for a 5-class classification problem. Kinetic modeling results agree well with previously reported data using manual thresholding. The image quantification and kinetic modeling approach developed and presented here aims to reduce researcher bias introduced into the characterization process, and allows for leveraging information in limited image data sets.

Submitted 13 June, 2019; originally announced June 2019.

Comments: 30 pages, 8 figures

arXiv:1609.04667 [pdf]

War-Algorithm Accountability

Authors: Dustin A. Lewis, Gabriella Blum, Naz K. Modirzadeh

Abstract: In this briefing report, we introduce a new concept (war algorithms) that elevates algorithmically-derived choices and decisions to a, and perhaps the, central concern regarding technical autonomy in war. We thereby aim to shed light on and recast the discussion regarding autonomous weapon systems. We define war algorithm as any algorithm that is expressed in computer code, that is effectuated through a constructed system, and that is capable of operating in relation to armed conflict. In introducing this concept, our foundational technological concern is the capability of a constructed system, without further human intervention, to help make and effectuate a decision or choice of a war algorithm. Distilled, the two core ingredients are an algorithm expressed in computer code and a suitably capable constructed system. Through that lens, we link international law and related accountability architectures to relevant technologies. We sketch a three-part (non-exhaustive) approach that highlights traditional and unconventional accountability avenues. We focus largely on international law because it is the only normative regime that purports, in key respects but with important caveats, to be both universal and uniform. By not limiting our inquiry only to weapon systems, we take an expansive view, showing how the broad concept of war algorithms might be susceptible to regulation, and how those algorithms might already fit within the existing regulatory system established by international law.

Submitted 12 September, 2016; originally announced September 2016.

arXiv:1605.04475 [pdf, other]

doi 10.1007/s10579-014-9273-4

Capturing divergence in dependency trees to improve syntactic projection

Authors: Ryan Georgi, Fei Xia, William D. Lewis

Abstract: Obtaining syntactic parses is a crucial part of many NLP pipelines. However, most of the world's languages do not have large amounts of syntactically annotated corpora available for building parsers. Syntactic projection techniques attempt to address this issue by using parallel corpora consisting of resource-poor and resource-rich language pairs, taking advantage of a parser for the resource-rich language and word alignment between the languages to project the parses onto the data for the resource-poor language. These projection methods can suffer, however, when the two languages are divergent. In this paper, we investigate the possibility of using small, parallel, annotated corpora to automatically detect divergent structural patterns between two languages. These patterns can then be used to improve structural projection algorithms, allowing for better performing NLP tools for resource-poor languages, in particular those that may not have large amounts of annotated data necessary for traditional, fully-supervised methods. While this detection process is not exhaustive, we demonstrate that common patterns of divergence can be identified automatically without prior knowledge of a given language pair, and the patterns can be used to improve performance of projection algorithms.

Submitted 14 May, 2016; originally announced May 2016.

arXiv:1312.1378 [pdf, other]

An Analytical Model for Loc/ID Mappings Caches

Authors: Florin Coras, Jordi Domingo-Pascual, Darrel Lewis, Albert Cabellos-Aparicio

Abstract: Concerns regarding the scalability of the inter-domain routing have encouraged researchers to start elaborating a more robust Internet architecture. While consensus on the exact form of the solution is yet to be found, the need for a semantic decoupling of a node's location and identity is generally accepted as a promising way forward. However, this typically requires the use of caches that store temporal bindings between the two namespaces, to avoid hampering router packet forwarding speeds. In this article, we propose a methodology for an analytical analysis of cache performance that relies on the working-set theory. We first identify the conditions that network traffic must comply with for the theory to be applicable and then develop a model that predicts average cache miss rates relying on easily measurable traffic parameters. We validate the result by emulation, using real packet traces collected at the egress points of a campus and an academic network. To prove its versatility, we extend the model to consider cache polluting user traffic and observe that simple, low intensity attacks drastically reduce performance, whereby manufacturers should either overprovision router memory or implement more complex cache eviction policies.

Submitted 6 December, 2013; v1 submitted 4 December, 2013; originally announced December 2013.

arXiv:0909.2368 [pdf]

Web Single Sign-On Authentication using SAML

Authors: Kelly D. Lewis, James E. Lewis

Abstract: Companies have increasingly turned to application service providers (ASPs) or Software as a Service (SaaS) vendors to offer specialized web-based services that will cut costs and provide specific and focused applications to users. The complexity of designing, installing, configuring, deploying, and supporting the system with internal resources can be eliminated with this type of methodology, providing great benefit to organizations. However, these models can present an authentication problem for corporations with a large number of external service providers. This paper describes the implementation of Security Assertion Markup Language (SAML) and its capabilities to provide secure single sign-on (SSO) solutions for externally hosted applications.

Submitted 12 September, 2009; originally announced September 2009.

Comments: International Journal of Computer Science Issues (IJCSI), Volume 1, pp41-48, August 2009

Journal ref: K. D. LEWIS and J. E. LEWIS, "Web Single Sign-On Authentication using SAML", International Journal of Computer Science Issues (IJCSI), Volume 1, pp41-48, August 2009

arXiv:math/0204068 [pdf, ps, other]

Computational problems for vector-valued quadratic forms

Authors: Francesco Bullo, Jorge Cortes, Andrew D. Lewis, Sonia Martinez

Abstract: Given two real vector spaces $U$ and $V$, and a symmetric bilinear map $B: U\times U\to V$, let $Q_B$ be its associated quadratic map. The problems we consider are as follows: (i) are there necessary and sufficient conditions, checkable in polynomial-time, for determining when $Q_B$ is surjective?; (ii) if $Q_B$ is surjective, given $v\in V$ is there a polynomial-time algorithm for finding a point $u\in Q_B^{-1}(v)$?; (iii) are there necessary and sufficient conditions, checkable in polynomial-time, for determining when $B$ is indefinite? We present an alternative formulation of the problem of determining the image of a vector-valued quadratic form in terms of the unprojectivised Veronese surface. The relation of these questions with several interesting problems in Control Theory is illustrated.

Submitted 5 April, 2002; originally announced April 2002.

Comments: 6 pages, no figures, submitted to Workshop on Open Problems in Mathematical Systems and Control Theory

MSC Class: 11Exx; 14Pxx; 14Q99; 15A63

arXiv:cmp-lg/9407020 [pdf, ps]

A Sequential Algorithm for Training Text Classifiers

Authors: David D. Lewis, William A. Gale

Abstract: The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertainty sampling, reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.

Submitted 24 July, 1994; v1 submitted 24 July, 1994; originally announced July 1994.

Comments: 10 pages, uuencoded, compressed PostScript; Proc. SIGIR-94 LaTex available from [email protected]

Showing 1–20 of 20 results for author: Lewis, D