
Adaptability of Neural Networks on Varying Granularity IR Tasks

Authors: Daniel Cohen, Qingyao Ai, W. Bruce Croft

Abstract: Recent work in Information Retrieval (IR) using Deep Learning models has yielded state of the art results on a variety of IR tasks. Deep neural networks (DNN) are capable of learning ideal representations of data during the training process, removing the need for independently extracting features. However, the structures of these DNNs are often tailored to perform on specific datasets. In addition, IR tasks deal with text at varying levels of granularity from single factoids to documents containing thousands of words. In this paper, we examine the role of the granularity on the performance of common state of the art DNN structures in IR.

Submitted 24 June, 2016; originally announced June 2016.

Comments: 4 pages, Neu-IR'16 SIGIR Workshop on Neural Information Retrieval, July 21, 2016, Pisa, Italy

ACM Class: H.3.3; I.5.1

arXiv:1505.06939 [pdf, other]

A Novel Geographic Partitioning System for Anonymizing Health Care Data

Authors: William Lee Croft, Wei Shi, Jorg-Rudiger Sack, Jean-Pierre Corriveau

Abstract: With large volumes of detailed health care data being collected, there is a high demand for the release of this data for research purposes. Hospitals and organizations are faced with conflicting interests of releasing this data and protecting the confidentiality of the individuals to whom the data pertains. Similarly, there is a conflict in the need to release precise geographic information for certain research applications and the requirement to censor or generalize the same information for the sake of confidentiality. Ultimately the challenge is to anonymize data in order to comply with government privacy policies while reducing the loss in geographic information as much as possible. In this paper, we present a novel geographic-based system for the anonymization of health care data. This system is broken up into major components for which different approaches may be supplied. We compare such approaches in order to make recommendations on which of them to select to best match user requirements.

Submitted 26 May, 2015; originally announced May 2015.

arXiv:1505.06786 [pdf, other]

Geographic Partitioning Techniques for the Anonymization of Health Care Data

Authors: William Lee Croft, Wei Shi, Jorg-Rudiger Sack, Jean-Pierre Corriveau

Abstract: Hospitals and health care organizations collect large amounts of detailed health care data that is in high demand by researchers. Thus, the possessors of such data are in need of methods that allow for this data to be released without compromising the confidentiality of the individuals to whom it pertains. As the geographic aspect of this data is becoming increasingly relevant for research being conducted, it is important for an anonymization process to pay due attention to the geographic attributes of such data. In this paper, a novel system for health care data anonymization is presented. At the core of the system is the aggregation of an initial regionalization guided by the use of a Voronoi diagram. We conduct a comparison with another geographic-based system of anonymization, GeoLeader. We show that our system is capable of producing results of a comparable quality with a much faster running time.

Submitted 25 May, 2015; originally announced May 2015.

arXiv:1504.07843 [pdf, other]

doi 10.1073/pnas.1520752113

On the universal structure of human lexical semantics

Authors: Hyejin Youn, Logan Sutton, Eric Smith, Cristopher Moore, Jon F. Wilkins, Ian Maddieson, William Croft, Tanmoy Bhattacharya

Abstract: How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition. Semantics, or meaning expressed through language, provides direct access to the underlying conceptual structure, but meaning is notoriously difficult to measure, let alone parameterize. Here we provide an empirical measure of semantic proximity between concepts using cross-linguistic dictionaries. Across languages carefully selected from a phylogenetically and geographically stratified sample of genera, translations of words reveal cases where a particular language uses a single polysemous word to express concepts represented by distinct words in another. We use the frequency of polysemies linking two concepts as a measure of their semantic proximity, and represent the pattern of such linkages by a weighted network. This network is highly uneven and fragmented: certain concepts are far more prone to polysemy than others, and there emerge naturally interpretable clusters loosely connected to each other. Statistical analysis shows such structural properties are consistent across different language groups, largely independent of geography, environment, and literacy. It is therefore possible to conclude the conceptual structure connecting basic vocabulary studied is primarily due to universal features of human cognition and language use.

Submitted 29 April, 2015; originally announced April 2015.

Comments: Press embargo in place until publication

Journal ref: PNAS 113 7 1766-1771 (2016)

arXiv:1311.7602 [pdf, other]

Parameter identification problems in the modelling of cell motility

Authors: Wayne Croft, Charles M Elliott, Graham Ladds, Björn Stinner, Chandrasekhar Venkataraman, Cathryn Weston

Abstract: We present a novel parameter identification algorithm for the estimation of parameters in models of cell motility using imaging data of migrating cells. Two alternative formulations of the objective functional that measures the difference between the computed and observed data are proposed and the parameter identification problem is formulated as a minimisation problem of nonlinear least squares type. A Levenberg-Marquardt based optimisation method is applied to the solution of the minimisation problem and the details of the implementation are discussed. A number of numerical experiments are presented which illustrate the robustness of the algorithm to parameter identification in the presence of large deformations and noisy data and parameter identification in three dimensional models of cell motility. An application to experimental data is also presented in which we seek to identify parameters in a model for the monopolar growth of fission yeast cells using experimental imaging data.

Submitted 29 November, 2013; originally announced November 2013.

Comments: 31 Pages, 13 Figures

arXiv:cond-mat/0512588 [pdf, ps, other]

doi 10.1103/PhysRevE.73.046118

Utterance Selection Model of Language Change

Authors: G. J. Baxter, R. A. Blythe, W. Croft, A. J. McKane

Abstract: We present a mathematical formulation of a theory of language change. The theory is evolutionary in nature and has close analogies with theories of population genetics. The mathematical structure we construct similarly has correspondences with the Fisher-Wright model of population genetics, but there are significant differences. The continuous time formulation of the model is expressed in terms of a Fokker-Planck equation. This equation is exactly soluble in the case of a single speaker and can be investigated analytically in the case of multiple speakers who communicate equally with all other speakers and give their utterances equal weight. Whilst the stationary properties of this system have much in common with the single-speaker case, time-dependent properties are richer. In the particular case where linguistic forms can become extinct, we find that the presence of many speakers causes a two-stage relaxation, the first being a common marginal distribution that persists for a long time as a consequence of ultimate extinction being due to rare fluctuations.

Submitted 22 December, 2005; originally announced December 2005.

Comments: 21 pages, 17 figures

Journal ref: Phys Rev E (2006) 73, 046118

Showing 51–56 of 56 results for author: Croft, W