-
Numerical simulation of individual coil placement -- A proof-of-concept study for the prediction of recurrence after aneurysm coiling
Authors:
Julian Schwarting,
Fabian Holzberger,
Markus Muhr,
Martin Renz,
Tobias Boeckh-Behrens,
Barbara Wohlmuth,
Jan Kirschke
Abstract:
Rupture of intracranial aneurysms results in severe subarachnoid hemorrhage, which is associated with high morbidity and mortality. Neurointerventional occlusion of the aneurysm through coiling has evolved into a therapeutic standard. The choice of the specific coil has an important influence on secondary regrowth requiring retreatment. Aneurysm occlusion was simulated either through virtual implantation of a preshaped 3D coil or with a porous media approach. In this study, we used a recently developed numerical approach to simulate aneurysm shapes in specific challenging aneurysm anatomies and correlated these with aneurysm recurrence 6 months after treatment. The simulation showed a great variety of coil shapes depending on the variability in possible microcatheter positions. Aneurysms with a later recurrence showed a tendency for more successful coiling attempts. Results revealed further trends suggesting lower simulated packing densities in aneurysms with recurrence. Simulated packing densities did not correlate with those calculated by conventional software, indicating the potential for our approach to offer additional predictive value. Our study, therefore, pioneers a comprehensive numerical model for simulating aneurysm coiling, providing insights into individualized treatment strategies and outcome prediction. Future directions involve expanding the model's capabilities to simulate intraprocedural outcomes and long-term predictions, aiming to refine occlusion quality criteria and validate prediction parameters in larger patient cohorts. This simulation framework holds promise for enhancing clinical decision-making and optimizing patient outcomes in endovascular aneurysm treatment.
Submitted 11 March, 2024;
originally announced March 2024.
-
Towards Mobility Data Science (Vision Paper)
Authors:
Mohamed Mokbel,
Mahmoud Sakr,
Li Xiong,
Andreas Züfle,
Jussara Almeida,
Taylor Anderson,
Walid Aref,
Gennady Andrienko,
Natalia Andrienko,
Yang Cao,
Sanjay Chawla,
Reynold Cheng,
Panos Chrysanthis,
Xiqi Fei,
Gabriel Ghinita,
Anita Graser,
Dimitrios Gunopulos,
Christian Jensen,
Joon-Seok Kim,
Kyoung-Sook Kim,
Peer Kröger,
John Krumm,
Johannes Lauer,
Amr Magdy,
Mario Nascimento,
et al. (23 additional authors not shown)
Abstract:
Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline having the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, we survey the current state of the art and describe open challenges for the research community in the coming years.
Submitted 7 March, 2024; v1 submitted 21 June, 2023;
originally announced July 2023.
-
Model Preserving Compression for Neural Networks
Authors:
Jerry Chee,
Megan Renz,
Anil Damle,
Christopher De Sa
Abstract:
After training complex deep learning models, a common task is to compress the model to reduce compute and storage demands. When compressing, it is desirable to preserve the original model's per-example decisions (e.g., to go beyond top-1 accuracy or preserve robustness), maintain the network's structure, automatically determine per-layer compression levels, and eliminate the need for fine tuning. No existing compression methods simultaneously satisfy these criteria; we introduce a principled approach that does, by leveraging interpolative decompositions. Our approach simultaneously selects and eliminates channels (analogously, neurons), then constructs an interpolation matrix that propagates a correction into the next layer, preserving the network's structure. Consequently, our method achieves good performance even without fine tuning and admits theoretical analysis. Our theoretical generalization bound for a one layer network lends itself naturally to a heuristic that allows our method to automatically choose per-layer sizes for deep networks. We demonstrate the efficacy of our approach with strong empirical performance on a variety of tasks, models, and datasets, from simple one-hidden-layer networks to deep networks on ImageNet.
Submitted 14 October, 2022; v1 submitted 30 July, 2021;
originally announced August 2021.
-
Complete and Sufficient Spatial Domination of Multidimensional Rectangles
Authors:
Tobias Emrich,
Hans-Peter Kriegel,
Andreas Züfle,
Peer Kröger,
Matthias Renz
Abstract:
Rectangles are used to approximate objects, or sets of objects, in a plethora of applications, systems and index structures. Many tasks, such as nearest neighbor search and similarity ranking, require deciding whether objects in one rectangle A may, must, or must not be closer to objects in a second rectangle B than objects in a third rectangle R. It can be shown that this relation of "Spatial Domination" often cannot be detected using minimum and maximum distances alone. This spatial gem provides a necessary and sufficient decision criterion for spatial domination that can be computed efficiently even in higher dimensional space. In addition, this spatial gem provides an example, pseudocode and an implementation in Python.
Submitted 15 January, 2020;
originally announced January 2020.
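The domination test described in the abstract above can be illustrated with a small sketch. This is not the implementation shipped with the paper. It assumes Euclidean distance, rectangles given as one `(low, high)` interval per dimension, and the reading "A dominates B with respect to R if every point of A is closer to every point of R than every point of B". The function names are hypothetical. The per-dimension test evaluates the difference of squared distances only at the two interval ends of R, which suffices because that difference is convex in each coordinate of the reference point.

```python
from typing import Sequence, Tuple

Rect = Sequence[Tuple[float, float]]  # one (low, high) interval per dimension


def max_dist_1d(interval: Tuple[float, float], x: float) -> float:
    """Largest distance from x to any point of the interval."""
    lo, hi = interval
    return max(abs(x - lo), abs(x - hi))


def min_dist_1d(interval: Tuple[float, float], x: float) -> float:
    """Smallest distance from x to any point of the interval."""
    lo, hi = interval
    if x < lo:
        return lo - x
    if x > hi:
        return x - hi
    return 0.0


def minmax_dominates(a: Rect, b: Rect, r: Rect) -> bool:
    """Conservative test: max distance (A, R) below min distance (B, R)."""
    max_ar = sum(max(abs(a_hi - r_lo), abs(r_hi - a_lo)) ** 2
                 for (a_lo, a_hi), (r_lo, r_hi) in zip(a, r))
    min_br = sum(max(0.0, b_lo - r_hi, r_lo - b_hi) ** 2
                 for (b_lo, b_hi), (r_lo, r_hi) in zip(b, r))
    return max_ar < min_br


def dominates(a: Rect, b: Rect, r: Rect) -> bool:
    """Per-dimension test on squared Euclidean distances (assumed setting)."""
    total = 0.0
    for a_i, b_i, r_i in zip(a, b, r):
        # Worst case over the two ends of R's interval in this dimension.
        total += max(max_dist_1d(a_i, x) ** 2 - min_dist_1d(b_i, x) ** 2
                     for x in r_i)
    return total < 0


# A lies between R and B, so A is always closer to R than B is.
R = [(0.0, 10.0), (0.0, 1.0)]
A = [(11.0, 12.0), (0.0, 1.0)]
B = [(14.0, 15.0), (0.0, 1.0)]
print(minmax_dominates(A, B, R), dominates(A, B, R))
```

In this example the min/max distance test fails to detect the domination because R is long, while the per-dimension test detects it.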
-
Glass phenomenology in the hard matrix model
Authors:
J. Dong,
V. Elser,
G. Gyawali,
K. Y. Jee,
J. Kent-Dobias,
A. Mandaiya,
M. Renz,
Y. Su
Abstract:
We introduce a new toy model for the study of glasses: the hard-matrix model (HMM). This may be viewed as a single particle moving on $\mathrm{SO}(N)$, where there is a potential proportional to the 1-norm of the matrix. The ground states of the model are "crystals" where all matrix elements have the same magnitude. These are the Hadamard matrices when $N$ is divisible by four. Just as finding the latter has challenged mathematicians, our model fails to find them upon cooling and instead shows all the behaviors that characterize physical glasses. With simulations we have located the first-order crystallization temperature, the Kauzmann temperature where the glass would have the same entropy as the crystal, as well as the standard, measurement-time dependent glass transition temperature. Our model also brings to light a new kind of elementary excitation special to the glass phase: the "rubicon". In our model these are associated with the finite density of matrix elements near zero, the maximum in their contribution to the energy. Rubicons enable the system to cross between basins without thermal activation, a possibility not much discussed in the standard landscape picture. We use these modes to explain the slow dynamics in our model and speculate about their role in its quantum extension in the context of many-body localization.
Submitted 2 August, 2021; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Efficient Information Flow Maximization in Probabilistic Graphs
Authors:
Christian Frey,
Andreas Züfle,
Tobias Emrich,
Matthias Renz
Abstract:
Reliable propagation of information through large networks, e.g., communication networks, social networks or sensor networks, is very important in many applications concerning marketing, social networks, and wireless sensor networks. However, social ties of friendship may be obsolete, and communication links may fail, inducing the notion of uncertainty in such networks. In this paper, we address the problem of optimizing information propagation in uncertain networks given a constrained budget of edges. We show that this problem requires solving two NP-hard subproblems: the computation of expected information flow, and the optimal choice of edges. To compute the expected information flow to a source vertex, we propose the F-tree as a specialized data structure that identifies independent components of the graph for which the information flow can either be computed analytically and efficiently, or for which traditional Monte-Carlo sampling can be applied independently of the remaining network. For the problem of finding the optimal edges, we propose a series of heuristics that exploit properties of this data structure. Our evaluation shows that these heuristics lead to high quality solutions, thus yielding high information flow, while maintaining low running time.
Submitted 5 May, 2018; v1 submitted 19 January, 2017;
originally announced January 2017.
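The "traditional Monte-Carlo sampling" baseline mentioned in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: edges are undirected and exist independently with the given probability, every vertex carries one unit of information, and the flow to the source is the number of other vertices connected to it in a sampled world. The F-tree itself is not implemented here, and the function names are hypothetical.

```python
import random
from collections import deque


def sample_flow(edges, source, rng):
    """Flow to `source` in one sampled possible world."""
    # Keep each edge independently with its existence probability.
    adjacency = {}
    for u, v, p in edges:
        if rng.random() < p:
            adjacency.setdefault(u, []).append(v)
            adjacency.setdefault(v, []).append(u)
    # Breadth-first search from the source over the surviving edges.
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) - 1  # the source does not count towards its own flow


def expected_flow(edges, source, samples=20000, seed=0):
    """Monte-Carlo estimate of the expected flow (unit vertex weights assumed)."""
    rng = random.Random(seed)
    return sum(sample_flow(edges, source, rng) for _ in range(samples)) / samples


# Path Q - a - b with edge probability 0.5 each: exact value 0.5 + 0.25 = 0.75.
path = [("Q", "a", 0.5), ("a", "b", 0.5)]
print(expected_flow(path, "Q"))
```

The estimate converges slowly in the number of samples, which is one reason to compute independent components analytically where that is possible.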
-
Scenic Routes Now: Efficiently Solving the Time-Dependent Arc Orienteering Problem
Authors:
Gregor Jossé,
Ying Lu,
Tobias Emrich,
Matthias Renz,
Cyrus Shahabi,
Ugur Demiryurek,
Matthias Schubert
Abstract:
This paper extends the Arc Orienteering Problem (AOP) to large road networks with time-dependent travel times and time-dependent value gain, termed Twofold Time-Dependent AOP or 2TD-AOP for short. In its original definition, the NP-hard Orienteering Problem (OP) asks to find a path from a source to a destination maximizing the accumulated value while not exceeding a cost budget. Variations of the OP and AOP have many practical applications such as mobile crowdsourcing tasks (e.g., repairing and maintenance or dispatching field workers), diverse logistics problems (e.g., crowd control or controlling wildfires) as well as several tourist guidance problems (e.g., generating trip recommendations or navigating through theme parks). In the proposed 2TD-AOP, travel times and value functions are assumed to be time-dependent. The dynamic values model, for instance, varying rewards in crowdsourcing tasks or varying urgency levels in damage control tasks. We discuss this novel problem, prove the benefit of time-dependence empirically and present an efficient approximative solution, optimized for fast response systems. Our approach is the first time-dependent variant of the AOP to be evaluated on a large scale, fine-grained, real-world road network. We show that optimal solutions are infeasible and solutions to the static problem are often invalid. We propose an approximate dynamic programming solution which produces valid paths and is orders of magnitude faster than any optimal solution.
Submitted 27 September, 2016;
originally announced September 2016.
-
3D-localization microscopy and tracking of FoF1-ATP synthases in living bacteria
Authors:
Anja Renz,
Marc Renz,
Diana Kluetsch,
Gabriele Deckers-Hebestreit,
Michael Börsch
Abstract:
FoF1-ATP synthases are membrane-embedded protein machines that catalyze the synthesis of adenosine triphosphate. Using photoactivation-based localization microscopy (PALM) in TIR-illumination as well as structured illumination microscopy (SIM), we explore the spatial distribution and track single FoF1-ATP synthases in living E. coli cells under physiological conditions at different temperatures. For quantitative diffusion analysis by mean-squared-displacement measurements, the limited size of the observation area in the membrane with its significant membrane curvature has to be considered. Therefore, we applied a 'sliding observation window' approach (M. Renz et al., Proc. SPIE 8225, 2012) and obtained the one-dimensional diffusion coefficient of FoF1-ATP synthase diffusing on the long axis in living E. coli cells.
Submitted 2 February, 2015;
originally announced February 2015.
-
Towards Knowledge-Enriched Path Computation
Authors:
Georgios Skoumas,
Klaus Arthur Schmid,
Gregor Jossé,
Andreas Züfle,
Mario A. Nascimento,
Matthias Renz,
Dieter Pfoser
Abstract:
Directions and paths, as commonly provided by navigation systems, are usually derived considering absolute metrics, e.g., finding the shortest path within an underlying road network. With the aid of crowdsourced geospatial data we aim at obtaining paths that do not only minimize distance but also lead through more popular areas using knowledge generated by users. We extract spatial relations such as "nearby" or "next to" from travel blogs that define closeness between pairs of points of interest (PoIs) and quantify each of these relations using a probabilistic model. Subsequently, we create a relationship graph where each node corresponds to a PoI and each edge describes the spatial connection between the respective PoIs. Using Bayesian inference we obtain a probabilistic measure of spatial closeness according to the crowd. Applying this measure to the corresponding road network, we obtain an altered cost function which does not exclusively rely on distance, and enriches actual road networks by taking crowdsourced spatial relations into account. Finally, we propose two routing algorithms on the enriched road networks. To evaluate our approach, we use Flickr photo data as a ground truth for popularity. Our experimental results -- based on real world datasets -- show that the paths computed w.r.t. our alternative cost function yield competitive solutions in terms of path length while also providing more "popular" paths, making routing easier and more informative for the user.
Submitted 9 September, 2014;
originally announced September 2014.
-
Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories
Authors:
Johannes Niedermayer,
Andreas Züfle,
Tobias Emrich,
Matthias Renz,
Nikos Mamoulis,
Lei Chen,
Hans-Peter Kriegel
Abstract:
Nearest neighbor (NN) queries in trajectory databases have received significant attention in the past, due to their application in spatio-temporal data analysis. Recent work has considered the realistic case where the trajectories are uncertain; however, only simple uncertainty models have been proposed, which do not allow for accurate probabilistic search. In this paper, we fill this gap by addressing probabilistic nearest neighbor queries in databases with uncertain trajectories modeled by stochastic processes, specifically the Markov chain model. We study three nearest neighbor query semantics that take as input a query state or trajectory $q$ and a time interval. For some queries, we show that no polynomial time solution can be found. For problems that can be solved in PTIME, we present exact query evaluation algorithms, while for the general case, we propose a sophisticated sampling approach, which uses Bayesian inference to guarantee that sampled trajectories conform to the observation data stored in the database. This sampling approach can be used in Monte-Carlo based approximation solutions. We include an extensive experimental study to support our theoretical results.
Submitted 20 January, 2014; v1 submitted 15 May, 2013;
originally announced May 2013.
-
Monitoring subunit rotation in single FRET-labeled FoF1-ATP synthase in an anti-Brownian electrokinetic trap
Authors:
Thomas Heitkamp,
Hendrik Sielaff,
Anja Korn,
Marc Renz,
Nawid Zarrabi,
Michael Boersch
Abstract:
FoF1-ATP synthase is the membrane protein catalyzing the synthesis of the 'biological energy currency' adenosine triphosphate (ATP). The enzyme uses internal subunit rotation for the mechanochemical conversion of a proton motive force to the chemical bond. We apply single-molecule Förster resonance energy transfer (FRET) to monitor subunit rotation in the two coupled motors F1 and Fo. Therefore, enzymes have to be isolated from the plasma membranes of Escherichia coli, fluorescently labeled and reconstituted into 120-nm sized lipid vesicles to yield proteoliposomes. These freely diffusing proteoliposomes occasionally traverse the confocal detection volume resulting in a burst of photons. Conformational dynamics of the enzyme are identified by sequential changes of FRET efficiencies within a single photon burst. The observation times can be extended by capturing single proteoliposomes in an anti-Brownian electrokinetic trap (ABELtrap, invented by A. E. Cohen and W. E. Moerner). Here we describe the preparation procedures of FoF1-ATP synthase and simulate FRET efficiency trajectories for 'trapped' proteoliposomes. Hidden Markov Models are applied at signal-to-background ratio limits for identifying the dwells and substeps of the rotary enzyme when running at low ATP concentrations, excited by low laser power, and confined by the ABELtrap.
Submitted 11 February, 2013;
originally announced February 2013.
-
Diffusion properties of single FoF1-ATP synthases in a living bacterium unraveled by localization microscopy
Authors:
Marc Renz,
Torsten Rendler,
Michael Boersch
Abstract:
FoF1-ATP synthases in Escherichia coli (E. coli) bacteria are membrane-bound enzymes which use an internal proton-driven rotary double motor to catalyze the synthesis of adenosine triphosphate (ATP). According to the 'chemiosmotic hypothesis', a series of proton pumps generate the necessary pH difference plus an electric potential across the bacterial plasma membrane. These proton pumps are redox-coupled membrane enzymes which are possibly organized in supercomplexes, as shown for the related enzymes in the mitochondrial inner membrane. We report diffusion measurements of single fluorescent FoF1-ATP synthases in living E. coli by localization microscopy and single enzyme tracking to distinguish a monomeric enzyme from a supercomplex-associated form in the bacterial membrane. For quantitative mean square displacement (MSD) analysis, the limited size of the observation area in the membrane with a significant membrane curvature had to be considered. The E. coli cells had a diameter of about 500 nm and a length of about 2 to 3 µm. Because the surface coordinate system yielded different localization precision, we applied a sliding observation window approach to obtain the diffusion coefficient D = 0.072 µm²/s of FoF1-ATP synthase in living E. coli cells.
Submitted 30 January, 2012;
originally announced January 2012.
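The MSD analysis named in the abstract above rests on the relation MSD(τ) = 2·d·D·τ for free diffusion in d dimensions. The sketch below shows that basic estimate only; it does not reproduce the sliding observation window approach or any correction for membrane curvature. The track is simulated, the frame time of 0.02 s is a hypothetical value, and D = 0.072 µm²/s is used purely as the input of the simulation. All function names are illustrative.

```python
import random


def simulate_track(d_true, dt, n_steps, dim=1, seed=0):
    """Simulated free Brownian track; positions in µm, time step dt in s."""
    rng = random.Random(seed)
    sigma = (2 * d_true * dt) ** 0.5  # per-axis step standard deviation
    pos = [0.0] * dim
    track = [tuple(pos)]
    for _ in range(n_steps):
        pos = [x + rng.gauss(0.0, sigma) for x in pos]
        track.append(tuple(pos))
    return track


def msd(track, lag):
    """Time-averaged mean squared displacement at one lag (in frames)."""
    n = len(track) - lag
    total = 0.0
    for i in range(n):
        total += sum((q - p) ** 2 for p, q in zip(track[i], track[i + lag]))
    return total / n


def diffusion_coefficient(track, dt, max_lag=4):
    """Fit MSD = 2*d*D*tau through the origin over the first few lags."""
    dim = len(track[0])
    taus = [lag * dt for lag in range(1, max_lag + 1)]
    msds = [msd(track, lag) for lag in range(1, max_lag + 1)]
    slope = sum(t * m for t, m in zip(taus, msds)) / sum(t * t for t in taus)
    return slope / (2 * dim)


track = simulate_track(d_true=0.072, dt=0.02, n_steps=20000, dim=1, seed=1)
print(diffusion_coefficient(track, dt=0.02))
```

Only short lags are used in the fit because the statistical error of the MSD grows with the lag. In a confined or curved geometry such as a bacterial membrane, the MSD also saturates at longer lags.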
-
Inverse Queries For Multidimensional Spaces
Authors:
Thomas Bernecker,
Tobias Emrich,
Hans-Peter Kriegel,
Nikos Mamoulis,
Matthias Renz,
Shiming Zhang,
Andreas Züfle
Abstract:
Traditional spatial queries return, for a given query object $q$, all database objects that satisfy a given predicate, such as epsilon range and $k$-nearest neighbors. This paper defines and studies inverse spatial queries, which, given a subset of database objects $Q$ and a query predicate, return all objects which, if used as query objects with the predicate, contain $Q$ in their result. We first show a straightforward solution for answering inverse spatial queries for any query predicate. Then, we propose a filter-and-refinement framework that can be used to improve efficiency. We show how to apply this framework on a variety of inverse queries, using appropriate space pruning strategies. In particular, we propose solutions for inverse epsilon range queries, inverse $k$-nearest neighbor queries, and inverse skyline queries. Our experiments show that our framework is significantly more efficient than naive approaches.
Submitted 5 May, 2011; v1 submitted 1 March, 2011;
originally announced March 2011.
-
Monitoring single membrane protein dynamics in a liposome manipulated in solution by the ABELtrap
Authors:
Torsten Rendler,
Marc Renz,
Eva Hammann,
Stefan Ernst,
Nawid Zarrabi,
Michael Boersch
Abstract:
FoF1-ATP synthase is the essential membrane enzyme maintaining the cellular level of adenosine triphosphate (ATP) and comprises two rotary motors. We measure subunit rotation in FoF1-ATP synthase by intramolecular Foerster resonance energy transfer (FRET) between two fluorophores at the rotor and at the stator of the enzyme. Confocal FRET measurements of freely diffusing single enzymes in lipid vesicles are limited to hundreds of milliseconds by the transit times through the laser focus. We evaluate two different methods to trap the enzyme inside the confocal volume in order to extend the observation times. Monte Carlo simulations show that optical tweezers with low laser power are not suitable for lipid vesicles with a diameter of 130 nm. A. E. Cohen (Harvard) and W. E. Moerner (Stanford) have recently developed an Anti-Brownian electrokinetic trap (ABELtrap) which is capable of apparently immobilizing single molecules, proteins, viruses or vesicles in solution. Trapping of fluorescent particles is achieved by applying a real time, position-dependent feedback to four electrodes in a microfluidic device. The standard deviation from a given target position in the ABELtrap is smaller than 200 nm. We develop a combination of the ABELtrap with confocal FRET measurements to monitor single membrane enzyme dynamics by FRET for more than 10 seconds in solution.
Submitted 31 January, 2011;
originally announced February 2011.
-
A Novel Probabilistic Pruning Approach to Speed Up Similarity Queries in Uncertain Databases
Authors:
Thomas Bernecker,
Tobias Emrich,
Hans-Peter Kriegel,
Nikos Mamoulis,
Matthias Renz,
Andreas Zuefle
Abstract:
In this paper, we propose a novel, effective and efficient probabilistic pruning criterion for probabilistic similarity queries on uncertain data. Our approach supports a general uncertainty model using continuous probabilistic density functions to describe the (possibly correlated) uncertain attributes of objects. In a nutshell, the problem to be solved is to compute the PDF of the random variable denoted by the probabilistic domination count: Given an uncertain database object B, an uncertain reference object R and a set D of uncertain database objects in a multi-dimensional space, the probabilistic domination count denotes the number of uncertain objects in D that are closer to R than B. This domination count can be used to answer a wide range of probabilistic similarity queries. Specifically, we propose a novel geometric pruning filter and introduce an iterative filter-refinement strategy for conservatively and progressively estimating the probabilistic domination count in an efficient way while keeping correctness according to the possible world semantics. In an experimental evaluation, we show that our proposed technique allows acquiring tight probability bounds for the probabilistic domination count quickly, even for large uncertain databases.
Submitted 5 May, 2011; v1 submitted 13 January, 2011;
originally announced January 2011.
-
Probabilistic Frequent Pattern Growth for Itemset Mining in Uncertain Databases (Technical Report)
Authors:
Thomas Bernecker,
Hans-Peter Kriegel,
Matthias Renz,
Florian Verhein,
Andreas Züfle
Abstract:
Frequent itemset mining in uncertain transaction databases semantically and computationally differs from traditional techniques applied on standard (certain) transaction databases. Uncertain transaction databases consist of sets of existentially uncertain items. The uncertainty of items in transactions makes traditional techniques inapplicable. In this paper, we tackle the problem of finding probabilistic frequent itemsets based on possible world semantics. In this context, an itemset X is called frequent if the probability that X occurs in at least minSup transactions is above a given threshold. We make the following contributions: We propose the first probabilistic FP-Growth algorithm (ProFP-Growth) and associated probabilistic FP-Tree (ProFP-Tree), which we use to mine all probabilistic frequent itemsets in uncertain transaction databases without candidate generation. In addition, we propose an efficient technique to compute the support probability distribution of an itemset in linear time using the concept of generating functions. An extensive experimental section evaluates our proposed techniques and shows that our ProFP-Growth approach is significantly faster than the current state-of-the-art algorithm.
Submitted 13 August, 2010;
originally announced August 2010.
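The generating-function idea in the abstract above can be sketched briefly. Assume, as a simplification, that transaction i contains itemset X independently with probability p_i. The support of X then has the generating function ∏ (1 − p_i + p_i·x), whose coefficient of x^k is P(support = k). The sketch below truncates the polynomial at degree minSup, so each transaction costs O(minSup) work; it is not the ProFP-Growth algorithm, and the function names are hypothetical.

```python
def support_distribution(probs, min_sup):
    """Coefficients c[0..min_sup]: c[k] = P(support == k) for k < min_sup,
    and c[min_sup] = P(support >= min_sup)."""
    coeffs = [1.0] + [0.0] * min_sup
    for p in probs:
        # Multiply by (1 - p) + p*x; mass above degree min_sup is folded
        # into the last coefficient, so the list never grows.
        new = [0.0] * (min_sup + 1)
        for k, c in enumerate(coeffs):
            new[k] += c * (1.0 - p)
            new[min(k + 1, min_sup)] += c * p
        coeffs = new
    return coeffs


def is_probabilistic_frequent(probs, min_sup, tau):
    """True if P(support >= min_sup) exceeds the threshold tau."""
    return support_distribution(probs, min_sup)[-1] > tau


# Three transactions containing X with probabilities 0.9, 0.8 and 0.5.
dist = support_distribution([0.9, 0.8, 0.5], min_sup=2)
print(dist[-1], is_probabilistic_frequent([0.9, 0.8, 0.5], 2, tau=0.8))
```

Folding the overflow into the last coefficient keeps the memory footprint at minSup + 1 numbers regardless of the database size.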
-
Studying the Underlying Event in Drell-Yan and High Transverse Momentum Jet Production at the Tevatron
Authors:
The CDF Collaboration,
T. Aaltonen,
J. Adelman,
B. Alvarez Gonzalez,
S. Amerio,
D. Amidei,
A. Anastassov,
A. Annovi,
J. Antos,
G. Apollinari,
A. Apresyan,
T. Arisawa,
A. Artikov,
J. Asaadi,
W. Ashmanskas,
A. Attal,
A. Aurisano,
F. Azfar,
W. Badgett,
A. Barbaro-Galtieri,
V. E. Barnes,
B. A. Barnett,
P. Barria,
P. Bartos,
G. Bauer,
et al. (554 additional authors not shown)
Abstract:
We study the underlying event in proton-antiproton collisions by examining the behavior of charged particles (transverse momentum pT > 0.5 GeV/c, pseudorapidity |η| < 1) produced in association with large transverse momentum jets (~2.2 fb⁻¹) or with Drell-Yan lepton-pairs (~2.7 fb⁻¹) in the Z-boson mass region (70 < M(pair) < 110 GeV/c²) as measured by CDF at 1.96 TeV center-of-mass energy. We use the direction of the lepton-pair (in Drell-Yan production) or the leading jet (in high-pT jet production) in each event to define three regions of η-φ space: toward, away, and transverse, where φ is the azimuthal scattering angle. For Drell-Yan production (excluding the leptons) both the toward and transverse regions are very sensitive to the underlying event. In high-pT jet production the transverse region is very sensitive to the underlying event and is separated into a MAX and MIN transverse region, which helps separate the hard component (initial and final-state radiation) from the beam-beam remnant and multiple parton interaction components of the scattering. The data are corrected to the particle level to remove detector effects and are then compared with several QCD Monte-Carlo models. The goal of this analysis is to provide data that can be used to test and improve the QCD Monte-Carlo models of the underlying event that are used to simulate hadron-hadron collisions.
Submitted 16 March, 2010;
originally announced March 2010.
-
Scalable Probabilistic Similarity Ranking in Uncertain Databases (Technical Report)
Authors:
Thomas Bernecker,
Hans-Peter Kriegel,
Nikos Mamoulis,
Matthias Renz,
Andreas Zuefle
Abstract:
This paper introduces a scalable approach for probabilistic top-k similarity ranking on uncertain vector data. Each uncertain object is represented by a set of vector instances that are assumed to be mutually-exclusive. The objective is to rank the uncertain data according to their distance to a reference object. We propose a framework that incrementally computes for each object instance and ranking position, the probability of the object falling at that ranking position. The resulting rank probability distribution can serve as input for several state-of-the-art probabilistic ranking models. Existing approaches compute this probability distribution by applying a dynamic programming approach of quadratic complexity. In this paper we theoretically as well as experimentally show that our framework reduces this to a linear-time complexity while having the same memory requirements, facilitated by incremental accessing of the uncertain vector instances in increasing order of their distance to the reference object. Furthermore, we show how the output of our method can be used to apply probabilistic top-k ranking for the objects, according to different state-of-the-art definitions. We conduct an experimental evaluation on synthetic and real data, which demonstrates the efficiency of our approach.
Submitted 16 July, 2009;
originally announced July 2009.