Showing 1–5 of 5 results for author: Mannila, H

  1. Tell Me Something I Don't Know: Randomization Strategies for Iterative Data Mining

    Authors: Sami Hanhijärvi, Markus Ojala, Niko Vuokko, Kai Puolamäki, Nikolaj Tatti, Heikki Mannila

    Abstract: There is a wide variety of data mining methods available, and it is generally useful in exploratory data analysis to use many different methods for the same dataset. This, however, leads to the problem of whether the results found by one method are a reflection of the phenomenon shown by the results of another method, or whether the results depict in some sense unrelated properties of the data. Fo…

    Submitted 16 June, 2020; originally announced June 2020.

    Journal ref: KDD 2009: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

  2. What is the dimension of your binary data?

    Authors: Nikolaj Tatti, Taneli Mielikainen, Aristides Gionis, Heikki Mannila

    Abstract: Many 0/1 datasets have a very large number of variables; on the other hand, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the effective dimensionality of such a dataset is a nontrivial problem. We consider the problem of defining a robust measure of dimension for 0/1 datasets, and show that the basic idea of fractal di…

    Submitted 4 February, 2019; originally announced February 2019.

  3. arXiv:1301.3884  [pdf]

    cs.AI cs.DB

    Probabilistic Models for Query Approximation with Large Sparse Binary Datasets

    Authors: Dmitry Y. Pavlov, Heikki Mannila, Padhraic Smyth

    Abstract: Large sparse sets of binary transaction data with millions of records and thousands of attributes occur in various domains: customers purchasing products, users visiting web pages, and documents containing words are just three typical examples. Real-time query selectivity estimation (the problem of estimating the number of rows in the data satisfying a given predicate) is an important practical pr…

    Submitted 16 January, 2013; originally announced January 2013.

    Comments: Appears in Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI2000)

    Report number: UAI-P-2000-PG-465-472

  4. arXiv:0906.5485  [pdf, ps, other]

    cs.DB cs.AI

    Query Significance in Databases via Randomizations

    Authors: Markus Ojala, Gemma C. Garriga, Aristides Gionis, Heikki Mannila

    Abstract: Many sorts of structured data are commonly stored in a multi-relational format of interrelated tables. Under this relational model, exploratory data analysis can be done by using relational queries. As an example, in the Internet Movie Database (IMDb) a query can be used to check whether the average rank of action movies is higher than the average rank of drama movies. We consider the problem…

    Submitted 30 June, 2009; originally announced June 2009.

    Comments: 10 pages

  5. arXiv:0809.3027  [pdf, ps, other]

    cs.AI cs.DB physics.soc-ph

    Finding links and initiators: a graph reconstruction problem

    Authors: Heikki Mannila, Evimaria Terzi

    Abstract: Consider a 0-1 observation matrix M, where rows correspond to entities and columns correspond to signals; a value of 1 (or 0) in cell (i,j) of M indicates that signal j has been observed (or not observed) in entity i. Given such a matrix we study the problem of inferring the underlying directed links between entities (rows) and finding which entries in the matrix are initiators. We formally de…

    Submitted 17 September, 2008; originally announced September 2008.

    ACM Class: H.2.8