Search | arXiv e-print repository

doi 10.5121/ijdkp.2013.3607

DRSP : Dimension Reduction For Similarity Matching And Pruning Of Time Series Data Streams

Authors: R H Vishwanath, T V Samartha, K C Srikantaiah, K R Venugopal, L M Patnaik

Abstract: Similarity matching and join of time series data streams have gained considerable relevance given today's large volumes of streaming data. The process finds wide application in areas such as location tracking, sensor networks, and object positioning and monitoring. However, as the size of the data stream increases, the cost of retaining all the data to support similarity matching also increases. We develop a novel framework to address the following objectives. First, dimension reduction is performed in the preprocessing stage, where large stream data is segmented and reduced to a compact representation that retains all the crucial information, using a technique called Multi-level Segment Means (MSM). This reduces the space complexity associated with storing large time-series data streams. Second, the framework incorporates an effective similarity matching technique to analyze whether new data objects are symmetric to the existing data stream. Finally, a pruning technique filters out the pseudo data object pairs and joins only the relevant pairs. The computational cost for MSM is O(l*ni) and the cost for pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction Factor. We have performed exhaustive experimental trials to show that the proposed framework is both efficient and competent in comparison with earlier works.

Submitted 10 December, 2013; originally announced December 2013.

Comments: 20 pages, 8 figures, 6 tables

Journal ref: International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol. 3, No. 6, pp. 107-126, November 2013
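
Illustration for the entry above: the abstract describes reducing a stream to segment means at several levels. The following is a minimal sketch of that general idea, not the authors' MSM implementation. It assumes that level j splits the series into 2^j equal-length segments and stores each segment's mean; the function names `segment_means` and `multi_level_segment_means` are hypothetical.

```python
from typing import List, Sequence


def segment_means(series: Sequence[float], num_segments: int) -> List[float]:
    """Split the series into equal-length segments and return each segment's mean."""
    n = len(series)
    if num_segments <= 0 or n == 0 or n % num_segments != 0:
        raise ValueError("series length must be a positive multiple of num_segments")
    size = n // num_segments
    return [sum(series[i:i + size]) / size for i in range(0, n, size)]


def multi_level_segment_means(series: Sequence[float], levels: int) -> List[List[float]]:
    """Assumed scheme: level j (starting at 0) holds the means of 2**j segments."""
    return [segment_means(series, 2 ** level) for level in range(levels)]


if __name__ == "__main__":
    window = [1, 2, 3, 4, 5, 6, 7, 8]
    for level, means in enumerate(multi_level_segment_means(window, 3)):
        print(level, means)
```

Coarser levels are cheaper to store and compare, while finer levels keep more detail; how the paper chooses levels and segment sizes is not stated in the abstract.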

arXiv:1309.2517 [pdf, ps, other]

Forecasting Stock Time-Series using Data Approximation and Pattern Sequence Similarity

Authors: R. H. Vishwanath, S. Leena, K. C. Srikantaiah, K. Shreekrishna Kumar, P. Deepa Shenoy, K. R. Venugopal, S. S. Iyengar, L. M. Patnaik

Abstract: Time series analysis is the process of building a model using statistical techniques to represent characteristics of time series data. Processing and forecasting huge time series data is a challenging task. This paper presents Approximation and Prediction of Stock Time-series data (APST), a two-step approach to predict the direction of change of stock price indices. First, data approximation is performed using a technique called Multilevel Segment Mean (MSM). In the second phase, prediction is performed on the approximated data using Euclidean distance and the Nearest-Neighbour technique. The computational cost of data approximation is O(n ni) and the computational cost of the prediction task is O(m |NN|). Thus, the proposed method compares favourably with the existing Label Based Forecasting (LBF) method [1] in both accuracy and the time required for prediction.

Submitted 10 September, 2013; originally announced September 2013.

Comments: 11 pages

Journal ref: International Journal of Information Processing, 7(2), 90-100, 2013
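
Illustration for the entry above: the abstract mentions predicting the direction of change with Euclidean distance and nearest neighbours. The sketch below shows one generic way such a prediction can work and is not the APST algorithm itself. It assumes the most recent `window` values are compared with every earlier window, and the average next-step change of the `k` closest windows decides the direction; `predict_direction`, `window`, and `k` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_direction(series: Sequence[float], window: int, k: int) -> str:
    """Predict 'up', 'down', or 'flat' from the k historical windows nearest to the latest one."""
    if window <= 0 or k <= 0 or len(series) < window + 1:
        raise ValueError("need window > 0, k > 0, and more than `window` values")
    query = series[-window:]
    candidates: List[Tuple[float, float]] = []
    # Only windows with a known following value are considered; the query itself is excluded.
    for start in range(len(series) - window):
        past = series[start:start + window]
        change = series[start + window] - past[-1]
        candidates.append((euclidean(past, query), change))
    candidates.sort(key=lambda item: item[0])
    nearest = candidates[:k]
    mean_change = sum(change for _, change in nearest) / len(nearest)
    if mean_change > 0:
        return "up"
    if mean_change < 0:
        return "down"
    return "flat"


if __name__ == "__main__":
    prices = [1, 2, 3, 1, 2, 3, 1, 2]
    print(predict_direction(prices, window=2, k=2))
```

In a pipeline like the one the abstract outlines, the input would be the approximated (segment-mean) series rather than raw prices, which shortens each distance computation.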

arXiv:1304.4184 [pdf]

doi 10.5121/ijdkp.2013.3204

Bidirectional Growth based Mining and Cyclic Behaviour Analysis of Web Sequential Patterns

Authors: K. C. Srikantaiah, N. Krishna Kumar, K. R. Venugopal, L. M. Patnaik

Abstract: Web sequential patterns are important for analyzing and understanding users' behaviour to improve the quality of service offered by the World Wide Web. Web prefetching is one such technique; it utilizes prefetching rules derived through Cyclic Model Analysis of the mined Web sequential patterns. Prediction is more accurate and the results of prefetching are more satisfying when a highly efficient and scalable mining technique is used, such as the Bidirectional Growth based Directed Acyclic Graph. In this paper, we propose a novel algorithm called Bidirectional Growth based mining Cyclic behavior Analysis of web sequential Patterns (BGCAP) that effectively combines these strategies to generate prefetching rules in the form of 2-sequence patterns with Periodicity and threshold of Cyclic Behaviour. These rules can be utilized to effectively prefetch Web pages, thus reducing the user-perceived latency. As BGCAP is based on Bidirectional pattern growth, it performs only (log n+1) levels of recursion for mining n Web sequential patterns. Our experimental results show that BGCAP generates prefetching rules 5-10% faster for different data sizes and 10-15% faster for a fixed data size than TD-Mine. In addition, BGCAP generates about 5-15% more prefetching rules than TD-Mine.

Submitted 15 April, 2013; originally announced April 2013.

Comments: 19 pages

Journal ref: International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol. 3, No. 2, March 2013

arXiv:1209.5244 [pdf]

Ranking Search Engine Result Pages based on Trustworthiness of Websites

Authors: K. C. Srikantaiah, P. L. Srikanth, V. Tejaswi, K. Shaila, K. R. Venugopal, L. M. Patnaik

Abstract: The World Wide Web (WWW) is a repository of a large number of web pages that can be accessed via the Internet by multiple users at the same time, and it is therefore ubiquitous in nature. The search engine is a key application for searching web pages in this huge repository; it uses link analysis to rank web pages without considering the facts they provide. A new algorithm called Probability of Correctness of Facts (PCF)-Engine is proposed to find the accuracy of the facts provided by web pages. It uses a probability-based similarity function (SIM) that performs string matching between the true facts and the facts of web pages to find their probability of correctness. Existing semantic search engines may give relevant results for the user query but may not be 100% accurate. Our algorithm computes the trustworthiness of websites to rank the web pages. Simulation results show that our approach is efficient when compared with the existing Voting and Truthfinder [1] algorithms with respect to the trustworthiness of the websites.

Submitted 24 September, 2012; originally announced September 2012.

Comments: 10 pages; IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 4, No 2, July 2012

Showing 1–4 of 4 results for author: Srikantaiah, K C