-
Basketball-SORT: An Association Method for Complex Multi-object Occlusion Problems in Basketball Multi-object Tracking
Authors:
Qingrui Hu,
Atom Scott,
Calvin Yeung,
Keisuke Fujii
Abstract:
Recent deep learning-based object detection approaches have led to significant progress in multi-object tracking (MOT) algorithms. Current MOT methods mainly focus on pedestrian or vehicle scenes, but basketball scenes often involve occlusions among three or more objects with similar appearances and high-intensity complex motions, which we call complex multi-object occlusion (CMOO). Here, we propose an online and robust MOT approach, named Basketball-SORT, which focuses on the CMOO problems in basketball videos. To overcome the CMOO problem, instead of using the intersection-over-union-based (IoU-based) approach, we use the trajectories of neighboring frames based on the projected positions of the players. Our method introduces the basketball game restriction (BGR) and reacquiring Long-Lost IDs (RLLI) based on the characteristics of basketball scenes, and we also solve the occlusion problem based on the player trajectories and appearance features. Experimental results show that our method achieves a Higher Order Tracking Accuracy (HOTA) score of 63.48% on the basketball fixed video dataset and outperforms other recent popular approaches. Overall, our approach solves the CMOO problem more effectively than recent MOT algorithms.
Submitted 28 June, 2024;
originally announced June 2024.
-
AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements
Authors:
Calvin Yeung,
Kenjiro Ide,
Keisuke Fujii
Abstract:
Image understanding is a foundational task in computer vision, with recent applications emerging in soccer posture analysis. However, existing publicly available datasets lack comprehensive information, notably in the form of posture sequences and 2D pose annotations. Moreover, current analysis models often rely on interpretable linear models (e.g., PCA and regression), limiting their capacity to capture non-linear spatiotemporal relationships in complex and diverse scenarios. To address these gaps, we introduce the 3D Shot Posture (3DSP) dataset in soccer broadcast videos, which represents the most extensive sports image dataset with 2D pose annotations to our knowledge. Additionally, we present the 3DSP-GRAE (Graph Recurrent AutoEncoder) model, a non-linear approach for embedding pose sequences. Furthermore, we propose AutoSoccerPose, a pipeline aimed at semi-automating 2D and 3D pose estimation and posture analysis. While achieving full automation proved challenging, we provide a foundational baseline, extending its utility beyond the scope of annotated data. We validate AutoSoccerPose on SoccerNet and 3DSP datasets, and present posture analysis results based on 3DSP. The dataset, code, and models are available at: https://github.com/calvinyeungck/3D-Shot-Posture-Dataset.
Submitted 20 May, 2024;
originally announced May 2024.
-
Generalized Holographic Reduced Representations
Authors:
Calvin Yeung,
Zhuowen Zou,
Mohsen Imani
Abstract:
Deep learning has achieved remarkable success in recent years. Central to its success is its ability to learn representations that preserve task-relevant structure. However, massive energy, compute, and data costs are required to learn general representations. This paper explores Hyperdimensional Computing (HDC), a computationally and data-efficient brain-inspired alternative. HDC acts as a bridge between connectionist and symbolic approaches to artificial intelligence (AI), allowing explicit specification of representational structure as in symbolic approaches while retaining the flexibility of connectionist approaches. However, HDC's simplicity poses challenges for encoding complex compositional structures, especially in its binding operation. To address this, we propose Generalized Holographic Reduced Representations (GHRR), an extension of Fourier Holographic Reduced Representations (FHRR), a specific HDC implementation. GHRR introduces a flexible, non-commutative binding operation, enabling improved encoding of complex data structures while preserving HDC's desirable properties of robustness and transparency. In this work, we introduce the GHRR framework, prove its theoretical properties and its adherence to HDC properties, explore its kernel and binding characteristics, and perform empirical experiments showcasing its flexible non-commutativity, enhanced decoding accuracy for compositional structures, and improved memorization capacity compared to FHRR.
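As a minimal illustration of the binding operation discussed above, the sketch below implements FHRR-style binding, where each hypervector element is a unit-magnitude complex phasor and binding is element-wise multiplication. It shows that this binding is commutative, so the order of the bound items is lost; that is the property a non-commutative binding is meant to address. The dimension, seed, and function names are illustrative assumptions and are not taken from the paper, and GHRR itself is not implemented here.

```python
import cmath
import math
import random


def random_phasor_vector(dim, rng):
    """Random FHRR-style hypervector: unit-magnitude complex phasors."""
    return [cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for _ in range(dim)]


def bind(a, b):
    """Bind by element-wise complex multiplication (phases add)."""
    return [x * y for x, y in zip(a, b)]


def unbind(bound, key):
    """Unbind by multiplying with the complex conjugate of the key."""
    return [x * y.conjugate() for x, y in zip(bound, key)]


def similarity(a, b):
    """Real part of the normalized inner product; 1.0 means identical."""
    return sum((x * y.conjugate()).real for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed, illustrative only
dim = 1024              # hypothetical dimension
role = random_phasor_vector(dim, rng)
filler = random_phasor_vector(dim, rng)

bound_rf = bind(role, filler)
bound_fr = bind(filler, role)

# Element-wise multiplication commutes, so argument order is lost.
order_is_lost = similarity(bound_rf, bound_fr) > 0.999

# Unbinding with the role recovers the filler (up to rounding error).
recovered = unbind(bound_rf, role)
recovery_quality = similarity(recovered, filler)
```

Because `bind(role, filler)` and `bind(filler, role)` are indistinguishable here, structures in which order matters (for example, sequences or directed edges) need extra machinery under plain element-wise binding.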
Submitted 15 May, 2024;
originally announced May 2024.
-
Demonstration of MaskSearch: Efficiently Querying Image Masks for Machine Learning Workflows
Authors:
Lindsey Linxi Wei,
Chung Yik Edward Yeung,
Hongjian Yu,
Jingchuan Zhou,
Dong He,
Magdalena Balazinska
Abstract:
We demonstrate MaskSearch, a system designed to accelerate queries over databases of image masks generated by machine learning models. MaskSearch formalizes and accelerates a new category of queries for retrieving images and their corresponding masks based on mask properties, which support various applications, from identifying spurious correlations learned by models to exploring discrepancies between model saliency and human attention. This demonstration makes the following contributions: (1) the introduction of MaskSearch's graphical user interface (GUI), which enables interactive exploration of image databases through mask properties, (2) hands-on opportunities for users to explore MaskSearch's capabilities and constraints within machine learning workflows, and (3) an opportunity for conference attendees to understand how MaskSearch accelerates queries over image masks.
Submitted 9 April, 2024;
originally announced April 2024.
-
Self-Attention Based Semantic Decomposition in Vector Symbolic Architectures
Authors:
Calvin Yeung,
Prathyush Poduval,
Mohsen Imani
Abstract:
Vector Symbolic Architectures (VSAs) have emerged as a novel framework for enabling interpretable machine learning algorithms equipped with the ability to reason and explain their decision processes. The basic idea is to represent discrete information through high dimensional random vectors. Complex data structures can be built up with operations over vectors such as the "binding" operation involving element-wise vector multiplication, which associates data together. The reverse task of decomposing the associated elements is a combinatorially hard task, with an exponentially large search space. The main algorithm for performing this search is the resonator network, inspired by Hopfield network-based memory search operations.
In this work, we introduce a new variant of the resonator network, based on self-attention based update rules in the iterative search problem. This update rule, based on the Hopfield network with log-sum-exp energy function and norm-bounded states, is shown to substantially improve the performance and rate of convergence. As a result, our algorithm enables a larger capacity for associative memory, enabling applications in many tasks like perception based pattern recognition, scene decomposition, and object reasoning. We substantiate our algorithm with a thorough evaluation and comparisons to baselines.
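To make the decomposition problem concrete, the following sketch binds two random bipolar vectors by element-wise multiplication and then recovers the factors by exhaustive search over the codebooks. This brute-force search is only a baseline that shows why the search space grows multiplicatively with the number of factors; it is not the resonator network or the self-attention variant described above. The codebook names, sizes, dimension, and seed are all hypothetical.

```python
import itertools
import random


def random_bipolar(dim, rng):
    """Random hypervector with entries drawn uniformly from {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Bind two hypervectors by element-wise multiplication."""
    return [x * y for x, y in zip(a, b)]


def similarity(a, b):
    """Normalized dot product; 1.0 means the vectors are identical."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def brute_force_decompose(composite, codebooks):
    """Try every combination of codebook entries: cost is the product of sizes."""
    best_indices, best_score = None, float("-inf")
    for indices in itertools.product(*(range(len(cb)) for cb in codebooks)):
        candidate = codebooks[0][indices[0]]
        for cb, idx in zip(codebooks[1:], indices[1:]):
            candidate = bind(candidate, cb[idx])
        score = similarity(candidate, composite)
        if score > best_score:
            best_indices, best_score = indices, score
    return best_indices, best_score


rng = random.Random(1)   # fixed seed, illustrative only
dim = 512                # hypothetical dimension
codebook_size = 8        # hypothetical number of entries per factor
colors = [random_bipolar(dim, rng) for _ in range(codebook_size)]
shapes = [random_bipolar(dim, rng) for _ in range(codebook_size)]

true_pair = (3, 5)
composite = bind(colors[true_pair[0]], shapes[true_pair[1]])

decoded, score = brute_force_decompose(composite, [colors, shapes])
```

With two factors of 8 entries each the search visits 64 combinations; with F factors of K entries it visits K**F, which is the exponential growth that iterative methods such as resonator networks aim to avoid.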
Submitted 19 March, 2024;
originally announced March 2024.
-
Machine Learning for Soccer Match Result Prediction
Authors:
Rory Bunker,
Calvin Yeung,
Keisuke Fujii
Abstract:
Machine learning has become a common approach to predicting the outcomes of soccer matches, and the body of literature in this domain has grown substantially in the past decade and a half. This chapter discusses available datasets, the types of models and features, and ways of evaluating model performance in this application domain. The aim of this chapter is to give a broad overview of the current state and potential future developments in machine learning for soccer match results prediction, as a resource for those interested in conducting future studies in the area. Our main findings are that while gradient-boosted tree models such as CatBoost, applied to soccer-specific ratings such as pi-ratings, are currently the best-performing models on datasets containing only goals as the match features, there needs to be a more thorough comparison of the performance of deep learning models and Random Forest on a range of datasets with different types of features. Furthermore, new rating systems using both player- and team-level information and incorporating additional information from, e.g., spatiotemporal tracking and event data, could be investigated further. Finally, the interpretability of match result prediction models needs to be enhanced for them to be more useful for team management.
Submitted 12 March, 2024;
originally announced March 2024.
-
Foul prediction with estimated poses from soccer broadcast video
Authors:
Jiale Fang,
Calvin Yeung,
Keisuke Fujii
Abstract:
Recent advances in computer vision have made significant progress in tracking and pose estimation of sports players. However, there have been fewer studies on behavior prediction with pose estimation in sports. In particular, predicting soccer fouls is challenging because of the small image size of each player and the difficulty of using, e.g., ball and pose information. In our research, we introduce an innovative deep learning approach for anticipating soccer fouls. This method integrates video data, bounding box positions, image details, and pose information by curating a novel soccer foul dataset. Our model utilizes a combination of convolutional and recurrent neural networks (CNNs and RNNs) to effectively merge information from these four modalities. The experimental results show that our full model outperformed the ablated models, and that the RNN modules, the bounding box position and image, and the estimated pose were all useful for foul prediction. Our findings have important implications for a deeper understanding of foul play in soccer and provide a valuable reference for future research and practice in this area.
Submitted 14 February, 2024;
originally announced February 2024.
-
Evaluating Soccer Match Prediction Models: A Deep Learning Approach and Feature Optimization for Gradient-Boosted Trees
Authors:
Calvin Yeung,
Rory Bunker,
Rikuhei Umemoto,
Keisuke Fujii
Abstract:
Machine learning models have become increasingly popular for predicting the results of soccer matches; however, the lack of publicly available benchmark datasets has made model evaluation challenging. The 2023 Soccer Prediction Challenge required the prediction of match results first in terms of the exact goals scored by each team, and second, in terms of the probabilities for a win, draw, and loss. The original training set of matches and features, which was provided for the competition, was augmented with additional matches that were played between 4 April and 13 April 2023, representing the period after which the training set ended, but prior to the first matches that were to be predicted (upon which the performance was evaluated). A CatBoost model was employed using pi-ratings as the features, which were initially identified as the optimal choice for calculating the win/draw/loss probabilities. Notably, deep learning models have frequently been disregarded in this particular task. Therefore, in this study, we aimed to assess the performance of a deep learning model and determine the optimal feature set for a gradient-boosted tree model. The model was trained using the most recent five years of data, and three training and validation sets were used in a hyperparameter grid search. The results from the validation sets show that our model had strong performance and stability compared to previously published models from the 2017 Soccer Prediction Challenge for win/draw/loss prediction.
Submitted 26 September, 2023;
originally announced September 2023.
-
Unlocking massively parallel spectral proper orthogonal decompositions in the PySPOD package
Authors:
Marcin Rogowski,
Brandon C. Y. Yeung,
Oliver T. Schmidt,
Romit Maulik,
Lisandro Dalcin,
Matteo Parsani,
Gianmarco Mengaldo
Abstract:
We propose a parallel (distributed) version of the spectral proper orthogonal decomposition (SPOD) technique. The parallel SPOD algorithm distributes the spatial dimension of the dataset, preserving time. This approach is adopted to preserve the non-distributed fast Fourier transform of the data in time, thereby avoiding the associated bottlenecks. The parallel SPOD algorithm is implemented in the PySPOD (https://github.com/MathEXLab/PySPOD) library and makes use of the standard message passing interface (MPI) library, implemented in Python via mpi4py (https://mpi4py.readthedocs.io/en/stable/). An extensive performance evaluation of the parallel package is provided, including strong and weak scalability analyses. The open-source library allows the analysis of large datasets of interest across the scientific community. Here, we present applications in fluid dynamics and geophysics that are extremely difficult (if not impossible) to achieve without a parallel algorithm. This work opens the path toward modal analyses of big quasi-stationary data, helping to uncover new unexplored spatio-temporal patterns.
Submitted 21 September, 2023;
originally announced September 2023.
-
A Strategic Framework for Optimal Decisions in Football 1-vs-1 Shot-Taking Situations: An Integrated Approach of Machine Learning, Theory-Based Modeling, and Game Theory
Authors:
Calvin C. K. Yeung,
Keisuke Fujii
Abstract:
Complex interactions between two opposing agents frequently occur in domains of machine learning, game theory, and other application domains. Quantitatively analyzing the strategies involved can provide an objective basis for decision-making. One such critical scenario is shot-taking in football, where decisions, such as whether the attacker should shoot or pass the ball and whether the defender should attempt to block the shot, play a crucial role in the outcome of the game. However, there are currently no effective data-driven and/or theory-based approaches to analyzing such situations. To address this issue, we proposed a novel framework to analyze such scenarios based on game theory, where we estimate the expected payoff with machine learning (ML) models, and additional features for ML models were extracted with a theory-based shot block model. Conventionally, successes or failures (1 or 0) are used as payoffs, but a successful shot (goal) is extremely rare in football. Therefore, we proposed the Expected Probability of Shot On Target (xSOT) metric to evaluate players' actions even if the shot results in no goal; this allows for effective differentiation and comparison between different shots and even enables counterfactual shot situation analysis. In our experiments, we validated the framework by comparing it with baseline and ablated models. Furthermore, we observed a high correlation between xSOT and existing metrics. This alignment of information suggests that xSOT provides valuable insights. Lastly, as an illustration, we studied optimal strategies in the World Cup 2022 and analyzed a shot situation in EURO 2020.
Submitted 27 July, 2023;
originally announced July 2023.
-
Remaining Useful Life Modelling with an Escalator Health Condition Analytic System
Authors:
Inez M. Zwetsloot,
Yu Lin,
Jiaqi Qiu,
Lishuai Li,
William Ka Fai Lee,
Edmond Yin San Yeung,
Colman Yiu Wah Yeung,
Chris Chun Long Wong
Abstract:
The refurbishment of an escalator is usually linked with its design life as recommended by the manufacturer. However, the actual useful life of an escalator should be determined by its operating condition, which is affected by the runtime, workload, maintenance quality, vibration, etc., rather than age only. The objective of this project is to develop a comprehensive health condition analytic system for escalators to support refurbishment decisions. The analytic system consists of four parts: 1) online data gathering and processing; 2) a dashboard for condition monitoring; 3) a health index model; and 4) remaining useful life prediction. The results can be used for a) predicting the remaining useful life of the escalators, in order to support asset replacement planning, and b) monitoring the real-time condition of escalators, including alerts when vibration exceeds the threshold and signal diagnosis, giving an indication of the possible root cause (components) of the alert signal.
Submitted 7 June, 2023;
originally announced June 2023.
-
Exploring AI-Generated Text in Student Writing: How Does AI Help?
Authors:
David James Woo,
Hengky Susanto,
Chi Ho Yeung,
Kai Guo,
April Ka Yeng Fung
Abstract:
English as a foreign language (EFL) students' use of text generated from artificial intelligence (AI) natural language generation (NLG) tools may improve their writing quality. However, it remains unclear to what extent AI-generated text in these students' writing might lead to higher-quality writing. We explored 23 Hong Kong secondary school students' attempts to write stories comprising their own words and AI-generated text. Human experts scored the stories for dimensions of content, language and organization. We analyzed the basic organization and structure and syntactic complexity of the stories' AI-generated text and performed multiple linear regression and cluster analyses. The results show the number of human words and the number of AI-generated words contribute significantly to scores. Besides, students can be grouped into competent and less competent writers who use more AI-generated text or less AI-generated text compared to their peers. Comparisons of clusters reveal some benefit of AI-generated text in improving the quality of both high-scoring students' and low-scoring students' writing. The findings can inform pedagogical strategies to use AI-generated text for EFL students' writing and to address digital divides. This study contributes designs of NLG tools and writing activities to implement AI-generated text in schools.
Submitted 31 December, 2023; v1 submitted 10 March, 2023;
originally announced April 2023.
-
Transformer-Based Neural Marked Spatio Temporal Point Process Model for Football Match Events Analysis
Authors:
Calvin C. K. Yeung,
Tony Sit,
Keisuke Fujii
Abstract:
With recently available football match event data that record the details of football matches, analysts and researchers have a great opportunity to develop new performance metrics, gain insight, and evaluate key performance. However, most methods for modeling sequential sports events and most performance metrics are not comprehensive enough to deal with such large-scale spatiotemporal data (in particular, the temporal process), thereby necessitating a more comprehensive spatiotemporal model and a holistic performance metric. To this end, we proposed the Transformer-Based Neural Marked Spatio Temporal Point Process (NMSTPP) model for football event data based on the neural temporal point processes (NTPP) framework. In the experiments, our model outperformed the baseline models in prediction performance. Furthermore, we proposed the holistic possession utilization score (HPUS) metric for a more comprehensive football possession analysis. For verification, we examined the relationship with football teams' final ranking, average goal score, and average xG over a season. It was observed that the average HPUS showed significant correlations despite not using goal information or shot details. Furthermore, we show HPUS examples in analyzing possessions, matches, and between matches.
Submitted 18 February, 2023;
originally announced February 2023.
-
Hybrid Supervised and Reinforcement Learning for the Design and Optimization of Nanophotonic Structures
Authors:
Christopher Yeung,
Benjamin Pham,
Zihan Zhang,
Katherine T. Fountaine,
Aaswath P. Raman
Abstract:
From higher computational efficiency to enabling the discovery of novel and complex structures, deep learning has emerged as a powerful framework for the design and optimization of nanophotonic circuits and components. However, both data-driven and exploration-based machine learning strategies have limitations in their effectiveness for nanophotonic inverse design. Supervised machine learning approaches require large quantities of training data to produce high-performance models and have difficulty generalizing beyond training data given the complexity of the design space. Unsupervised and reinforcement learning-based approaches on the other hand can have very lengthy training or optimization times associated with them. Here we demonstrate a hybrid supervised learning and reinforcement learning approach to the inverse design of nanophotonic structures and show this approach can reduce training data dependence, improve the generalizability of model predictions, and shorten exploratory training times by orders of magnitude. The presented strategy thus addresses a number of contemporary deep learning-based challenges, while opening the door for new design methodologies that leverage multiple classes of machine learning algorithms to produce more effective and practical solutions for photonic design.
Submitted 8 September, 2022;
originally announced September 2022.
-
Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits
Authors:
Bo Li,
Chi Ho Yeung
Abstract:
The multi-armed bandit (MAB) model is one of the most classical models to study decision-making in an uncertain environment. In this model, a player chooses one of $K$ possible arms of a bandit machine to play at each time step, where the corresponding arm returns a random reward to the player, potentially from a specific unknown distribution. The target of the player is to collect as many rewards as possible during the process. Despite its simplicity, the MAB model offers an excellent playground for studying the trade-off between exploration and exploitation and designing effective algorithms for sequential decision-making under uncertainty. Although many asymptotically optimal algorithms have been established, the finite-time behaviors of the stochastic dynamics of the MAB model appear much more challenging to analyze, due to the intertwining of the decision-making and the rewards being collected. In this paper, we employ techniques in statistical physics to analyze the MAB model, which facilitates the characterization of the distribution of cumulative regrets at a finite short time, the central quantity of interest in an MAB algorithm, as well as the intricate dynamical behaviors of the model. Our analytical results, in good agreement with simulations, point to the emergence of an interesting multimodal regret distribution, with large regrets resulting from excess exploitation of sub-optimal arms due to an initial unlucky output from the optimal one.
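The setting described above can be explored numerically with a short simulation. The sketch below runs many independent episodes of a two-armed Bernoulli bandit and collects the cumulative regret of each episode, so that its finite-time distribution can be inspected as a histogram. The epsilon-greedy policy, the arm means, the horizon, and the number of runs are all assumptions chosen for illustration; the abstract does not state which algorithm or parameters the paper analyzes, and this simulation is not the path-integral analysis itself.

```python
import random


def run_epsilon_greedy(arm_means, horizon, epsilon, rng):
    """Play one bandit episode and return the cumulative (pseudo-)regret."""
    n_arms = len(arm_means)
    pulls = [0] * n_arms
    reward_sums = [0.0] * n_arms
    best_mean = max(arm_means)
    regret = 0.0
    for t in range(horizon):
        if t < n_arms:
            arm = t  # pull each arm once to initialize the estimates
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore a random arm
        else:
            # exploit the arm with the highest empirical mean reward
            arm = max(range(n_arms), key=lambda k: reward_sums[k] / pulls[k])
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
        regret += best_mean - arm_means[arm]  # expected loss of this pull
    return regret


rng = random.Random(42)   # fixed seed, illustrative only
arm_means = [0.7, 0.5]    # hypothetical Bernoulli reward probabilities
horizon = 50              # hypothetical short time horizon
runs = 2000               # number of independent episodes
regrets = [run_epsilon_greedy(arm_means, horizon, 0.05, rng) for _ in range(runs)]

mean_regret = sum(regrets) / runs

# Coarse histogram of the regret distribution (bucketed to one decimal place).
histogram = {}
for r in regrets:
    bucket = round(r, 1)
    histogram[bucket] = histogram.get(bucket, 0) + 1
```

Printing `sorted(histogram.items())` shows how the episodes are spread over regret values. Episodes in which the better arm returns no reward on its first pull are the ones where a greedy-leaning policy can stay on the worse arm for a long time.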
Submitted 10 June, 2023; v1 submitted 11 August, 2022;
originally announced August 2022.
-
Scalable Node-Disjoint and Edge-Disjoint Multi-wavelength Routing
Authors:
Yi-Zhi Xu,
Ho Fai Po,
Chi Ho Yeung,
David Saad
Abstract:
Probabilistic message-passing algorithms are developed for routing transmissions in multi-wavelength optical communication networks, under node and edge-disjoint routing constraints and for various objective functions. Global routing optimization is a hard computational task on its own but is made much more difficult under the node/edge-disjoint constraints and in the presence of multiple wavelengths, a problem which dominates routing efficiency in real optical communication networks that carry most of the world's Internet traffic. The scalable principled method we have developed is exact on trees but provides good approximate solutions on locally tree-like graphs. It accommodates a variety of objective functions that correspond to low latency, load balancing and consolidation of routes, and can be easily extended to include heterogeneous signal-to-noise values on edges and a restriction on the available wavelengths per edge. It can be used for routing and managing transmissions on existing topologies as well as for designing and modifying optical communication networks. Additionally, it provides the tool for settling an open and much debated question on the merit of wavelength-switching nodes and the added capabilities they provide. The methods have been tested on generated networks such as random-regular, Erdős-Rényi and power-law graphs, as well as on the UK and US optical communication networks. They show excellent performance with respect to existing methodology on small networks and have been scaled up to network sizes that are beyond the reach of most existing algorithms.
Submitted 1 July, 2021;
originally announced July 2021.
-
Accumulative time-based ranking method to reputation evaluation in information networks
Authors:
Hao Liao,
Qi-xin Liu,
Ze-cheng Huang,
Chi Ho Yeung,
Yi-Cheng Zhang
Abstract:
With the rapid development of modern technology, the Web has become an important platform for users to make friends and acquire information. However, since information on the Web is over-abundant, information filtering becomes a key task for online users to obtain relevant suggestions. As most Websites can be ranked according to users' rating and preferences, relevance to queries, and recency, how to extract the most relevant item from the over-abundant information is always a key topic for researchers in various fields. In this paper, we adopt tools used to analyze complex networks to evaluate user reputation and item quality. In our proposed accumulative time-based ranking (ATR) algorithm, we incorporate two behavioral weighting factors which are updated when users select or rate items, to reflect the evolution of user reputation and item quality over time. We showed that our algorithm outperforms state-of-the-art ranking algorithms in terms of precision and robustness on empirical datasets from various online retailers and the citation datasets among research publications.
Submitted 3 March, 2020; v1 submitted 26 November, 2019;
originally announced November 2019.
-
Characteristics of human mobility patterns revealed by high-frequency cell-phone position data
Authors:
Chen Zhao,
An Zeng,
Chi Ho Yeung
Abstract:
Human mobility is an important characteristic of human behavior, but since tracking personalized position to high temporal and spatial resolution is difficult, most studies on human mobility patterns rely largely on mathematical models. Seminal models, which assume that frequently visited locations tend to be re-visited, reproduce a wide range of statistical features including collective mobility fluxes and numerous scaling laws. However, these models cannot be verified at a time-scale relevant to our daily travel patterns as most available data do not provide the necessary temporal resolution. In this work, we re-examined human mobility mechanisms via comprehensive cell-phone position data recorded at a high frequency up to every second. We found that, in many cases, the next location visited by a user is not their most frequently visited one. Instead, individuals exhibit origin-dependent, path-preferential patterns in their short time-scale mobility. These behaviors are prominent when the temporal resolution of the data is high, and are thus overlooked in most previous studies. Incorporating measured quantities from our high frequency data into conventional human mobility models shows contradictory statistical results. We finally revealed that the individual preferential transition mechanism characterized by the first-order Markov process can quantitatively reproduce the observed travel patterns at both individual and population levels at all relevant time-scales.
Submitted 8 July, 2019;
originally announced July 2019.
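The first-order Markov description mentioned in the abstract above can be illustrated with a minimal sketch: transition probabilities between locations are estimated from consecutive pairs in one user's visit sequence. The function name and the toy location labels are assumptions made for illustration, not the authors' code or data.

```python
from collections import Counter, defaultdict


def transition_probabilities(visits):
    """Estimate first-order Markov transition probabilities from a visit sequence.

    `visits` is an ordered list of location labels for one user (hypothetical input).
    Returns a nested dict: origin -> {destination: probability}.
    """
    counts = defaultdict(Counter)
    # Count every observed origin -> destination step between consecutive visits.
    for origin, destination in zip(visits, visits[1:]):
        counts[origin][destination] += 1
    # Normalize the counts per origin so that each row sums to one.
    return {
        origin: {dest: n / sum(dests.values()) for dest, n in dests.items()}
        for origin, dests in counts.items()
    }


if __name__ == "__main__":
    toy_visits = ["home", "work", "home", "work", "gym", "home"]
    print(transition_probabilities(toy_visits))
```

In this picture the next location depends on the current origin rather than on overall visit frequency, which is the distinction the abstract draws against frequency-based models.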
-
Learning the optimally coordinated routes from the statistical mechanics of polymers
Authors:
Hao Liao,
Xingtong Wu,
Mingyang Zhou,
Chi Ho Yeung
Abstract:
Many major cities suffer from severe traffic congestion. Road expansion in the cities is usually infeasible, and an alternative way to alleviate traffic congestion is to coordinate the routes of vehicles. Various path selection and planning algorithms are thus proposed, but most existing methods only plan paths separately and provide un-coordinated solutions. Recently, an analogy between the coordination of vehicular routes and the interaction of polymers is drawn; the spin glass theory in statistical physics is employed to optimally coordinate transportation routes. To further examine the advantages brought by path coordination, we incorporate the link congestion function developed by the Bureau of Public Roads (BPR) into the polymer routing algorithm. We then estimate in simulations the traveling time of all users saved by the polymer-BPR algorithm in randomly generated networks and real transportation networks in major cities including London, New York and Beijing. We found that a large amount of traveling time is saved in all studied networks, suggesting that the approach inspired by polymer physics is effective in minimizing the traveling time via path coordination, which is a promising tool for alleviating traffic congestion.
Submitted 3 June, 2019; v1 submitted 31 May, 2019;
originally announced June 2019.
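As a rough illustration of why coordinating routes can save travel time under the BPR link congestion function named in the abstract above, the sketch below uses the commonly quoted form t = t0 * (1 + alpha * (v / c) ** beta) with alpha = 0.15 and beta = 4. These parameter values, the function names and the toy network are assumptions; the abstract does not state the settings used in the paper, and this is not the polymer routing algorithm itself.

```python
def bpr_travel_time(free_flow_time, volume, capacity, alpha=0.15, beta=4.0):
    """Congested travel time on one link: t0 * (1 + alpha * (v / c) ** beta)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)


def total_travel_time(routes, free_flow_time, capacity):
    """Sum the travel time of all users, given one route (a list of links) per user."""
    # Count how many users share each link.
    load = {}
    for route in routes:
        for link in route:
            load[link] = load.get(link, 0) + 1
    # Every user on a link experiences that link's congested travel time.
    return sum(
        load[link] * bpr_travel_time(free_flow_time[link], load[link], capacity[link])
        for link in load
    )


if __name__ == "__main__":
    # Hypothetical two-link network with identical parallel links "a" and "b".
    t0 = {"a": 10.0, "b": 10.0}
    cap = {"a": 1.0, "b": 1.0}
    print(total_travel_time([["a"], ["a"]], t0, cap))  # both users share link "a"
    print(total_travel_time([["a"], ["b"]], t0, cap))  # users are spread over both links
```

Because the cost grows steeply with the volume-to-capacity ratio, spreading users over parallel links lowers the total, which is the effect path coordination aims to exploit.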
-
Deep-IRT: Make Deep Learning Based Knowledge Tracing Explainable Using Item Response Theory
Authors:
Chun-Kit Yeung
Abstract:
Deep learning based knowledge tracing models have been shown to outperform traditional knowledge tracing models without the need for human-engineered features, yet their parameters and representations have long been criticized for not being explainable. In this paper, we propose Deep-IRT, which is a synthesis of the item response theory (IRT) model and a knowledge tracing model that is based on the deep neural network architecture called dynamic key-value memory network (DKVMN), to make deep learning based knowledge tracing explainable. Specifically, we use the DKVMN model to process the student's learning trajectory and estimate the student ability level and the item difficulty level over time. Then, we use the IRT model to estimate the probability that a student will answer an item correctly using the estimated student ability and the item difficulty. Experiments show that the Deep-IRT model retains the performance of the DKVMN model, while it provides a direct psychological interpretation of both students and items.
Submitted 26 April, 2019;
originally announced April 2019.
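The final step described in the abstract above, turning an estimated ability and difficulty into a probability of a correct answer, can be sketched with the textbook one-parameter logistic IRT form. This is an assumption made for illustration: the abstract does not give the exact formula, and the paper may use a scaled or otherwise modified variant. In Deep-IRT the two inputs would come from the DKVMN network; here they are plain numbers.

```python
import math


def irt_correct_probability(ability, difficulty):
    """One-parameter logistic IRT: P(correct) = sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


if __name__ == "__main__":
    # Hypothetical ability and difficulty values on a common latent scale.
    for ability in (-1.0, 0.0, 1.0):
        print(ability, irt_correct_probability(ability, difficulty=0.0))
```

The interpretability claimed in the abstract comes from this structure: the prediction depends only on two quantities with a direct psychological meaning.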
-
Incorporating Features Learned by an Enhanced Deep Knowledge Tracing Model for STEM/Non-STEM Job Prediction
Authors:
Chun-kit Yeung,
Zizheng Lin,
Kai Yang,
Dit-yan Yeung
Abstract:
The 2017 ASSISTments Data Mining competition aims to use data from a longitudinal study for predicting a brand-new outcome of students which had never been studied before by the educational data mining research community. Specifically, it facilitates research in developing predictive models that predict whether the first job of a student out of college belongs to a STEM (the acronym for science, technology, engineering, and mathematics) field. This is based on the student's learning history on the ASSISTments blended learning platform in the form of extensive clickstream data gathered during the middle school years. To tackle this challenge, we first estimate the expected knowledge state of students with respect to different mathematical skills using a deep knowledge tracing (DKT) model and an enhanced DKT (DKT+) model. We then combine the features corresponding to the DKT/DKT+ expected knowledge state with other features extracted directly from the student profile in the dataset to train several machine learning models for the STEM/non-STEM job prediction. Our experiments show that models trained with the combined features generally perform better than the models trained with the student profile alone. Detailed analysis of the student's knowledge state reveals that, when compared with non-STEM students, STEM students generally show a higher mastery level and a higher learning gain in mathematics.
Submitted 6 June, 2018;
originally announced June 2018.
-
Addressing Two Problems in Deep Knowledge Tracing via Prediction-Consistent Regularization
Authors:
Chun-Kit Yeung,
Dit-Yan Yeung
Abstract:
Knowledge tracing is one of the key research areas for empowering personalized education. It is a task to model students' mastery level of a knowledge component (KC) based on their historical learning trajectories. In recent years, a recurrent neural network model called deep knowledge tracing (DKT) has been proposed to handle the knowledge tracing task, and the literature has shown that DKT generally outperforms traditional methods. However, through our extensive experimentation, we have noticed two major problems in the DKT model. The first problem is that the model fails to reconstruct the observed input. As a result, even when a student performs well on a KC, the prediction of that KC's mastery level decreases instead, and vice versa. Second, the predicted performance for KCs across time-steps is not consistent. This is undesirable and unreasonable because a student's performance is expected to transition gradually over time. To address these problems, we introduce regularization terms that correspond to reconstruction and waviness to the loss function of the original DKT model to enhance the consistency in prediction. Experiments show that the regularized loss function effectively alleviates the two problems without degrading the original task of DKT.
Submitted 6 June, 2018;
originally announced June 2018.
-
Effective spreading from multiple leaders identified by percolation in social networks
Authors:
Shenggong Ji,
Linyuan Lu,
Chi Ho Yeung,
Yanqing Hu
Abstract:
Social networks constitute a new platform for information propagation, but its success is crucially dependent on the choice of spreaders who initiate the spreading of information. In this paper, we remove edges in a network at random and the network segments into isolated clusters. The most important nodes in each cluster then form a group of influential spreaders, such that news propagating from them would lead to an extensive coverage and minimal redundancy. The method well utilizes the similarities between the pre-percolated state and the coverage of information propagation in each social cluster to obtain a set of distributed and coordinated spreaders. Our tests on the Facebook networks show that this method outperforms conventional methods based on centrality. The suggested way of identifying influential spreaders thus sheds light on a new paradigm of information propagation on social networks.
Submitted 18 August, 2015;
originally announced August 2015.
-
Modeling mutual feedback between users and recommender systems
Authors:
An Zeng,
Chi Ho Yeung,
Matus Medo,
Yi-Cheng Zhang
Abstract:
Recommender systems daily influence our decisions on the Internet. While considerable attention has been given to issues such as recommendation accuracy and user privacy, the long-term mutual feedback between a recommender system and the decisions of its users has been neglected so far. We propose here a model of network evolution which allows us to study the complex dynamics induced by this feedback, including the hysteresis effect which is typical for systems with non-linear dynamics. Despite the popular belief that recommendation helps users to discover new things, we find that the long-term use of recommendation can contribute to the rise of extremely popular items and thus ultimately narrow the user choice. These results are supported by measurements of the time evolution of item popularity inequality in real systems. We show that this adverse effect of recommendation can be tamed by sacrificing part of short-term recommendation accuracy.
Submitted 7 August, 2015;
originally announced August 2015.
-
Do recommender systems benefit users?
Authors:
Chi Ho Yeung
Abstract:
Recommender systems are present in many web applications to guide our choices. They increase sales and benefit sellers, but whether they benefit customers by providing relevant products is questionable. Here we introduce a model to examine the benefit of recommender systems for users, and found that recommendations from the system can be equivalent to random draws if one relies too strongly on the system. Nevertheless, with sufficient information about user preferences, recommendations become accurate and an abrupt transition to this accurate regime is observed for some algorithms. On the other hand, we found that a high accuracy evaluated by common accuracy metrics does not necessarily correspond to a high real accuracy nor a benefit for users, which serves as an alarm for operators and researchers of recommender systems. We tested our model with a real dataset and observed similar behaviors. Finally, a recommendation approach with improved accuracy is suggested. These results imply that recommender systems can benefit users, but relying too strongly on the system may render the system ineffective.
Submitted 5 July, 2015;
originally announced July 2015.
-
Empirical studies on the network of social groups: the case of Tencent QQ
Authors:
Zhi-Qiang You,
Xiao-Pu Han,
Linyuan Lü,
Chi Ho Yeung
Abstract:
Participation in social groups is important, but the collective behaviors of humans as a group are difficult to analyze because it is hard to quantify ordinary social relations and group membership, and to collect a comprehensive dataset. Such difficulties can be circumvented by analyzing online social networks. In this paper, we analyze a comprehensive dataset obtained from Tencent QQ, an instant messenger with the highest market share in China. Specifically, we analyze three derivative networks involving groups and their members -- the hypergraph of groups, the network of groups and the user network -- to reveal social interactions at microscopic and mesoscopic levels. Our results uncover interesting behaviors on the growth of user groups, the interactions between groups, and their relationship with member age and gender. These findings lead to insights which are difficult to obtain in ordinary social networks.
Submitted 24 August, 2014;
originally announced August 2014.
-
Shortest node-disjoint paths on random graphs
Authors:
Caterina De Bacco,
Silvio Franz,
David Saad,
Chi Ho Yeung
Abstract:
A localized method to distribute paths on random graphs is devised, aimed at finding the shortest paths between given source/destination pairs while avoiding path overlaps at nodes. We propose a method based on message-passing techniques to process global information and distribute paths optimally. Statistical properties such as scaling with system size and number of paths, average path-length and the transition to the frustrated regime are analysed. The performance of the suggested algorithm is evaluated through a comparison against a greedy algorithm.
Submitted 18 May, 2014; v1 submitted 31 January, 2014;
originally announced January 2014.
-
From the Physics of Interacting Polymers to Optimizing Routes on the London Underground
Authors:
Chi Ho Yeung,
David Saad,
K. Y. Michael Wong
Abstract:
Optimizing paths on networks is crucial for many applications, from subway traffic to Internet communication. As global path optimization that takes account of all path-choices simultaneously is computationally hard, most existing routing algorithms optimize paths individually, thus providing sub-optimal solutions. We employ the physics of interacting polymers and disordered systems to analyze macroscopic properties of generic path-optimization problems and derive a simple, principled, generic and distributed routing algorithm capable of considering simultaneously all individual path choices. We demonstrate the efficacy of the new algorithm by applying it to: (i) random graphs resembling Internet overlay networks; (ii) travel on the London underground network based on Oyster-card data; and (iii) the global airport network. Analytically derived macroscopic properties give rise to insightful new routing phenomena, including phase transitions and scaling laws, which facilitate better understanding of the appropriate operational regimes and their limitations that are difficult to obtain otherwise.
Submitted 3 September, 2013;
originally announced September 2013.
-
Recommender Systems
Authors:
Linyuan Lü,
Matus Medo,
Chi Ho Yeung,
Yi-Cheng Zhang,
Zi-Ke Zhang,
Tao Zhou
Abstract:
The ongoing rapid expansion of the Internet greatly increases the necessity of effective recommender systems for filtering the abundant information. Extensive research on recommender systems is conducted by a broad range of communities including social and computer scientists, physicists, and interdisciplinary researchers. Despite substantial theoretical and practical achievements, unification and comparison of different approaches are lacking, which impedes further advances. In this article, we review recent developments in recommender systems and discuss the major challenges. We compare and evaluate available algorithms and examine their roles in future developments. In addition to algorithms, physical aspects are described to illustrate macroscopic behavior of recommender systems. Potential impacts and future directions are discussed. We emphasize that recommendation has a great scientific depth and combines diverse research fields, which makes it of interest to physicists as well as interdisciplinary researchers.
Submitted 6 February, 2012;
originally announced February 2012.
-
The Competition for Shortest Paths on Sparse Graphs
Authors:
Chi Ho Yeung,
David Saad
Abstract:
Optimal paths connecting randomly selected network nodes and fixed routers are studied analytically in the presence of non-linear overlap cost that penalizes congestion. Routing becomes increasingly more difficult as the number of selected nodes increases and exhibits ergodicity breaking in the case of multiple routers. A distributed linearly-scalable routing algorithm is devised. The ground state of such systems reveals non-monotonic complex behaviors in both average path-length and algorithmic convergence, depending on the network topology, and densities of communicating nodes and routers.
Submitted 1 February, 2012;
originally announced February 2012.
-
Networking - A Statistical Physics Perspective
Authors:
Chi Ho Yeung,
David Saad
Abstract:
Efficient networking has a substantial economic and societal impact in a broad range of areas including transportation systems, wired and wireless communications and a range of Internet applications. As transportation and communication networks become increasingly complex, the ever increasing demand for congestion control, higher traffic capacity, quality of service, robustness and reduced energy consumption requires new tools and methods to meet these conflicting requirements. The new methodology should serve for gaining better understanding of the properties of networking systems at the macroscopic level, as well as for the development of new principled optimization and management algorithms at the microscopic level. Methods of statistical physics seem best placed to provide new approaches as they have been developed specifically to deal with non-linear large scale systems. This paper aims at presenting an overview of tools and methods that have been developed within the statistical physics community and that can be readily applied to address the emerging problems in networking. These include diffusion processes, methods from disordered systems and polymer physics, and probabilistic inference, which have direct relevance to network routing, file and frequency distribution, the exploration of network structures and vulnerability, and various other practical networking applications.
Submitted 2 October, 2012; v1 submitted 13 October, 2011;
originally announced October 2011.
-
Tracing the Evolution of Physics on the Backbone of Citation Networks
Authors:
S. Gualdi,
C. H. Yeung,
Y. -C. Zhang
Abstract:
Many innovations are inspired by past ideas in a non-trivial way. Tracing these origins and identifying scientific branches is crucial for research inspirations. In this paper, we use citation relations to identify the descendant chart, i.e. the family tree of research papers. Unlike other spanning trees which focus on cost or distance minimization, we make use of the nature of citations and identify the most important parent for each publication, leading to a tree-like backbone of the citation network. Measures are introduced to validate the backbone as the descendant chart. We show that citation backbones can well characterize the hierarchical and fractal structure of scientific development, and lead to accurate classification of fields and sub-fields.
Submitted 5 August, 2011;
originally announced August 2011.
-
Leaders in Social Networks, the Delicious Case
Authors:
Linyuan Lu,
Yi-Cheng Zhang,
Chi Ho Yeung,
Tao Zhou
Abstract:
Finding pertinent information is not limited to search engines. Online communities can amplify the influence of a small number of power users for the benefit of all other users. Users' information foraging in depth and breadth can be greatly enhanced by choosing suitable leaders. For instance, on delicious.com, users subscribe to leaders' collections, which leads to a deeper and wider reach not achievable with search engines. To consolidate such collective search, it is essential to utilize the leadership topology and identify influential users. Google's PageRank, as a successful search algorithm in the World Wide Web, turns out to be less effective in networks of people. We thus devise an adaptive and parameter-free algorithm, the LeaderRank, to quantify user influence. We show that LeaderRank outperforms PageRank in terms of ranking effectiveness, as well as robustness against manipulations and noisy data. These results suggest that leaders who are aware of their clout may reinforce the development of social networks, and thus the power of collective search.
Submitted 27 March, 2011;
originally announced March 2011.
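The abstract above does not spell out the LeaderRank procedure, so the following is a minimal sketch of the scheme as it is commonly described: a ground node is linked in both directions to every user, unit scores are spread along follower-to-leader links until they settle, and the ground node's final score is shared equally among the users. The function name, the edge convention and the stopping rule are assumptions made for illustration.

```python
def leader_rank(edges, max_iter=500, tol=1e-12):
    """Score nodes with a LeaderRank-style random walk.

    `edges` is a list of (follower, leader) pairs; score flows from follower to leader.
    Returns a dict mapping each node to its final score.
    """
    nodes = list(dict.fromkeys(n for edge in edges for n in edge))
    ground = object()  # auxiliary ground node, linked both ways to every real node
    out = {n: set() for n in nodes}
    for follower, leader in edges:
        if follower != leader:
            out[follower].add(leader)
    for n in nodes:
        out[n].add(ground)
    out[ground] = set(nodes)

    # Every real node starts with one unit of score, the ground node with none.
    score = {n: 1.0 for n in nodes}
    score[ground] = 0.0
    for _ in range(max_iter):
        new = {n: 0.0 for n in out}
        # Each node splits its current score evenly among its out-neighbours.
        for src, targets in out.items():
            share = score[src] / len(targets)
            for dst in targets:
                new[dst] += share
        delta = max(abs(new[n] - score[n]) for n in out)
        score = new
        if delta < tol:
            break

    # Redistribute the ground node's score evenly to the real nodes.
    bonus = score[ground] / len(nodes)
    return {n: score[n] + bonus for n in nodes}


if __name__ == "__main__":
    # Hypothetical toy network: "a" and "b" follow "c", and "c" follows "d".
    print(leader_rank([("a", "c"), ("b", "c"), ("c", "d")]))
```

The ground node makes the network strongly connected, so no damping parameter is needed, which matches the "parameter-free" property stated in the abstract.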
-
Self-organization in social tagging systems
Authors:
Chuang Liu,
Chi Ho Yeung,
Zi-Ke Zhang
Abstract:
Individuals often imitate each other to fall into the typical group, leading to a self-organized state of typical behaviors in a community. In this paper, we model self-organization in social tagging systems and illustrate the underlying interaction and dynamics. Specifically, we introduce a model in which individuals adjust their own tagging tendency to imitate the average tagging tendency. We found that when users are of low confidence, they tend to imitate others, leading to a self-organized state with active tagging. On the other hand, when users are of high confidence and resist change, tagging becomes inactive. We observe a phase transition at a critical level of user confidence when the system changes from one regime to the other. The distributions of post length obtained from the model are compared to real data and show good agreement.
Submitted 19 February, 2011;
originally announced February 2011.
-
Enhancing synchronization by directionality in complex networks
Authors:
An Zeng,
Seung-Woo Son,
Chi Ho Yeung,
Ying Fan,
Zengru Di
Abstract:
We proposed a method called residual edge-betweenness gradient (REBG) to enhance synchronizability of networks by assignment of link direction while keeping network topology and link weight unchanged. Direction assignment has been shown to improve the synchronizability of undirected networks in general, but we find that in some cases incommunicable components emerge and networks fail to synchronize. We show that the REBG method can effectively avoid the synchronization failure ($R=\lambda_{2}^{r}/\lambda_{N}^{r}=0$) which occurs in the residual degree gradient (RDG) method proposed in Phys. Rev. Lett. 103, 228702 (2009). Further experiments show that the REBG method enhances synchronizability in networks with community structure as compared with the RDG method.
Submitted 1 December, 2010;
originally announced December 2010.
-
Time-aware Collaborative Filtering with the Piecewise Decay Function
Authors:
Pei Wu,
Chi Ho Yeung,
Weiping Liu,
Cihang Jin,
Yi-Cheng Zhang
Abstract:
In this paper, we determine the appropriate decay function for item-based collaborative filtering (CF). Instead of intuitive deduction, we introduce the Similarity-Signal-to-Noise-Ratio (SSNR) to quantify the impacts of rated items on current recommendations. By measuring the variation of SSNR over time, drift in user interest is well visualized and quantified. Based on the trend changes of SSNR, the piecewise decay function is thus devised and incorporated to build our time-aware CF algorithm. Experiments show that the proposed algorithm strongly outperforms the conventional item-based CF algorithm and other time-aware algorithms with various decay functions.
Submitted 19 October, 2010;
originally announced October 2010.
-
Heterogenous scaling in interevent time of on-line bookmarking
Authors:
Peng Wang,
Xiao-Yi Xie,
Chi Ho Yeung,
Bing-Hong Wang
Abstract:
In this paper, we study the statistical properties of bookmarking behaviors in Delicious.com. We find that the interevent time distribution of bookmarking decays as a power law as interevent time increases, at both the individual and population levels. Remarkably, we observe a significant change in the exponent when interevent time increases from the intra-day to the inter-day range. In addition, the dependence of the exponent on individual Activity is found to be different in the two ranges. These results suggest that the mechanisms driving human actions are different in the intra- and inter-day ranges. Instead of monotonically increasing with Activity, we find that the inter-day exponent peaks at a value around 3. We further show that less active users are more likely to resemble a Poisson process in bookmarking. Based on the temporal-preference model, preliminary explanations for this dependence have been given. Finally, a universal behavior on the inter-day scale is observed by considering the rescaled variable.
Submitted 18 October, 2010;
originally announced October 2010.
-
Dynamics underlying Box-office: Movie Competition on Recommender Systems
Authors:
C. H. Yeung,
G. Cimini,
C. -H. **
Abstract:
We introduce a simple model to study movie competition in recommender systems. Movies of heterogeneous quality compete against each other through viewers' reviews and generate interesting box-office dynamics. By assuming mean-field interactions between the competing movies, we show that a run-away effect of popularity spreading is triggered by defeating the average review score, leading to box-office hits. The average review score thus characterizes the critical movie quality necessary for the transition from box-office bombs to blockbusters. The major factors affecting the critical review score are examined. By iterating the mean-field dynamical equations, we obtain qualitative agreement with simulations and real systems in the dynamical forms of box-office, revealing the significant role of competition in understanding box-office dynamics.
Submitted 28 September, 2010; v1 submitted 14 May, 2010;
originally announced May 2010.