Search | arXiv e-print repository

Unsupervised Learning for Fault Detection of HVAC Systems: An OPTICS-based Approach for Terminal Air Handling Units

Abstract: The rise of AI-powered classification techniques has ushered in a new era for data-driven Fault Detection and Diagnosis (FDD) in smart building systems. While extensive research has championed supervised FDD approaches, the real-world application of unsupervised methods remains limited. Among these, cluster analysis stands out for its potential with Building Management System (BMS) data. This study introduces an unsupervised learning strategy to detect faults in terminal air handling units and their associated systems. The methodology pre-processes historical sensor data using Principal Component Analysis (PCA) to reduce dimensionality. This is followed by OPTICS clustering, with k-means used for comparison. The effectiveness of the proposed strategy was gauged using several labeled datasets depicting various fault scenarios and real-world building BMS data. Results showed that OPTICS consistently surpassed k-means in accuracy across seasons. Notably, OPTICS offers users a unique visualization feature, the reachability distance, which allows a preview of detected clusters before thresholds are set. Moreover, according to the results, PCA is beneficial for reducing computational costs and noise, which generally improves the clarity of cluster differentiation in the reachability distance, but it also has limitations, particularly in complex fault scenarios. In such cases, PCA's dimensionality reduction may result in the loss of critical information, leaving some clusters less discernible or entirely undetected. These overlooked clusters could be indicative of underlying faults, and their obscurity represents a significant limitation of PCA when identifying potential fault lines in intricate datasets.

Submitted 18 December, 2023; originally announced December 2023.

Comments: 44 pages, 6 Tables, 7 Figures

MSC Class: 97M50

arXiv:2311.00143

Two-Stage Classifier for Campaign Negativity Detection using Axis Embeddings: A Case Study on Tweets of Political Users during 2021 Presidential Election in Iran

Authors: Fatemeh Rajabi, Ali Mohades

Abstract: In elections around the world, candidates may turn their campaigns toward negativity due to the prospect of failure and time pressure. In the digital age, social media platforms such as Twitter are rich sources of political discourse. Therefore, despite the large amount of data that is published on Twitter, an automatic system for campaign negativity detection can play an essential role in understanding the strategy of candidates and parties in their campaigns. In this paper, we propose a hybrid model for detecting campaign negativity consisting of a two-stage classifier that combines the strengths of two machine learning models. We collected Persian tweets from 50 political users, including candidates and government officials, and annotated 5,100 of them that were published during the year before the 2021 presidential election in Iran. In the proposed model, the datasets required by the two classifiers are first built from the cosine similarity of tweet embeddings with axis embeddings (the averages of the embeddings in the positive and negative classes of tweets) computed on the training set (85%); these datasets are then used as the training sets of the two classifiers in the hybrid model. Our best model (RF-RF) achieved a macro F1 score of 79% and a weighted F1 score of 82%. By running the best model on the rest of the tweets of the 50 political users that were published in the year before the election, and with the help of statistical models, we find that the publication of a tweet by a candidate has nothing to do with the negativity of that tweet, and that the presence of the names of political persons and political organizations in the tweet is directly related to its negativity.
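The axis-embedding features described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 3-dimensional vectors, and the choice to emit one similarity per axis are assumptions made purely for illustration, since the abstract does not specify the embedding model or the exact feature layout.

```python
import math

def mean_vector(vectors):
    """Element-wise average of equal-length vectors (one 'axis embedding')."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def axis_features(embedding, axis_a, axis_b):
    """Similarity of one tweet embedding to each class axis (hypothetical layout)."""
    return (cosine(embedding, axis_a), cosine(embedding, axis_b))

# Toy training embeddings for the two classes (made-up values, not real data).
class_a_train = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
class_b_train = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]

# Axis embeddings are computed from the training split only.
axis_a = mean_vector(class_a_train)
axis_b = mean_vector(class_b_train)

# Features for a new tweet embedding; these would feed a downstream classifier.
features = axis_features([0.9, 0.1, 0.0], axis_a, axis_b)
```

In this toy case the new embedding lies closer to the first axis, so its first similarity is the larger of the two; a downstream classifier such as a random forest would be trained on features of this kind.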

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2105.03811

Click-Through Rate Prediction Using Graph Neural Networks and Online Learning

Authors: Farzaneh Rajabi, Jack Siyuan He

Abstract: Recommendation systems have been extensively studied in the literature and are ubiquitous in online advertisement, the shopping industry/e-commerce, query suggestions in search engines, and friend recommendation in social networks. Moreover, restaurant/music/product/movie/news/app recommendations are only a few of the applications of a recommender system. A small percent improvement in CTR prediction accuracy has been reported to add millions of dollars of revenue to the advertisement industry. Click-Through-Rate (CTR) prediction is a special version of a recommender system in which the goal is predicting whether or not a user is going to click on a recommended item. A content-based recommendation approach takes into account the past history of the user's behavior, i.e. the recommended products and the user's reaction to them. So, a personalized model that recommends the right item to the right user at the right time is the key to building such a model. On the other hand, the so-called collaborative filtering approach incorporates the click history of the users who are very similar to a particular user, thereby helping the recommender to come up with a more confident prediction for that particular user by leveraging the wider knowledge of users who share their taste in a connected network of users. In this project, we are interested in building a CTR predictor using Graph Neural Networks complemented by an online learning algorithm that models such dynamic interactions. By framing the problem as a binary classification task, we have evaluated this system both on the offline models (GNN, Deep Factorization Machines) with test-AUC of 0.7417 and on the online learning model with test-AUC of 0.7585 using a sub-sampled version of the Criteo public dataset consisting of 10,000 data points.

Submitted 8 May, 2021; originally announced May 2021.

arXiv:2105.03804

Slash or burn: Power line and vegetation classification for wildfire prevention

Authors: Austin Park, Farzaneh Rajabi, Ross Weber

Abstract: Electric utilities are struggling to manage increasing wildfire risk in a hotter and drier climate. Utility transmission and distribution lines regularly ignite destructive fires when they make contact with surrounding vegetation. Trimming vegetation to maintain the separation from utility assets is as critical to safety as it is difficult. Each utility has tens of thousands of linear miles to manage, poor knowledge of where those assets are located, and no way to prioritize trimming. Feature-enhanced convolutional neural networks (CNNs) have proven effective in this problem space. Histograms of oriented gradients (HOG) and Hough transforms are used to increase the salience of linear structures like power lines and poles. Data is frequently taken from drone or satellite footage, but Google Street View offers an even more scalable and lower cost solution. This paper uses 1,320 images scraped from Street View, transfer learning on popular CNNs, and feature engineering to place images in one of three classes: (1) no utility systems, (2) utility systems with no overgrown vegetation, or (3) utility systems with overgrown vegetation. The CNN output thus yields a prioritized vegetation management system and creates a geotagged map of utility assets as a byproduct. Test set accuracy reached 80.15% using VGG11 with a trained first layer and classifier, and a model ensemble correctly classified 88.88% of images with risky vegetation overgrowth.

Submitted 8 May, 2021; originally announced May 2021.

Showing 1–4 of 4 results for author: Rajabi, F