-
Toward Privacy and Utility Preserving Image Representation
Authors:
Ahmadreza Mosallanezhad,
Yasin N. Silva,
Michelle V. Mancenido,
Huan Liu
Abstract:
Face images are rich data items that are useful and can easily be collected in many applications, such as in 1-to-1 face verification tasks in the domain of security and surveillance systems. Multiple methods have been proposed to protect an individual's privacy by perturbing the images to remove traces of identifiable information, such as gender or race. However, significantly less attention has been given to the problem of protecting images while maintaining optimal task utility. In this paper, we study the novel problem of creating privacy-preserving image representations with respect to a given utility task by proposing a principled framework called the Adversarial Image Anonymizer (AIA). AIA first creates an image representation using a generative model, then enhances the learned image representations using adversarial learning to preserve privacy and utility for a given task. Experiments were conducted on a publicly available data set to demonstrate the effectiveness of AIA as a privacy-preserving mechanism for face images.
Submitted 17 October, 2020; v1 submitted 29 September, 2020;
originally announced September 2020.
-
Unsupervised Cyberbullying Detection via Time-Informed Gaussian Mixture Model
Authors:
Lu Cheng,
Kai Shu,
Siqi Wu,
Yasin N. Silva,
Deborah L. Hall,
Huan Liu
Abstract:
Social media is a vital means for information-sharing due to its easy access, low cost, and fast dissemination characteristics. However, increases in social media usage have corresponded with a rise in the prevalence of cyberbullying. Most existing cyberbullying detection methods are supervised and, thus, have two key drawbacks: (1) The data labeling process is often time-consuming and labor-intensive; (2) Current labeling guidelines may not be generalized to future instances because of different language usage and evolving social networks. To address these limitations, this work introduces a principled approach for unsupervised cyberbullying detection. The proposed model consists of two main components: (1) A representation learning network that encodes the social media session by exploiting multi-modal features, e.g., text, network, and time. (2) A multi-task learning network that simultaneously fits the comment inter-arrival times and estimates the bullying likelihood based on a Gaussian Mixture Model. The proposed model jointly optimizes the parameters of both components to overcome the shortcomings of decoupled training. Our core contribution is an unsupervised cyberbullying detection model that not only experimentally outperforms the state-of-the-art unsupervised models, but also achieves competitive performance compared to supervised models.
Submitted 6 August, 2020;
originally announced August 2020.
-
Similarity Group-by Operators for Multi-dimensional Relational Data
Authors:
Mingjie Tang,
Ruby Y. Tahboub,
Walid G. Aref,
Mikhail J. Atallah,
Qutaibah M. Malluhi,
Mourad Ouzzani,
Yasin N. Silva
Abstract:
The SQL group-by operator plays an important role in summarizing and aggregating large datasets in a data analytic stack. While the standard group-by operator, which is based on equality, is useful in several applications, allowing similarity-aware grouping provides a more realistic view of real-world data that could lead to better insights. The Similarity SQL-based Group-By operator (SGB, for short) extends the semantics of the standard SQL Group-by by grouping data with similar but not necessarily equal values. While existing similarity-based grouping operators efficiently materialize this approximate semantics, they primarily focus on one-dimensional attributes and treat multidimensional attributes independently. As a result, correlated attributes, such as those in spatial data, are processed independently, and hence groups in the multidimensional space are not detected properly. To address this problem, we introduce two new SGB operators for multidimensional data. The first operator is the clique (or distance-to-all) SGB, where all the tuples in a group are within some distance from each other. The second operator is the distance-to-any SGB, where a tuple belongs to a group if the tuple is within some distance from any other tuple in the group. We implement and test the new SGB operators and their algorithms inside PostgreSQL. The overhead introduced by these operators proves to be minimal and the execution times are comparable to those of the standard Group-by. The experimental study, based on TPC-H and social check-in data, demonstrates that the proposed algorithms can achieve up to three orders of magnitude improvement in performance over baseline methods developed to solve the same problem.
Submitted 15 December, 2014;
originally announced December 2014.
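The two grouping semantics described in the abstract above can be illustrated with a small sketch. This is not the paper's PostgreSQL implementation or its algorithms: the function names `sgb_any` and `sgb_all`, the Euclidean distance, the quadratic pairwise comparison, and the greedy, order-dependent assignment used for the clique variant are all assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def sgb_any(points, eps):
    """Distance-to-any grouping (hypothetical helper): a tuple joins a group
    if it is within eps of ANY member, i.e. groups are the connected
    components of the eps-neighbor graph. Uses a simple union-find."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive O(n^2) pairwise check; merge components of close points.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(points[i])
    return list(groups.values())


def sgb_all(points, eps):
    """Clique (distance-to-all) grouping (hypothetical helper): every pair of
    tuples in a group is within eps of each other. Assumption: each point is
    greedily placed in the first group it fits, so the result depends on
    input order."""
    groups = []
    for p in points:
        for g in groups:
            if all(dist(p, q) <= eps for q in g):
                g.append(p)
                break
        else:
            # No existing group accepts p, so it starts a new one.
            groups.append([p])
    return groups


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
any_groups = sgb_any(points, eps=1.0)
all_groups = sgb_all(points, eps=1.0)
```

On this sample input, the chain of three collinear points forms a single distance-to-any group because each point is within `eps` of a neighbor, whereas the clique semantics splits the chain because its endpoints are farther than `eps` apart.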