-
Creator Hearts: Investigating the Impact Positive Signals from YouTube Creators in Sha** Comment Section Behavior
Authors:
Frederick Choi,
Charlotte Lambert,
Vinay Koshy,
Sowmya Pratipati,
Tue Do,
Eshwar Chandrasekharan
Abstract:
Much of the research in online moderation focuses on punitive actions. However, emerging research has shown that positive reinforcement is effective at encouraging desirable behavior on online platforms. We extend this research by studying the "creator heart" feature on YouTube, quantifying their primary effects on comments that receive hearts and on videos where hearts have been given. We find th…
▽ More
Much of the research in online moderation focuses on punitive actions. However, emerging research has shown that positive reinforcement is effective at encouraging desirable behavior on online platforms. We extend this research by studying the "creator heart" feature on YouTube, quantifying their primary effects on comments that receive hearts and on videos where hearts have been given. We find that creator hearts increased the visibility of comments, and increased the amount of positive engagement they received from other users. We also find that the presence of a creator hearted comment soon after a video is published can incentivize viewers to comment, increasing the total engagement with the video over time. We discuss the potential for creators to use hearts to shape behavior in their communities by highlighting, rewarding, and incentivizing desirable behaviors from users. We discuss avenues for extending our study to understanding positive signals from moderators on other platforms.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
Simpler Analyses of Union-Find
Authors:
Zhiyi Huang,
Chris Lambert,
Zipei Nie,
Richard Peng
Abstract:
We analyze union-find using potential functions motivated by continuous algorithms, and give alternate proofs of the $O(\log\log{n})$, $O(\log^{*}n)$, $O(\log^{**}n)$, and $O(α(n))$ amortized cost upper bounds. The proof of the $O(\log\log{n})$ amortized bound goes as follows. Let each node's potential be the square root of its size, i.e., the size of the subtree rooted from it. The overall potent…
▽ More
We analyze union-find using potential functions motivated by continuous algorithms, and give alternate proofs of the $O(\log\log{n})$, $O(\log^{*}n)$, $O(\log^{**}n)$, and $O(α(n))$ amortized cost upper bounds. The proof of the $O(\log\log{n})$ amortized bound goes as follows. Let each node's potential be the square root of its size, i.e., the size of the subtree rooted from it. The overall potential increase is $O(n)$ because the node sizes increase geometrically along any tree path. When compressing a path, each node on the path satisfies that either its potential decreases by $Ω(1)$, or its child's size along the path is less than the square root of its size: this can happen at most $O(\log\log{n})$ times along any tree path.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
Positive Unlabeled Learning Selected Not At Random (PULSNAR): class proportion estimation when the SCAR assumption does not hold
Authors:
Praveen Kumar,
Christophe G. Lambert
Abstract:
Positive and Unlabeled (PU) learning is a type of semi-supervised binary classification where the machine learning algorithm differentiates between a set of positive instances (labeled) and a set of both positive and negative instances (unlabeled). PU learning has broad applications in settings where confirmed negatives are unavailable or difficult to obtain, and there is value in discovering posi…
▽ More
Positive and Unlabeled (PU) learning is a type of semi-supervised binary classification where the machine learning algorithm differentiates between a set of positive instances (labeled) and a set of both positive and negative instances (unlabeled). PU learning has broad applications in settings where confirmed negatives are unavailable or difficult to obtain, and there is value in discovering positives among the unlabeled (e.g., viable drugs among untested compounds). Most PU learning algorithms make the \emph{selected completely at random} (SCAR) assumption, namely that positives are selected independently of their features. However, in many real-world applications, such as healthcare, positives are not SCAR (e.g., severe cases are more likely to be diagnosed), leading to a poor estimate of the proportion, $α$, of positives among unlabeled examples and poor model calibration, resulting in an uncertain decision threshold for selecting positives. PU learning algorithms vary; some estimate only the proportion, $α$, of positives in the unlabeled set, while others calculate the probability that each specific unlabeled instance is positive, and some can do both. We propose two PU learning algorithms to estimate $α$, calculate calibrated probabilities for PU instances, and improve classification metrics: i) PULSCAR (positive unlabeled learning selected completely at random), and ii) PULSNAR (positive unlabeled learning selected not at random). PULSNAR employs a divide-and-conquer approach to cluster SNAR positives into subtypes and estimates $α$ for each subtype by applying PULSCAR to positives from each cluster and all unlabeled. In our experiments, PULSNAR outperformed state-of-the-art approaches on both synthetic and real-world benchmark datasets.
△ Less
Submitted 3 May, 2024; v1 submitted 14 March, 2023;
originally announced March 2023.
-
Learning-Augmented B-Trees
Authors:
Xinyuan Cao,
**gbang Chen,
Li Chen,
Chris Lambert,
Richard Peng,
Daniel Sleator
Abstract:
We study learning-augmented binary search trees (BSTs) and B-Trees via Treaps with composite priorities. The result is a simple search tree where the depth of each item is determined by its predicted weight $w_x$. To achieve the result, each item $x$ has its composite priority $-\lfloor\log\log(1/w_x)\rfloor + U(0, 1)$ where $U(0, 1)$ is the uniform random variable. This generalizes the recent lea…
▽ More
We study learning-augmented binary search trees (BSTs) and B-Trees via Treaps with composite priorities. The result is a simple search tree where the depth of each item is determined by its predicted weight $w_x$. To achieve the result, each item $x$ has its composite priority $-\lfloor\log\log(1/w_x)\rfloor + U(0, 1)$ where $U(0, 1)$ is the uniform random variable. This generalizes the recent learning-augmented BSTs [Lin-Luo-Woodruff ICML`22], which only work for Zipfian distributions, to arbitrary inputs and predictions. It also gives the first B-Tree data structure that can provably take advantage of localities in the access sequence via online self-reorganization. The data structure is robust to prediction errors and handles insertions, deletions, as well as prediction updates.
△ Less
Submitted 24 July, 2023; v1 submitted 16 November, 2022;
originally announced November 2022.
-
Inter-Sense: An Investigation of Sensory Blending in Fiction
Authors:
Roxana Girju,
Charlotte Lambert
Abstract:
This study reports on the semantic organization of English sensory descriptors of the five basic senses of sight, hearing, touch, taste, and smell in a large corpus of over 8,000 fiction books. We introduce a large-scale text data-driven approach based on distributional-semantic word embeddings to identify and extract these descriptors as well as analyze their mixing interconnections in the result…
▽ More
This study reports on the semantic organization of English sensory descriptors of the five basic senses of sight, hearing, touch, taste, and smell in a large corpus of over 8,000 fiction books. We introduce a large-scale text data-driven approach based on distributional-semantic word embeddings to identify and extract these descriptors as well as analyze their mixing interconnections in the resulting conceptual and sensory space. The findings are relevant for research on concept acquisition and representation, as well as for applications that can benefit from a better understanding of perceptual spaces of sensory experiences, in fiction, in particular, and in language in general.
△ Less
Submitted 18 October, 2021;
originally announced October 2021.
-
Model-based multi-parameter map**
Authors:
Yael Balbastre,
Mikael Brudfors,
Michela Azzarito,
Christian Lambert,
Martina F. Callaghan,
John Ashburner
Abstract:
Quantitative MR imaging is increasingly favoured for its richer information content and standardised measures. However, computing quantitative parameter maps, such as those encoding longitudinal relaxation rate (R1), apparent transverse relaxation rate (R2*) or magnetisation-transfer saturation (MTsat), involves inverting a highly non-linear function. Many methods for deriving parameter maps assum…
▽ More
Quantitative MR imaging is increasingly favoured for its richer information content and standardised measures. However, computing quantitative parameter maps, such as those encoding longitudinal relaxation rate (R1), apparent transverse relaxation rate (R2*) or magnetisation-transfer saturation (MTsat), involves inverting a highly non-linear function. Many methods for deriving parameter maps assume perfect measurements and do not consider how noise is propagated through the estimation procedure, resulting in needlessly noisy maps. Instead, we propose a probabilistic generative (forward) model of the entire dataset, which is formulated and inverted to jointly recover (log) parameter maps with a well-defined probabilistic interpretation (e.g., maximum likelihood or maximum a posteriori). The second order optimisation we propose for model fitting achieves rapid and stable convergence thanks to a novel approximate Hessian. We demonstrate the utility of our flexible framework in the context of recovering more accurate maps from data acquired using the popular multi-parameter map** protocol. We also show how to incorporate a joint total variation prior to further decrease the noise in the maps, noting that the probabilistic formulation allows the uncertainty on the recovered parameter maps to be estimated. Our implementation uses a PyTorch backend and benefits from GPU acceleration. It is available at https://github.com/balbasty/nitorch.
△ Less
Submitted 24 June, 2021; v1 submitted 2 February, 2021;
originally announced February 2021.
-
Joint Total Variation ESTATICS for Robust Multi-Parameter Map**
Authors:
Yaël Balbastre,
Mikael Brudfors,
Michela Azzarito,
Christian Lambert,
Martina F. Callaghan,
John Ashburner
Abstract:
Quantitative magnetic resonance imaging (qMRI) derives tissue-specific parameters -- such as the apparent transverse relaxation rate R2*, the longitudinal relaxation rate R1 and the magnetisation transfer saturation -- that can be compared across sites and scanners and carry important information about the underlying microstructure. The multi-parameter map** (MPM) protocol takes advantage of mul…
▽ More
Quantitative magnetic resonance imaging (qMRI) derives tissue-specific parameters -- such as the apparent transverse relaxation rate R2*, the longitudinal relaxation rate R1 and the magnetisation transfer saturation -- that can be compared across sites and scanners and carry important information about the underlying microstructure. The multi-parameter map** (MPM) protocol takes advantage of multi-echo acquisitions with variable flip angles to extract these parameters in a clinically acceptable scan time. In this context, ESTATICS performs a joint loglinear fit of multiple echo series to extract R2* and multiple extrapolated intercepts, thereby improving robustness to motion and decreasing the variance of the estimators. In this paper, we extend this model in two ways: (1) by introducing a joint total variation (JTV) prior on the intercepts and decay, and (2) by deriving a nonlinear maximum \emph{a posteriori} estimate. We evaluated the proposed algorithm by predicting left-out echoes in a rich single-subject dataset. In this validation, we outperformed other state-of-the-art methods and additionally showed that the proposed approach greatly reduces the variance of the estimated maps, without introducing bias.
△ Less
Submitted 28 May, 2020;
originally announced May 2020.