Search | arXiv e-print repository

Putting Language into Context Using Smartphone-Based Keyboard Logging

Authors: Florian Bemmann, Timo Koch, Maximilian Bergmann, Clemens Stachl, Daniel Buschek, Ramona Schoedel, Sven Mayer

Abstract: While the study of language as typed on smartphones offers valuable insights, existing data collection methods often fall short in providing contextual information and ensuring user privacy. We present a privacy-respectful approach - context-enriched keyboard logging - that allows for the extraction of contextual information on the user's input motive, which is meaningful for linguistics, psycholo… ▽ More While the study of language as typed on smartphones offers valuable insights, existing data collection methods often fall short in providing contextual information and ensuring user privacy. We present a privacy-respectful approach - context-enriched keyboard logging - that allows for the extraction of contextual information on the user's input motive, which is meaningful for linguistics, psychology, and behavioral sciences. In particular, with our approach, we enable distinguishing language contents by their channel (i.e., comments, messaging, search inputs). Filtering by channel allows for better pre-selection of data, which is in the interest of researchers and improves users' privacy. We demonstrate our approach on a large-scale six-month user study (N=624) of language use in smartphone interactions in the wild. Finally, we highlight the implications for research on language use in human-computer interaction and interdisciplinary contexts. △ Less

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2301.04993 [pdf, other]

Against Algorithmic Exploitation of Human Vulnerabilities

Authors: Inga Strümke, Marija Slavkovik, Clemens Stachl

Abstract: Decisions such as which movie to watch next, which song to listen to, or which product to buy online, are increasingly influenced by recommender systems and user models that incorporate information on users' past behaviours, preferences, and digitally created content. Machine learning models that enable recommendations and that are trained on user data may unintentionally leverage information on h… ▽ More Decisions such as which movie to watch next, which song to listen to, or which product to buy online, are increasingly influenced by recommender systems and user models that incorporate information on users' past behaviours, preferences, and digitally created content. Machine learning models that enable recommendations and that are trained on user data may unintentionally leverage information on human characteristics that are considered vulnerabilities, such as depression, young age, or gambling addiction. The use of algorithmic decisions based on latent vulnerable state representations could be considered manipulative and could have a deteriorating impact on the condition of vulnerable individuals. In this paper, we are concerned with the problem of machine learning models inadvertently modelling vulnerabilities, and want to raise awareness for this issue to be considered in legislation and AI ethics. Hence, we define and describe common vulnerabilities, and illustrate cases where they are likely to play a role in algorithmic decision-making. We propose a set of requirements for methods to detect the potential for vulnerability modelling, detect whether vulnerable groups are treated differently by a model, and detect whether a model has created an internal representation of vulnerability. We conclude that explainable artificial intelligence methods may be necessary for detecting vulnerability exploitation by machine learning-based recommendation systems. △ Less

Submitted 12 January, 2023; originally announced January 2023.

Comments: arXiv admin note: text overlap with arXiv:2203.00317

arXiv:2205.13080 [pdf, other]

Factorized Structured Regression for Large-Scale Varying Coefficient Models

Authors: David Rügamer, Andreas Bender, Simon Wiegrebe, Daniel Racek, Bernd Bischl, Christian L. Müller, Clemens Stachl

Abstract: Recommender Systems (RS) pervade many aspects of our everyday digital life. Proposed to work at scale, state-of-the-art RS allow the modeling of thousands of interactions and facilitate highly individualized recommendations. Conceptually, many RS can be viewed as instances of statistical regression models that incorporate complex feature effects and potentially non-Gaussian outcomes. Such structur… ▽ More Recommender Systems (RS) pervade many aspects of our everyday digital life. Proposed to work at scale, state-of-the-art RS allow the modeling of thousands of interactions and facilitate highly individualized recommendations. Conceptually, many RS can be viewed as instances of statistical regression models that incorporate complex feature effects and potentially non-Gaussian outcomes. Such structured regression models, including time-aware varying coefficients models, are, however, limited in their applicability to categorical effects and inclusion of a large number of interactions. Here, we propose Factorized Structured Regression (FaStR) for scalable varying coefficient models. FaStR overcomes limitations of general regression models for large-scale data by combining structured additive regression and factorization approaches in a neural network-based model implementation. This fusion provides a scalable framework for the estimation of statistical models in previously infeasible data settings. Empirical results confirm that the estimation of varying coefficients of our approach is on par with state-of-the-art regression techniques, while scaling notably better and also being competitive with other time-aware RS in terms of prediction performance. We illustrate FaStR's performance and interpretability on a large-scale behavioral study with smartphone user data. △ Less

Submitted 25 May, 2022; originally announced May 2022.

arXiv:2105.02738 [pdf, other]

doi 10.1145/3461702.3462626

Digital Voodoo Dolls

Authors: Marija Slavkovik, Clemens Stachl, Caroline Pitman, Jonathan Askonas

Abstract: An institution, be it a body of government, commercial enterprise, or a service, cannot interact directly with a person. Instead, a model is created to represent us. We argue the existence of a new high-fidelity type of person model which we call a digital voodoo doll. We conceptualize it and compare its features with existing models of persons. Digital voodoo dolls are distinguished by existing c… ▽ More An institution, be it a body of government, commercial enterprise, or a service, cannot interact directly with a person. Instead, a model is created to represent us. We argue the existence of a new high-fidelity type of person model which we call a digital voodoo doll. We conceptualize it and compare its features with existing models of persons. Digital voodoo dolls are distinguished by existing completely beyond the influence and control of the person they represent. We discuss the ethical issues that such a lack of accountability creates and argue how these concerns can be mitigated. △ Less

Submitted 7 May, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

Comments: Accepted for publication at Artificial Intelligence, Ethics and Society (AIES-2021)

arXiv:2104.11688 [pdf, other]

doi 10.1007/s10618-022-00840-5

Grouped Feature Importance and Combined Features Effect Plot

Authors: Quay Au, Julia Herbinger, Clemens Stachl, Bernd Bischl, Giuseppe Casalicchio

Abstract: Interpretable machine learning has become a very active area of research due to the rising popularity of machine learning algorithms and their inherently challenging interpretability. Most work in this area has been focused on the interpretation of single features in a model. However, for researchers and practitioners, it is often equally important to quantify the importance or visualize the effec… ▽ More Interpretable machine learning has become a very active area of research due to the rising popularity of machine learning algorithms and their inherently challenging interpretability. Most work in this area has been focused on the interpretation of single features in a model. However, for researchers and practitioners, it is often equally important to quantify the importance or visualize the effect of feature groups. To address this research gap, we provide a comprehensive overview of how existing model-agnostic techniques can be defined for feature groups to assess the grouped feature importance, focusing on permutation-based, refitting, and Shapley-based methods. We also introduce an importance-based sequential procedure that identifies a stable and well-performing combination of features in the grouped feature space. Furthermore, we introduce the combined features effect plot, which is a technique to visualize the effect of a group of features based on a sparse, interpretable linear combination of features. We used simulation studies and a real data example from computational psychology to analyze, compare, and discuss these methods. △ Less

Submitted 23 April, 2021; originally announced April 2021.

Journal ref: Data Mining and Knowledge Discovery 36, 1401--1450 (2022)

arXiv:2003.06186 [pdf, other]

doi 10.1145/3313831.3376210

Develo** a Personality Model for Speech-based Conversational Agents Using the Psycholexical Approach

Authors: Sarah Theres Völkel, Ramona Schödel, Daniel Buschek, Clemens Stachl, Verena Winterhalter, Markus Bühner, Heinrich Hussmann

Abstract: We present the first systematic analysis of personality dimensions developed specifically to describe the personality of speech-based conversational agents. Following the psycholexical approach from psychology, we first report on a new multi-method approach to collect potentially descriptive adjectives from 1) a free description task in an online survey (228 unique descriptors), 2) an interaction… ▽ More We present the first systematic analysis of personality dimensions developed specifically to describe the personality of speech-based conversational agents. Following the psycholexical approach from psychology, we first report on a new multi-method approach to collect potentially descriptive adjectives from 1) a free description task in an online survey (228 unique descriptors), 2) an interaction task in the lab (176 unique descriptors), and 3) a text analysis of 30,000 online reviews of conversational agents (Alexa, Google Assistant, Cortana) (383 unique descriptors). We aggregate the results into a set of 349 adjectives, which are then rated by 744 people in an online survey. A factor analysis reveals that the commonly used Big Five model for human personality does not adequately describe agent personality. As an initial step to develo** a personality model, we propose alternative dimensions and discuss implications for the design of agent personalities, personality-aware personalisation, and future research. △ Less

Submitted 13 March, 2020; originally announced March 2020.

Comments: 14 pages, 2 figures, 3 tables, CHI'20

ACM Class: H.5.2

arXiv:1904.03943 [pdf, other]

Component-Wise Boosting of Targets for Multi-Output Prediction

Authors: Quay Au, Daniel Schalk, Giuseppe Casalicchio, Ramona Schoedel, Clemens Stachl, Bernd Bischl

Abstract: Multi-output prediction deals with the prediction of several targets of possibly diverse types. One way to address this problem is the so called problem transformation method. This method is often used in multi-label learning, but can also be used for multi-output prediction due to its generality and simplicity. In this paper, we introduce an algorithm that uses the problem transformation method f… ▽ More Multi-output prediction deals with the prediction of several targets of possibly diverse types. One way to address this problem is the so called problem transformation method. This method is often used in multi-label learning, but can also be used for multi-output prediction due to its generality and simplicity. In this paper, we introduce an algorithm that uses the problem transformation method for multi-output prediction, while simultaneously learning the dependencies between target variables in a sparse and interpretable manner. In a first step, predictions are obtained for each target individually. Target dependencies are then learned via a component-wise boosting approach. We compare our new method with similar approaches in a benchmark using multi-label, multivariate regression and mixed-type datasets. △ Less

Submitted 8 April, 2019; originally announced April 2019.

Showing 1–7 of 7 results for author: Stachl, C