-
Developing patient-driven artificial intelligence based on personal rankings of care decision making steps
Authors:
Lauri Lahti
Abstract:
We propose and experimentally motivate a new methodology to support decision-making processes in healthcare with artificial intelligence based on personal rankings of care decision making steps that can be identified with our methodology, questionnaire data and its statistical patterns. Our longitudinal quantitative cross-sectional three-stage study gathered self-ratings for 437 expression statements concerning healthcare situations on Likert scales in respect to "the need for help", "the advancement of health", "the hopefulness", "the indication of compassion" and "the health condition", and 45 answers about the person's demographics, health and wellbeing, also the duration of giving answers. Online respondents between 1 June 2020 and 29 June 2021 were recruited from Finnish patient and disabled people's organizations, other health-related organizations and professionals, and educational institutions (n=1075). With Kruskal-Wallis test, Wilcoxon rank-sum test (i.e., Mann-Whitney U test), Wilcoxon rank-sum pairwise test, Welch's t test and one-way analysis of variance (ANOVA) between groups test we identified statistically significant differences of ratings and their durations for each expression statement in respect to respondent groupings based on the answer values of each background question. Frequencies of the later reordering of rating rankings showed dependencies with ratings given earlier in respect to various interpretation task entities, interpretation dimensions and respondent groupings. Our methodology, questionnaire data and its statistical patterns enable analyzing with self-rated expression statements the representations of decision making steps in healthcare situations and their chaining, agglomeration and branching in knowledge entities of personalized care paths. Our results support building artificial intelligence solutions to address the patient's needs concerning care.
Submitted 15 May, 2022;
originally announced May 2022.
-
Detecting the patient's need for help with machine learning
Authors:
Lauri Lahti
Abstract:
Developing machine learning models to support health analytics requires increased understanding about statistical properties of self-rated expression statements. We analyzed self-rated expression statements concerning the coronavirus COVID-19 epidemic to identify statistically significant differences between groups of respondents and to detect the patient's need for help with machine learning. Our quantitative study gathered the "need for help" ratings for twenty health-related expression statements concerning the coronavirus epidemic on an 11-point Likert scale, and nine answers about the person's health and wellbeing, sex and age. Online respondents between 30 May and 3 August 2020 were recruited from Finnish patient and disabled people's organizations, other health-related organizations and professionals, and educational institutions (n=673). We analyzed rating differences and dependencies with Kendall rank-correlation and cosine similarity measures and tests of Wilcoxon rank-sum, Kruskal-Wallis and one-way analysis of variance (ANOVA) between groups, and carried out machine learning experiments with a basic implementation of a convolutional neural network algorithm. We found statistically significant correlations and high cosine similarity values between various health-related expression statement pairs concerning the "need for help" ratings and a background question pair. We also identified statistically significant rating differences for several health-related expression statements in respect to groupings based on the answer values of background questions, such as the ratings of suspecting to have the coronavirus infection and having it depending on the estimated health condition, quality of life and sex. Our experiments with a convolutional neural network algorithm showed the applicability of machine learning to support detecting the need for help in the patient's expressions.
Submitted 24 March, 2021; v1 submitted 25 December, 2020;
originally announced December 2020.
-
Development of computational models for emotional diary text analysis to support maternal care
Authors:
Lauri Lahti,
Henni Tenhunen,
Seppo Heinonen,
Minna Helkavaara,
Maritta Pöyhönen-Alho,
Paulus Torkki
Abstract:
We propose new computational models for analyzing self-reported emotional diary texts of pregnant women to support maternal care. We gathered affective ratings outside a clinical setting and developed new models to facilitate interpretation and communication of affective expressions between persons representing different affective ratings. Relying on constructed emotion theory, models of dimensional emotion categories and affective ratings of Self Assessment Manikin, we demonstrate our new proposal to analyze linguistic data with computational models exploiting vector space and clustering methods. 35 persons having Finnish as a native language provided affective ratings for 195 emotional adjectives and 16 pregnancy-related nouns in Finnish in dimensions of pleasure, arousal and dominance. We developed new models to represent dependencies and differences of affective ratings between various population subgroup categorizations, including "women without children", "women with children" and "men without children" that we consider important population segments to be addressed in maternal care. Our affective ratings showed significant correlations between pleasure and dominance (like Warriner et al., 2013) and with previous data collections (Söderholm et al., 2013; Eilola & Havelka, 2010; Warriner et al., 2013). Our affective ratings had significant effects on categorizations based on gender, gender-parental role and the time of the day and duration of giving ratings. Our results indicate accordance with significant affectivity differences of gender and age (Warriner et al., 2013) and motherhood (Rosebrock et al., 2015). Our proposed models aim to support health-related communication. Our results suggest gathering next the affective ratings of patients of maternal care in a real clinical setting.
Submitted 17 October, 2017; v1 submitted 9 October, 2017;
originally announced October 2017.
-
Fully scalable online-preprocessing algorithm for short oligonucleotide microarray atlases
Authors:
Leo Lahti,
Aurora Torrente,
Laura L. Elo,
Alvis Brazma,
Johan Rung
Abstract:
Accumulation of standardized data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck for full utilization of contemporary microarray collections. While short oligonucleotide arrays constitute a major source of genome-wide profiling data, scalable probe-level preprocessing algorithms have been available only for a few measurement platforms based on pre-calculated model parameters from restricted reference training sets. To overcome these key limitations, we introduce a fully scalable online-learning algorithm that provides tools to process large microarray atlases including tens of thousands of arrays. Unlike the alternatives, the proposed algorithm scales up in linear time with respect to sample size and is readily applicable to all short oligonucleotide platforms. This is the only available preprocessing algorithm that can learn probe-level parameters based on sequential hyperparameter updates at small, consecutive batches of data, thus circumventing the extensive memory requirements of the standard approaches and opening up novel opportunities to take full advantage of contemporary microarray data collections. Moreover, using the most comprehensive data collections to estimate probe-level effects can assist in pinpointing individual probes affected by various biases and provide new tools to guide array design and quality control. The implementation is freely available in R/Bioconductor at http://www.bioconductor.org/packages/devel/bioc/html/RPA.html
Submitted 27 December, 2012; v1 submitted 24 December, 2012;
originally announced December 2012.
-
Global modeling of transcriptional responses in interaction networks
Authors:
Leo Lahti,
Juha E. A. Knuuttila,
Samuel Kaski
Abstract:
Motivation: Cell-biological processes are regulated through a complex network of interactions between genes and their products. The processes, their activating conditions, and the associated transcriptional responses are often unknown. Organism-wide modeling of network activation can reveal unique and shared mechanisms between physiological conditions, and potentially as yet unknown processes. We introduce a novel approach for organism-wide discovery and analysis of transcriptional responses in interaction networks. The method searches for local, connected regions in a network that exhibit coordinated transcriptional response in a subset of conditions. Known interactions between genes are used to limit the search space and to guide the analysis. Validation on a human pathway network reveals physiologically coherent responses, functional relatedness between physiological conditions, and coordinated, context-specific regulation of the genes. Availability: Implementation is freely available in R and Matlab at http://netpro.r-forge.r-project.org
Submitted 2 February, 2012;
originally announced February 2012.
-
Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review
Authors:
Leo Lahti,
Martin Schäfer,
Hans-Ulrich Klein,
Silvio Bicciato,
Martin Dugas
Abstract:
A variety of genome-wide profiling techniques are available to probe complementary aspects of genome structure and function. Integrative analysis of heterogeneous data sources can reveal higher-level interactions that cannot be detected based on individual observations. A standard integration task in cancer studies is to identify altered genomic regions that induce changes in the expression of the associated genes based on joint analysis of genome-wide gene expression and copy number profiling measurements. In this review, we provide a comparison among various modeling procedures for integrating genome-wide profiling data of gene copy number and transcriptional alterations and highlight common approaches to genomic data integration. A transparent benchmarking procedure is introduced to quantitatively compare the cancer gene prioritization performance of the alternative methods. The benchmarking algorithms and data sets are available at http://intcomp.r-forge.r-project.org
Submitted 20 November, 2011;
originally announced November 2011.
-
RPA: Probabilistic analysis of probe performance and robust summarization
Authors:
Leo Lahti,
Laura L. Elo,
Tero Aittokallio,
Samuel Kaski
Abstract:
Probe-level models have led to improved performance in microarray studies but the various sources of probe-level contamination are still poorly understood. Data-driven analysis of probe performance can be used to quantify the uncertainty in individual probes and to highlight the relative contribution of different noise sources. Improved understanding of the probe-level effects can lead to improved preprocessing techniques and microarray design.
We have implemented probabilistic tools for probe performance analysis and summarization on short oligonucleotide arrays. In contrast to standard preprocessing approaches, the methods provide quantitative estimates of probe-specific noise and affinity terms and tools to investigate these parameters. Tools to incorporate prior information of the probes in the analysis are provided as well. Comparisons to known probe-level error sources and spike-in data sets validate the approach.
Implementation is freely available in R/BioConductor: http://www.bioconductor.org/packages/release/bioc/html/RPA.html
Submitted 6 April, 2013; v1 submitted 22 September, 2011;
originally announced September 2011.
-
A brief overview on the BioPAX and SBML standards for formal presentation of complex biological knowledge
Authors:
Leo Lahti
Abstract:
A brief informal overview on the BioPAX and SBML standards for formal presentation of complex biological knowledge.
Submitted 22 September, 2011;
originally announced September 2011.
-
Probabilistic analysis of the human transcriptome with side information
Authors:
Leo Lahti
Abstract:
Understanding functional organization of genetic information is a major challenge in modern biology. Following the initial publication of the human genome sequence in 2001, advances in high-throughput measurement technologies and efficient sharing of research material through community databases have opened up new views to the study of living organisms and the structure of life. In this thesis, novel computational strategies have been developed to investigate a key functional layer of genetic information, the human transcriptome, which regulates the function of living cells through protein synthesis. The key contributions of the thesis are general exploratory tools for high-throughput data analysis that have provided new insights to cell-biological networks, cancer mechanisms and other aspects of genome function.
A central challenge in functional genomics is that high-dimensional genomic observations are associated with high levels of complex and largely unknown sources of variation. By combining statistical evidence across multiple measurement sources and the wealth of background information in genomic data repositories it has been possible to solve some of the uncertainties associated with individual observations and to identify functional mechanisms that could not be detected based on individual measurement sources. Statistical learning and probabilistic models provide a natural framework for such modeling tasks. Open source implementations of the key methodological contributions have been released to facilitate further adoption of the developed methods by the research community.
Submitted 27 February, 2011;
originally announced February 2011.