-
Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset
Authors:
Mahta Bakhshizadeh,
Christian Jilek,
Markus Schröder,
Heiko Maus,
Andreas Dengel
Abstract:
Over the years, various approaches have been employed to enhance the productivity of knowledge workers, from addressing psychological well-being to the development of personal knowledge assistants. A significant challenge in this research area has been the absence of a comprehensive, publicly accessible dataset that mirrors real-world knowledge work. Although a handful of datasets exist, many are restricted in access or lack vital information dimensions, complicating meaningful comparison and benchmarking in the domain. This paper presents RLKWiC, a novel dataset of Real-Life Knowledge Work in Context, derived from monitoring the computer interactions of eight participants over a span of two months. As the first publicly available dataset offering a wealth of essential information dimensions (such as explicated contexts, textual contents, and semantics), RLKWiC seeks to address the research gap in the personal information management domain, providing valuable insights for modeling user behavior.
Submitted 16 April, 2024;
originally announced April 2024.
-
Towards Self-organizing Personal Knowledge Assistants in Evolving Corporate Memories
Authors:
Christian Jilek,
Markus Schröder,
Heiko Maus,
Sven Schwarz,
Andreas Dengel
Abstract:
This paper presents a retrospective overview of a decade of research in our department towards self-organizing personal knowledge assistants in evolving corporate memories. Our research is typically inspired by real-world problems and often conducted in interdisciplinary collaborations with research and industry partners. We summarize past experiments and results comprising topics like various ways of knowledge graph construction in corporate and personal settings, Managed Forgetting and (Self-organizing) Context Spaces as a novel approach to Personal Information Management (PIM) and knowledge work support. Past results are complemented by an overview of related work and some of our latest findings not published so far. Last, we give an overview of our related industry use cases including a detailed look into CoMem, a Corporate Memory based on our presented research already in productive use and providing challenges for further research. Many contributions are only first steps in new directions with still a lot of untapped potential, especially with regard to further increasing the automation in PIM and knowledge work support.
Submitted 3 August, 2023;
originally announced August 2023.
-
Spread2RML: Constructing Knowledge Graphs by Predicting RML Mappings on Messy Spreadsheets
Authors:
Markus Schröder,
Christian Jilek,
Andreas Dengel
Abstract:
The RDF Mapping Language (RML) allows to map semi-structured data to RDF knowledge graphs. Besides CSV, JSON and XML, this also includes the mapping of spreadsheet tables. Since spreadsheets have a complex data model and can become rather messy, their mapping creation tends to be very time consuming. In order to reduce such efforts, this paper presents Spread2RML which predicts RML mappings on messy spreadsheets. This is done with an extensible set of RML object map templates which are applied for each column based on heuristics. In our evaluation, three datasets are used ranging from very messy synthetic data to spreadsheets from data.gov which are less messy. We obtained first promising results especially with regard to our approach being fully automatic and dealing with rather messy data.
Submitted 25 October, 2021;
originally announced October 2021.
-
A Linked Data Application Framework to Enable Rapid Prototyping
Authors:
Markus Schröder,
Christian Jilek,
Andreas Dengel
Abstract:
Application developers, in our experience, tend to hesitate when dealing with linked data technologies. To reduce their initial hurdle and enable rapid prototyping, we propose in this paper a framework for building linked data applications. Our approach especially considers the participation of web developers and non-technical users without much prior knowledge about linked data concepts. Web developers are supported with bidirectional RDF to JSON conversions and suitable CRUD endpoints. Non-technical users can browse websites generated from JSON data by means of a template language. A prototypical open source implementation demonstrates its capabilities.
Submitted 28 April, 2021;
originally announced April 2021.
-
Mapping Spreadsheets to RDF: Supporting Excel in RML
Authors:
Markus Schröder,
Christian Jilek,
Andreas Dengel
Abstract:
The RDF Mapping Language (RML) enables, among other formats, the mapping of tabular data as Comma-Separated Values (CSV) files to RDF graphs. Unfortunately, the widely used spreadsheet format is currently neglected by its specification and well-known implementations. Therefore, we extended one of the tools which is RML Mapper to support Microsoft Excel spreadsheet files and demonstrate its capabilities in an interactive online demo. Our approach allows to access various meta data of spreadsheet cells in typical RML maps. Some experimental features for more specific use cases are also provided. The implementation code is publicly available in a GitHub fork.
Submitted 28 April, 2021;
originally announced April 2021.
-
Dataset Generation Patterns for Evaluating Knowledge Graph Construction
Authors:
Markus Schröder,
Christian Jilek,
Andreas Dengel
Abstract:
Confidentiality hinders the publication of authentic, labeled datasets of personal and enterprise data, although they could be useful for evaluating knowledge graph construction approaches in industrial scenarios. Therefore, our plan is to synthetically generate such data in a way that it appears as authentic as possible. Based on our assumption that knowledge workers have certain habits when they produce or manage data, generation patterns could be discovered which can be utilized by data generators to imitate real datasets. In this paper, we initially derived 11 distinct patterns found in real spreadsheets from industry and demonstrate a suitable generator called Data Sprout that is able to reproduce them. We describe how the generator produces spreadsheets in general and what altering effects the implemented patterns have.
Submitted 28 April, 2021;
originally announced April 2021.
-
Interactively Constructing Knowledge Graphs from Messy User-Generated Spreadsheets
Authors:
Markus Schröder,
Christian Jilek,
Michael Schulze,
Andreas Dengel
Abstract:
When spreadsheets are filled freely by knowledge workers, they can contain rather unstructured content. For humans and especially machines it becomes difficult to interpret such data properly. Therefore, spreadsheets are often converted to a more explicit, formal and structured form, for example, to a knowledge graph. However, if a data maintenance strategy has been missing and user-generated data becomes "messy", the construction of knowledge graphs will be a challenging task. In this paper, we catalog several of those challenges and propose an interactive approach to solve them. Our approach includes a graphical user interface which enables knowledge engineers to bulk-annotate spreadsheet cells with extracted information. Based on the cells' annotations a knowledge graph is ultimately formed. Using five spreadsheets from an industrial scenario, we built a 25k-triple graph during our evaluation. We compared our method with the state-of-the-art RDF Mapping Language (RML) attempt. The comparison highlights contributions of our approach.
Submitted 5 March, 2021;
originally announced March 2021.
-
The Person Index Challenge: Extraction of Persons from Messy, Short Texts
Authors:
Markus Schröder,
Christian Jilek,
Michael Schulze,
Andreas Dengel
Abstract:
When persons are mentioned in texts with their first name, last name and/or middle names, there can be a high variation which of their names are used, how their names are ordered and if their names are abbreviated. If multiple persons are mentioned consecutively in very different ways, especially short texts can be perceived as "messy". Once ambiguous names occur, associations to persons may not be inferred correctly. Despite these eventualities, in this paper we ask how well an unsupervised algorithm can build a person index from short texts. We define a person index as a structured table that distinctly catalogs individuals by their names. First, we give a formal definition of the problem and describe a procedure to generate ground truth data for future evaluations. To give a first solution to this challenge, a baseline approach is implemented. By using our proposed evaluation strategy, we test the performance of the baseline and suggest further improvements. For future research the source code is publicly available.
Submitted 16 November, 2020;
originally announced November 2020.
-
Bridging the Technology Gap Between Industry and Semantic Web: Generating Databases and Server Code From RDF
Authors:
Markus Schröder,
Michael Schulze,
Christian Jilek,
Andreas Dengel
Abstract:
Despite great advances in the area of Semantic Web, industry rather seldom adopts Semantic Web technologies and their storage and query concepts. Instead, relational databases (RDB) are often deployed to store business-critical data, which are accessed via REST interfaces. Yet, some enterprises would greatly benefit from Semantic Web related datasets which are usually represented with the Resource Description Framework (RDF). To bridge this technology gap, we propose a fully automatic approach that generates suitable RDB models with REST APIs to access them. In our evaluation, generated databases from different RDF datasets are examined and compared. Our findings show that the databases sufficiently reflect their counterparts while the API is able to reproduce rather simple SPARQL queries. Potentials for improvements are identified, for example, the reduction of data redundancies in generated databases.
Submitted 16 November, 2020;
originally announced November 2020.
-
Temporarily Unavailable: Memory Inhibition in Cognitive and Computer Science
Authors:
Tobias Tempel,
Claudia Niederée,
Christian Jilek,
Andrea Ceroni,
Heiko Maus,
Yannick Runge,
Christian Frings
Abstract:
Inhibition is one of the core concepts in Cognitive Psychology. The idea of inhibitory mechanisms actively weakening representations in the human mind has inspired a great number of studies in various research domains. In contrast, Computer Science only recently has begun to consider inhibition as a second basic processing quality beside activation. Here, we review psychological research on inhibition in memory and link the gained insights with the current efforts in Computer Science of incorporating inhibitory principles for optimizing information retrieval in Personal Information Management. Four common aspects guide this review in both domains: 1. The purpose of inhibition to increase processing efficiency. 2. Its relation to activation. 3. Its links to contexts. 4. Its temporariness. In summary, the concept of inhibition has been used by Computer Science for enhancing software in various ways already. Yet, we also identify areas for promising future developments of inhibitory mechanisms, particularly context inhibition.
Submitted 15 November, 2019;
originally announced December 2019.
-
Interactive Concept Mining on Personal Data -- Bootstrapping Semantic Services
Authors:
Markus Schröder,
Christian Jilek,
Andreas Dengel
Abstract:
Semantic services (e.g. Semantic Desktops) are still afflicted by a cold start problem: in the beginning, the user's personal information sphere, i.e. files, mails, bookmarks, etc., is not represented by the system. Information extraction tools used to kick-start the system typically create 1:1 representations of the different information items. Higher level concepts, for example found in file names, mail subjects or in the content body of these items, are not extracted. Leaving these concepts out may lead to underperformance; having too many of them (e.g. by making every found term a concept) will clutter the arising knowledge graph with non-helpful relations. In this paper, we present an interactive concept mining approach proposing concept candidates gathered by exploiting given schemata of usual personal information management applications and analysing the personal information sphere using various metrics. To heed the subjective view of the user, a graphical user interface allows to easily rank and give feedback on proposed concept candidates, thus keeping only those actually considered relevant. A prototypical implementation demonstrates major steps of our approach.
Submitted 14 March, 2019;
originally announced March 2019.
-
Inflection-Tolerant Ontology-Based Named Entity Recognition for Real-Time Applications
Authors:
Christian Jilek,
Markus Schröder,
Rudolf Novik,
Sven Schwarz,
Heiko Maus,
Andreas Dengel
Abstract:
A growing number of applications users daily interact with have to operate in (near) real-time: chatbots, digital companions, knowledge work support systems -- just to name a few. To perform the services desired by the user, these systems have to analyze user activity logs or explicit user input extremely fast. In particular, text content (e.g. in form of text snippets) needs to be processed in an information extraction task. Regarding the aforementioned temporal requirements, this has to be accomplished in just a few milliseconds, which limits the number of methods that can be applied. Practically, only very fast methods remain, which on the other hand deliver worse results than slower but more sophisticated Natural Language Processing (NLP) pipelines. In this paper, we investigate and propose methods for real-time capable Named Entity Recognition (NER). As a first improvement step, we address word variations induced by inflection, for example present in the German language. Our approach is ontology-based and makes use of several language information sources like Wiktionary. We evaluated it using the German Wikipedia (about 9.4B characters), for which the whole NER process took considerably less than an hour. Since precision and recall are higher than with comparably fast methods, we conclude that the quality gap between high speed methods and sophisticated NLP pipelines can be narrowed a bit more without losing too much runtime performance.
Submitted 5 December, 2018;
originally announced December 2018.
-
Advanced Memory Buoyancy for Forgetful Information Systems
Authors:
Christian Jilek,
Jessica Chwalek,
Sven Schwarz,
Markus Schröder,
Heiko Maus,
Andreas Dengel
Abstract:
Knowledge workers face an ever increasing flood of information in their daily lives. To counter this and provide better support for information management and knowledge work in general, we have been investigating solutions inspired by human forgetting since 2013. These solutions are based on Semantic Desktop (SD) and Managed Forgetting (MF) technology. A key concept of the latter is the so-called Memory Buoyancy (MB), which is intended to represent an information item's current value for the user and allows to employ forgetting mechanisms. The SD thus continuously performs information value assessment updating MB and triggering respective MF measures. We extended an SD-based organizational memory system, which we have been using in daily work for over seven years now, with MF mechanisms directly embedding them in daily activities, too, and enabling us to test and optimize them in real-world scenarios. In this paper, we first present our initial version of MB and discuss success and failure stories we have been experiencing with it during three years of practical usage. We learned from cognitive psychology that our previous research on context can be beneficial for MF. Thus, we created an advanced MB version especially taking user context, and in particular context switches, into account. These enhancements as well as a first prototypical implementation are presented, too.
Submitted 17 November, 2018;
originally announced November 2018.
-
Managed Forgetting to Support Information Management and Knowledge Work
Authors:
Christian Jilek,
Yannick Runge,
Claudia Niederée,
Heiko Maus,
Tobias Tempel,
Andreas Dengel,
Christian Frings
Abstract:
Trends like digital transformation even intensify the already overwhelming mass of information knowledge workers face in their daily life. To counter this, we have been investigating knowledge work and information management support measures inspired by human forgetting. In this paper, we give an overview of solutions we have found during the last five years as well as challenges that still need to be tackled. Additionally, we share experiences gained with the prototype of a first forgetful information system used 24/7 in our daily work for the last three years. We also address the untapped potential of more explicated user context as well as features inspired by Memory Inhibition, which is our current focus of research.
Submitted 17 November, 2018;
originally announced November 2018.
-
Towards Semantically Enhanced Data Understanding
Authors:
Markus Schröder,
Christian Jilek,
Jörn Hees,
Andreas Dengel
Abstract:
In the field of machine learning, data understanding is the practice of getting initial insights in unknown datasets. Such knowledge-intensive tasks require a lot of documentation, which is necessary for data scientists to grasp the meaning of the data. Usually, documentation is separate from the data in various external documents, diagrams, spreadsheets and tools which causes considerable look up overhead. Moreover, other supporting applications are not able to consume and utilize such unstructured data. That is why we propose a methodology that uses a single semantic model that interlinks data with its documentation. Hence, data scientists are able to directly look up the connected information about the data by simply following links. Equally, they can browse the documentation which always refers to the data. Furthermore, the model can be used by other approaches providing additional support, like searching, comparing, integrating or visualizing data. To showcase our approach we also demonstrate an early prototype.
Submitted 13 June, 2018;
originally announced June 2018.
-
Deep Linking Desktop Resources
Authors:
Markus Schröder,
Christian Jilek,
Andreas Dengel
Abstract:
Deep Linking is the process of referring to a specific piece of web content. Although users can browse their files in desktop environments, they are unable to directly traverse deeper into their content using deep links. In order to solve this issue, we demonstrate "DeepLinker", a tool which generates and interprets deep links to desktop resources, thus enabling the reference to a certain location within a file using a simple hyperlink. By default, the service responds with an HTML representation of the resource along with further links to follow. Additionally, we allow the use of RDF to interlink our deep links with other resources.
Submitted 3 May, 2018;
originally announced May 2018.
-
Context Spaces as the Cornerstone of a Near-Transparent & Self-Reorganizing Semantic Desktop
Authors:
Christian Jilek,
Markus Schröder,
Sven Schwarz,
Heiko Maus,
Andreas Dengel
Abstract:
Existing Semantic Desktops are still reproached for being too complicated to use or not scaling well. Besides, a real "killer app" is still missing. In this paper, we present a new prototype inspired by NEPOMUK and its successors having a semantic graph and ontologies as its basis. In addition, we introduce the idea of context spaces that users can directly interact with and work on. To make them available in all applications without further ado, the system is transparently integrated using mostly standard protocols complemented by a sidebar for advanced features. By exploiting collected context information and applying Managed Forgetting features (like hiding, condensation or deletion), the system is able to dynamically reorganize itself, which also includes a kind of tidy-up-itself functionality. We therefore expect it to be more scalable while providing new levels of user support. An early prototype has been implemented and is presented in this demo.
Submitted 6 May, 2018;
originally announced May 2018.
-
An Easy & Collaborative RDF Data Entry Method using the Spreadsheet Metaphor
Authors:
Markus Schröder,
Christian Jilek,
Jörn Hees,
Sven Hertling,
Andreas Dengel
Abstract:
Spreadsheets are widely used by knowledge workers, especially in the industrial sector. Their methodology enables a well understood, easy and fast possibility to enter data. As filling out a spreadsheet is more accessible to common knowledge workers than defining RDF statements, in this paper, we propose an easy-to-use, zero-configuration, web-based spreadsheet editor that simultaneously transfers spreadsheet entries into RDF statements. It enables various kinds of users to easily create semantic data whether they are RDF experts or novices. The typical scenario we address focuses on creating instance data starting with an empty knowledge base that is filled incrementally. In a user study, participants were able to create more statements in shorter time, having similar or even significantly outperforming quality, compared to other approaches.
Submitted 11 April, 2018;
originally announced April 2018.