Search | arXiv e-print repository

DataDock: An Open Source Data Hub for Research

Authors: Lexington Whalen, Homayoun Valafar

Abstract: Every research project necessitates data, often requiring sharing and collaborative review within a team. However, there is a dearth of good open-source data sharing and reviewing services. Existing file-sharing services generally mandate paid subscriptions for increased storage or additional members, diverting research funds from addressing the core research problem that a lab is attempting to work on. Moreover, these services often lack direct features for reviewing or commenting on data quality, a vital part of ensuring high quality data generation. In response to these challenges, we present DataDock, a specialized file transfer service crafted specifically for researchers. DataDock operates as an application hosted on a research lab server. This design ensures that, with access to a machine and an internet connection, teams can facilitate file storage, transfer, and review without incurring extra costs. Being an open-source project, DataDock can be customized to suit the unique requirements of any research team, and is able to evolve to meet the needs of the research community. We also note that there are no limitations with respect to what data can be shared, downloaded, or commented on. As DataDock is agnostic to the file type, it can be used in any field from bioinformatics to particle physics; as long as it can be stored in a file, it can be shared. We open source the code here: https://github.com/lxaw/DataDock

Submitted 26 June, 2024; v1 submitted 14 April, 2024; originally announced June 2024.

Comments: 7 pages, 6 figures, submitted and in review at The 2024 World Congress in Computer Science, Computer Engineering, And Applied Computing (CSCE)

ACM Class: D.0; E.m

arXiv:2309.12981 [pdf]

Wordification: A New Way of Teaching English Spelling Patterns

Authors: Lexington Whalen, Nathan Bickel, Shash Comandur, Dalton Craven, Stanley Dubinsky, Homayoun Valafar

Abstract: Literacy, or the ability to read and write, is a crucial indicator of success in life and greater society. It is estimated that 85% of people in juvenile delinquent systems cannot adequately read or write, that more than half of those with substance abuse issues have complications in reading or writing, and that two-thirds of those who do not complete high school lack proper literacy skills. Furthermore, young children who do not possess reading skills matching grade level by the fourth grade are approximately 80% likely to not catch up at all. Many may believe that in a developed country such as the United States, literacy fails to be an issue; however, this is a dangerous misunderstanding. Globally an estimated 1.19 trillion dollars are lost every year due to issues in literacy; in the USA, the loss is an estimated 300 billion. To put it in more shocking terms, one in five American adults still fails to comprehend basic sentences. Making matters worse, the only tools available now to correct a lack of reading and writing ability are found in expensive tutoring or other programs that oftentimes fail to reach the required audience. In this paper, our team puts forward a new way of teaching English spelling and word recognition to grade school students in the United States: Wordification. Wordification is a web application designed to teach English literacy using principles of linguistics applied to the orthographic and phonological properties of words in a manner not fully utilized previously in any computer-based teaching application.

Submitted 10 November, 2023; v1 submitted 29 August, 2023; originally announced September 2023.

Comments: 8 pages, 7 figures, IEEE International Conference on Frontiers in Education

ACM Class: K.3

arXiv:2301.10649 [pdf]

On Creating a Comprehensive Food Database

Authors: Lexington Whalen, Brie Turner-McGrievy, Matthew McGrievy, Andrew Hester, Homayoun Valafar

Abstract: Studies with the primary aim of addressing eating disorders focus on assessing the nutrient content of food items with an exclusive focus on caloric intake. There are two primary impediments that can be noted in these studies. The first of these relates to the fact that caloric intake of each food item is calculated from an existing database. The second concerns the scientific significance of caloric intake used as the single measure of nutrient content. By requiring an existing database, researchers are forced to find some source of a comprehensive set of food items as well as their respective nutrients. This search alone is a difficult task, and if completed often leads to the requirement of a paid API service. These services are expensive and non-customizable, taking away funding that could be aimed at other parts of the study only to give an unwieldy database that cannot be modified or contributed to. In this work, we introduce a new rendition of the USDA's food database that includes both foods found in grocery stores and those found in restaurants or fast food places. At the moment, we have accumulated roughly 1.5 million food entries consisting of approximately 18,000 brands and 100 restaurants in the United States. These foods also have an abundance of nutrient data associated with them, from the caloric amount to saturated fat levels. The data is stored in MySQL format and is spread among five major tables. We have also procured images for these food entries when available, and have included all of our data and program scripts in an open source repository.

Submitted 25 January, 2023; originally announced January 2023.

Comments: 6 pages, 9 figures, to be published in 2022 International Conference on Computational Science and Computational Intelligence (CSCI)

ACM Class: H.2.8; H.2.1

Showing 1–3 of 3 results for author: Whalen, L