Search | arXiv e-print repository

Open Domain Knowledge Extraction for Knowledge Graphs

Authors: Kun Qian, Anton Belyi, Fei Wu, Samira Khorshidi, Azadeh Nikfarjam, Rahul Khot, Yisi Sang, Katherine Luna, Xianqi Chu, Eric Choi, Yash Govind, Chloe Seivwright, Yiwen Sun, Ahmed Fakhry, Theo Rekatsinas, Ihab Ilyas, Xiaoguang Qi, Yunyao Li

Abstract: The quality of a knowledge graph directly impacts the quality of downstream applications (e.g. the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is to ensure completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from ope… ▽ More The quality of a knowledge graph directly impacts the quality of downstream applications (e.g. the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is to ensure completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from open web at scale. ODKE utilizes a wide range of extraction models and supports both streaming and batch processing at different latency. We reflect on the challenges and design decisions made and share lessons learned when building and deploying ODKE to grow an industry-scale open domain knowledge graph. △ Less

Submitted 30 October, 2023; originally announced December 2023.

Comments: 7 pages, 7 figures, 5 tables, preprint technical report, no code or data is released

MSC Class: 68T30 (primary) ACM Class: F.4.1; I.2.4

arXiv:2310.03611 [pdf]

GENER: A Parallel Layer Deep Learning Network To Detect Gene-Gene Interactions From Gene Expression Data

Authors: Ahmed Fakhry, Raneem Khafagy, Adriaan-Alexander Ludl

Abstract: Detecting and discovering new gene interactions based on known gene expressions and gene interaction data presents a significant challenge. Various statistical and deep learning methods have attempted to tackle this challenge by leveraging the topological structure of gene interactions and gene expression patterns to predict novel gene interactions. In contrast, some approaches have focused exclus… ▽ More Detecting and discovering new gene interactions based on known gene expressions and gene interaction data presents a significant challenge. Various statistical and deep learning methods have attempted to tackle this challenge by leveraging the topological structure of gene interactions and gene expression patterns to predict novel gene interactions. In contrast, some approaches have focused exclusively on utilizing gene expression profiles. In this context, we introduce GENER, a parallel-layer deep learning network designed exclusively for the identification of gene-gene relationships using gene expression data. We conducted two training experiments and compared the performance of our network with that of existing statistical and deep learning approaches. Notably, our model achieved an average AUROC score of 0.834 on the combined BioGRID&DREAM5 dataset, outperforming competing methods in predicting gene-gene interactions. △ Less

Submitted 6 October, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

arXiv:2103.01806 [pdf]

Virufy: A Multi-Branch Deep Learning Network for Automated Detection of COVID-19

Authors: Ahmed Fakhry, Xinyi Jiang, Jaclyn Xiao, Gunvant Chaudhari, Asriel Han, Amil Khanzada

Abstract: Fast and affordable solutions for COVID-19 testing are necessary to contain the spread of the global pandemic and help relieve the burden on medical facilities. Currently, limited testing locations and expensive equipment pose difficulties for individuals trying to be tested, especially in low-resource settings. Researchers have successfully presented models for detecting COVID-19 infection status… ▽ More Fast and affordable solutions for COVID-19 testing are necessary to contain the spread of the global pandemic and help relieve the burden on medical facilities. Currently, limited testing locations and expensive equipment pose difficulties for individuals trying to be tested, especially in low-resource settings. Researchers have successfully presented models for detecting COVID-19 infection status using audio samples recorded in clinical settings [5, 15], suggesting that audio-based Artificial Intelligence models can be used to identify COVID-19. Such models have the potential to be deployed on smartphones for fast, widespread, and low-resource testing. However, while previous studies have trained models on cleaned audio samples collected mainly from clinical settings, audio samples collected from average smartphones may yield suboptimal quality data that is different from the clean data that models were trained on. This discrepancy may add a bias that affects COVID-19 status predictions. To tackle this issue, we propose a multi-branch deep learning network that is trained and tested on crowdsourced data where most of the data has not been manually processed and cleaned. Furthermore, the model achieves state-of-art results for the COUGHVID dataset [16]. After breaking down results for each category, we have shown an AUC of 0.99 for audio samples with COVID-19 positive labels. △ Less

Submitted 16 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

arXiv:2011.13320 [pdf]

Virufy: Global Applicability of Crowdsourced and Clinical Datasets for AI Detection of COVID-19 from Cough

Authors: Gunvant Chaudhari, Xinyi Jiang, Ahmed Fakhry, Asriel Han, Jaclyn Xiao, Sabrina Shen, Amil Khanzada

Abstract: Rapid and affordable methods of testing for COVID-19 infections are essential to reduce infection rates and prevent medical facilities from becoming overwhelmed. Current approaches of detecting COVID-19 require in-person testing with expensive kits that are not always easily accessible. This study demonstrates that crowdsourced cough audio samples recorded and acquired on smartphones from around t… ▽ More Rapid and affordable methods of testing for COVID-19 infections are essential to reduce infection rates and prevent medical facilities from becoming overwhelmed. Current approaches of detecting COVID-19 require in-person testing with expensive kits that are not always easily accessible. This study demonstrates that crowdsourced cough audio samples recorded and acquired on smartphones from around the world can be used to develop an AI-based method that accurately predicts COVID-19 infection with an ROC-AUC of 77.1% (75.2%-78.3%). Furthermore, we show that our method is able to generalize to crowdsourced audio samples from Latin America and clinical samples from South Asia, without further training using the specific samples from those regions. As more crowdsourced data is collected, further development can be implemented using various respiratory audio samples to create a cough analysis-based machine learning (ML) solution for COVID-19 detection that can likely generalize globally to all demographic groups in both clinical and non-clinical settings. △ Less

Submitted 9 January, 2021; v1 submitted 26 November, 2020; originally announced November 2020.

Showing 1–4 of 4 results for author: Fakhry, A