Search | arXiv e-print repository

SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples

Authors: Michael Färber, David Lamprecht, Johan Krause, Linn Aung, Peter Haase

Abstract: We present SemOpenAlex, an extensive RDF knowledge graph that contains over 26 billion triples about scientific publications and their associated entities, such as authors, institutions, journals, and concepts. SemOpenAlex is licensed under CC0, providing free and open access to the data. We offer the data through multiple channels, including RDF dump files, a SPARQL endpoint, and as a data source in the Linked Open Data cloud, complete with resolvable URIs and links to other data sources. Moreover, we provide embeddings for knowledge graph entities using high-performance computing. SemOpenAlex enables a broad range of use-case scenarios, such as exploratory semantic search via our website, large-scale scientific impact quantification, and other forms of scholarly big data analytics within and across scientific disciplines. Additionally, it enables academic recommender systems, such as recommending collaborators, publications, and venues, including explainability capabilities. Finally, SemOpenAlex can serve for RDF query optimization benchmarks, creating scholarly knowledge-guided language models, and as a hub for semantic scientific publishing.

Submitted 7 August, 2023; originally announced August 2023.

Comments: accepted at ISWC'23

arXiv:2305.05187 [pdf, other]

doi 10.1109/TC.2023.3272284

DeepFire2: A Convolutional Spiking Neural Network Accelerator on FPGAs

Authors: Myat Thu Linn Aung, Daniel Gerlinghoff, Chuping Qu, Liwei Yang, Tian Huang, Rick Siow Mong Goh, Tao Luo, Weng-Fai Wong

Abstract: Brain-inspired spiking neural networks (SNNs) replace the multiply-accumulate operations of traditional neural networks by integrate-and-fire neurons, with the goal of achieving greater energy efficiency. Specialized hardware implementations of those neurons clearly have advantages over general-purpose devices in terms of power and performance, but exhibit poor scalability when it comes to accelerating large neural networks. DeepFire2 introduces a hardware architecture which can map large network layers efficiently across multiple super logic regions in a multi-die FPGA. That gives more control over resource allocation and parallelism, benefiting both throughput and energy consumption. Avoiding the use of lookup tables to implement the AND operations of an SNN, prevents the layer size to be limited by logic resources. A deep pipeline does not only lead to an increased clock speed of up to 600 MHz. We double the throughput and power efficiency compared to our previous version of DeepFire, which equates to an almost 10-fold improvement over other previous implementations. Importantly, we are able to deploy a large ImageNet model, while maintaining a throughput of over 1500 frames per second.

Submitted 9 May, 2023; originally announced May 2023.

arXiv:2112.02164 [pdf, other]

doi 10.1002/mp.15777

Bridging the gap between prostate radiology and pathology through machine learning

Authors: Indrani Bhattacharya, David S. Lim, Han Lin Aung, Xingchen Liu, Arun Seetharaman, Christian A. Kunder, Wei Shao, Simon J. C. Soerensen, Richard E. Fan, Pejman Ghanouni, Katherine J. To'o, James D. Brooks, Geoffrey A. Sonn, Mirabela Rusu

Abstract: Prostate cancer is the second deadliest cancer for American men. While Magnetic Resonance Imaging (MRI) is increasingly used to guide targeted biopsies for prostate cancer diagnosis, its utility remains limited due to high rates of false positives and false negatives as well as low inter-reader agreements. Machine learning methods to detect and localize cancer on prostate MRI can help standardize radiologist interpretations. However, existing machine learning methods vary not only in model architecture, but also in the ground truth labeling strategies used for model training. In this study, we compare different labeling strategies, namely, pathology-confirmed radiologist labels, pathologist labels on whole-mount histopathology images, and lesion-level and pixel-level digital pathologist labels (previously validated deep learning algorithm on histopathology images to predict pixel-level Gleason patterns) on whole-mount histopathology images. We analyse the effects these labels have on the performance of the trained machine learning models. Our experiments show that (1) radiologist labels and models trained with them can miss cancers, or underestimate cancer extent, (2) digital pathologist labels and models trained with them have high concordance with pathologist labels, and (3) models trained with digital pathologist labels achieve the best performance in prostate cancer detection in two different cohorts with different disease distributions, irrespective of the model architecture used. Digital pathologist labels can reduce challenges associated with human annotations, including labor, time, inter- and intra-reader variability, and can help bridge the gap between prostate radiology and pathology by enabling the training of reliable machine learning models to detect and localize prostate cancer on MRI.

Submitted 3 December, 2021; originally announced December 2021.

Comments: Indrani Bhattacharya and David S. Lim contributed equally as first authors. Geoffrey A. Sonn and Mirabela Rusu contributed equally as senior authors

arXiv:2004.05471 [pdf, other]

Farmland Parcel Delineation Using Spatio-temporal Convolutional Networks

Authors: Han Lin Aung, Burak Uzkent, Marshall Burke, David Lobell, Stefano Ermon

Abstract: Farm parcel delineation provides cadastral data that is important in developing and managing climate change policies. Specifically, farm parcel delineation informs applications in downstream governmental policies of land allocation, irrigation, fertilization, green-house gases (GHG's), etc. This data can also be useful for the agricultural insurance sector for assessing compensations following damages associated with extreme weather events - a growing trend related to climate change. Using satellite imaging can be a scalable and cost effective manner to perform the task of farm parcel delineation to collect this valuable data. In this paper, we break down this task using satellite imaging into two approaches: 1) Segmentation of parcel boundaries, and 2) Segmentation of parcel areas. We implemented variations of UNets, one of which takes into account temporal information, which achieved the best results on our dataset on farmland parcels in France in 2017.

Submitted 20 April, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

arXiv:1905.01027 [pdf, other]

HADES-IoT: A Practical Host-Based Anomaly Detection System for IoT Devices (Extended Version)

Authors: Dominik Breitenbacher, Ivan Homoliak, Yan Lin Aung, Nils Ole Tippenhauer, Yuval Elovici

Abstract: Internet of Things (IoT) devices have become ubiquitous and are spread across many application domains including the industry, transportation, healthcare, and households. However, the proliferation of the IoT devices has raised the concerns about their security, especially when observing that many manufacturers focus only on the core functionality of their products due to short time to market and low-cost pressures, while neglecting security aspects. Moreover, it does not exist any established or standardized method for measuring and ensuring the security of IoT devices. Consequently, vulnerabilities are left untreated, allowing attackers to exploit IoT devices for various purposes, such as compromising privacy, recruiting devices into a botnet, or misusing devices to perform cryptocurrency mining. In this paper, we present a practical Host-based Anomaly DEtection System for IoT (HADES-IoT) that represents the last line of defense. HADES-IoT has proactive detection capabilities, provides tamper-proof resistance, and it can be deployed on a wide range of Linux-based IoT devices. The main advantage of HADES-IoT is its low performance overhead, which makes it suitable for the IoT domain, where state-of-the-art approaches cannot be applied due to their high-performance demands. We deployed HADES-IoT on seven IoT devices to evaluate its effectiveness and performance overhead. Our experiments show that HADES-IoT achieved 100% effectiveness in the detection of current IoT malware such as VPNFilter and IoTReaper; while on average, requiring only 5.5% of available memory and causing only a low CPU load.

Submitted 2 May, 2019; originally announced May 2019.

arXiv:0803.3470 [pdf]

doi 10.1364/OE.16.005965

Three-Dimensional Grain Boundary Spectroscopy in Transparent High Power Ceramic Laser Materials

Authors: Mariola O. Ramirez, Jeffrey Wisdom, Haifeng Li, Yan Lin Aung, Joseph Stitt, Gary L. Messing, V. Dierolf, Zhiwen Liu, Akio Ikesue, Robert L. Byer, Venkatraman Gopalan

Abstract: Using confocal Raman and fluorescence spectroscopic imaging in 3-dimensions, we show direct evidence for Nd3+-Nd3+ interactions across grain boundaries (GBs) in Nd3+:YAG laser ceramics. It is clearly shown that Nd3+ segregation takes place at GBs leading to self-fluorescence quenching which affects a volume fraction as high as 20%. In addition, we show a clear trend of increasing spatial inhomogeneities in Nd3+ concentration when the doping levels exceeds 3 at%, which is not detected by standard spectrometry techniques. These results could point the way to further improvements in what is already an impressive class of ceramic laser materials.

Submitted 24 March, 2008; originally announced March 2008.

Comments: 8 pages including Figures. submitted to Optics Express (Nov 07)

Showing 1–6 of 6 results for author: Aung, L