PathoLM: Identifying pathogenicity from the DNA sequence through the Genome Foundation Model

Authors: Sajib Acharjee Dip, Uddip Acharjee Shuvo, Tran Chau, Haoqiu Song, Petra Choi, Xuan Wang, Liqing Zhang

Abstract: Pathogen identification is pivotal in diagnosing, treating, and preventing diseases, crucial for controlling infections and safeguarding public health. Traditional alignment-based methods, though widely used, are computationally intense and reliant on extensive reference databases, often failing to detect novel pathogens due to their low sensitivity and specificity. Similarly, conventional machine learning techniques, while promising, require large annotated datasets and extensive feature engineering and are prone to overfitting. Addressing these challenges, we introduce PathoLM, a cutting-edge pathogen language model optimized for the identification of pathogenicity in bacterial and viral sequences. Leveraging the strengths of pre-trained DNA models such as the Nucleotide Transformer, PathoLM requires minimal data for fine-tuning, thereby enhancing pathogen detection capabilities. It effectively captures a broader genomic context, significantly improving the identification of novel and divergent pathogens. We developed a comprehensive data set comprising approximately 30 species of viruses and bacteria, including ESKAPEE pathogens, seven notably virulent bacterial strains resistant to antibiotics. Additionally, we curated a species classification dataset centered specifically on the ESKAPEE group. In comparative assessments, PathoLM dramatically outperforms existing models like DciPatho, demonstrating robust zero-shot and few-shot capabilities. Furthermore, we expanded PathoLM-Sp for ESKAPEE species classification, where it showed superior performance compared to other advanced deep learning methods, despite the complexities of the task.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 9 pages, 3 figures

arXiv:2107.08063 [pdf, other]

Studying Bioluminescence Flashes with the ANTARES Deep Sea Neutrino Telescope

Authors: N. Reeb, S. Hutschenreuter, P. Zehetner, T. Ensslin, S. Alves, M. André, M. Anghinolfi, G. Anton, M. Ardid, J. -J. Aubert, J. Aublin, B. Baret, S. Basa, B. Belhorma, M. Bendahman, V. Bertin, S. Biagi, M. Bissinger, J. Boumaaza, M. Bouta, M. C. Bouwhuis, H. Brânzaş, R. Bruijn, J. Brunner, J. Busto, et al. (119 additional authors not shown)

Abstract: We develop a novel technique to exploit the extensive data sets provided by underwater neutrino telescopes to gain information on bioluminescence in the deep sea. The passive nature of the telescopes gives us the unique opportunity to infer information on bioluminescent organisms without actively interfering with them. We propose a statistical method that allows us to reconstruct the light emission of individual organisms, as well as their location and movement. A mathematical model is built to describe the measurement process of underwater neutrino telescopes and the signal generation of the biological organisms. The Metric Gaussian Variational Inference algorithm is used to reconstruct the model parameters using photon counts recorded by the neutrino detectors. We apply this method to synthetic data sets and data collected by the ANTARES neutrino telescope. The telescope is located 40 km off the French coast and fixed to the sea floor at a depth of 2475 m. The runs with synthetic data reveal that we can reliably model the emitted bioluminescent flashes of the organisms. Furthermore, we find that the spatial resolution of the localization of light sources highly depends on the configuration of the telescope. Precise measurements of the efficiencies of the detectors and the attenuation length of the water are crucial to reconstruct the light emission. Finally, the application to ANTARES data reveals the first precise localizations of bioluminescent organisms using neutrino telescope data.

Submitted 16 July, 2021; originally announced July 2021.

arXiv:2009.02816 [pdf, other]

A comparison of oscillatory characteristics in covert speech and speech perception

Authors: Jae Moon, Silvia Orlandi, Tom Chau

Abstract: Covert speech, the silent production of words in the mind, has been studied increasingly to understand and decode thoughts. This task has often been compared to speech perception as it brings about similar topographical activation patterns in common brain areas. In studies of speech comprehension, neural oscillations are thought to play a key role in the sampling of speech at varying temporal scales. However, very little is known about the role of oscillations in covert speech. In this study, we aimed to determine to what extent each oscillatory frequency band is used to process words in covert speech and speech perception tasks. Secondly, we asked whether the θ and γ activity in the two tasks are related through phase-amplitude coupling (PAC). First, continuous wavelet transform was performed on epoched signals and subsequently two-tailed t-tests between two classes were conducted to determine statistical distinctions in frequency and time. While the perception task dynamically uses all frequencies with more prominent θ and γ activity, the covert task favoured higher frequencies with significantly higher γ activity than perception. Moreover, the perception condition produced significant θ-γ PAC suggesting a linkage of syllabic and phonological sampling. Although this was found to be suppressed in the covert condition, we found significant pseudo-coupling between perception θ and covert speech γ. We report that covert speech processing is largely conducted by higher frequencies, and that the γ- and θ-bands may function similarly and differently across tasks, respectively. This study is the first to characterize covert speech in terms of neural oscillatory engagement. Future studies are directed to explore oscillatory characteristics and inter-task relationships with a more diverse vocabulary.

Submitted 6 September, 2020; originally announced September 2020.

Comments: 22 pages, 10 figures

arXiv:1912.04828 [pdf]

Navigating in Virtual Reality using Thought: The Development and Assessment of a Motor Imagery based Brain-Computer Interface

Authors: Behnam Reyhani-Masoleh, Tom Chau

Abstract: Brain-computer interface (BCI) systems have potential as assistive technologies for individuals with severe motor impairments. Nevertheless, individuals must first participate in many training sessions to obtain adequate data for optimizing the classification algorithm and subsequently acquiring brain-based control. Such traditional training paradigms have been dubbed unengaging and unmotivating for users. In recent years, it has been shown that the synergy of virtual reality (VR) and a BCI can lead to increased user engagement. This study created a 3-class BCI with a rather elaborate EEG signal processing pipeline that heavily utilizes machine learning. The BCI initially presented sham feedback but was eventually driven by EEG associated with motor imagery. The BCI tasks consisted of motor imagery of the feet and left and right hands, which were used to navigate a single-path maze in VR. Ten of the eleven recruited participants achieved online performance superior to chance (p < 0.01), while the majority successfully completed more than 70% of the prescribed navigational tasks. These results indicate that the proposed paradigm warrants further consideration as a neurofeedback BCI training tool: a paradigm that gives users, from their perspective, control from the outset without the need for prior data collection sessions.

Submitted 10 December, 2019; originally announced December 2019.

Comments: 23 pages, 10 figures

arXiv:1809.00395 [pdf]

doi 10.1088/1741-2552/aae4b9

Online classification of imagined speech using functional near-infrared spectroscopy signals

Authors: Alborz Rezazadeh Sereshkeh, Rozhin Yousefi, Andrew T Wong, Tom Chau

Abstract: Most brain-computer interfaces (BCIs) based on functional near-infrared spectroscopy (fNIRS) require that users perform mental tasks such as motor imagery, mental arithmetic, or music imagery to convey a message or to answer simple yes or no questions. These cognitive tasks usually have no direct association with the communicative intent, which makes them difficult for users to perform. In this paper, a 3-class intuitive BCI is presented which enables users to directly answer yes or no questions by covertly rehearsing the word 'yes' or 'no' for 15 s. The BCI also admits an equivalent duration of unconstrained rest which constitutes the third discernable task. Twelve participants each completed one offline block and six online blocks over the course of 2 sessions. The mean value of the change in oxygenated hemoglobin concentration during a trial was calculated for each channel and used to train a regularized linear discriminant analysis (RLDA) classifier. By the final online block, 9 out of 12 participants were performing above chance (p<0.001), with a 3-class accuracy of 83.8 ± 9.4%. Even when considering all participants, the average online 3-class accuracy over the last 3 blocks was 64.1 ± 20.6%, with only 3 participants scoring below chance (p<0.001). For most participants, channels in the left temporal and temporoparietal cortex provided the most discriminative information. To our knowledge, this is the first report of an online fNIRS 3-class imagined speech BCI. Our findings suggest that imagined speech can be used as a reliable activation task for selected users for the development of more intuitive BCIs for communication.

Submitted 2 September, 2018; originally announced September 2018.

Showing 1–5 of 5 results for author: Chau, T