-
The coherent rank of a graph with three eigenvalues
Authors:
Gary Greaves,
Jose Yip
Abstract:
We characterise graphs that have three distinct eigenvalues and coherent ranks 8 and 9, linking the former to certain symmetric 2-designs and the latter to specific quasi-symmetric 2-designs. This characterisation leads to the discovery of a new biregular graph with three distinct eigenvalues. Additionally, we demonstrate that the coherent rank of a triregular graph with three distinct eigenvalues is at least 14. Finally, we introduce a conjecturally infinite family of biregular graphs with three distinct eigenvalues, obtained by switching the block graphs of orthogonal arrays.
Submitted 25 June, 2024;
originally announced June 2024.
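A small illustration of the terminology in the abstract above, offered as a sketch rather than anything taken from the paper: the complete bipartite graph K_{2,3} is a standard textbook example of a non-regular graph with three distinct eigenvalues (0 and ±√6) and two distinct valencies, which is assumed here to be what "biregular" means. The helper names are hypothetical, and K_{2,3} is not one of the graphs constructed by the authors.

```python
def matmul(a, b):
    """Multiply two square integer matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def complete_bipartite(m, n):
    """Adjacency matrix of K_{m,n}: vertices 0..m-1 on one side, the rest on the other."""
    size = m + n
    return [[1 if (i < m) != (j < m) else 0 for j in range(size)]
            for i in range(size)]


def valencies(adj):
    """Sorted list of the distinct vertex degrees."""
    return sorted(set(sum(row) for row in adj))


A = complete_bipartite(2, 3)
A3 = matmul(matmul(A, A), A)

# A^3 = 6A, so every eigenvalue is a root of x^3 - 6x = x(x - sqrt(6))(x + sqrt(6)).
satisfies_cubic = A3 == [[6 * entry for entry in row] for row in A]
distinct_valencies = valencies(A)
```

Because the check uses exact integer arithmetic, it avoids any floating-point eigenvalue solver; it only confirms that the eigenvalues lie in {0, √6, -√6}.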
-
CohortNet: Empowering Cohort Discovery for Interpretable Healthcare Analytics
Authors:
Qingpeng Cai,
Kaiping Zheng,
H. V. Jagadish,
Beng Chin Ooi,
James Yip
Abstract:
Cohort studies are of significant importance in the field of healthcare analysis. However, existing methods typically involve manual, labor-intensive, and expert-driven pattern definitions or rely on simplistic clustering techniques that lack medical relevance. Automating cohort studies with interpretable patterns has great potential to facilitate healthcare analysis but remains an unmet need in prior research efforts. In this paper, we propose a cohort auto-discovery model, CohortNet, for interpretable healthcare analysis, focusing on the effective identification, representation, and exploitation of cohorts characterized by medically meaningful patterns. CohortNet initially learns fine-grained patient representations by separately processing each feature, considering both individual feature trends and feature interactions at each time step. Subsequently, it classifies each feature into distinct states and employs a heuristic cohort exploration strategy to effectively discover substantial cohorts with concrete patterns. For each identified cohort, it learns comprehensive cohort representations with credible evidence through associated patient retrieval. Ultimately, given a new patient, CohortNet can leverage relevant cohorts with distinguished importance, which can provide a more holistic understanding of the patient's conditions. Extensive experiments on three real-world datasets demonstrate that it consistently outperforms state-of-the-art approaches and offers interpretable insights from diverse perspectives in a top-down fashion.
Submitted 20 June, 2024;
originally announced June 2024.
-
Towards Audio Codec-based Speech Separation
Authors:
Jia Qi Yip,
Shengkui Zhao,
Dianwen Ng,
Eng Siong Chng,
Bin Ma
Abstract:
Recent improvements in neural audio codec (NAC) models have generated interest in adopting pre-trained codecs for a variety of speech processing applications to take advantage of the efficiencies gained from high compression, but these have yet to be applied to the speech separation (SS) task. SS can benefit from high compression because the compute required for traditional SS models makes them impractical for many edge computing use cases. However, SS is a waveform-masking task where compression tends to introduce distortions that severely impact performance. Here we propose a novel task of Audio Codec-based SS, where SS is performed within the embedding space of a NAC, and propose a new model, Codecformer, to address this task. At inference, Codecformer achieves a 52x reduction in MAC while producing separation performance comparable to a cloud deployment of Sepformer. This method charts a new direction for performing efficient SS in practical scenarios.
Submitted 18 June, 2024;
originally announced June 2024.
-
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
Authors:
Kun Zhou,
Shengkui Zhao,
Yukun Ma,
Chong Zhang,
Hao Wang,
Dianwen Ng,
Chongjia Ni,
Nguyen Trung Hieu,
Jia Qi Yip,
Bin Ma
Abstract:
Recent language model-based text-to-speech (TTS) frameworks demonstrate scalability and in-context learning capabilities. However, they suffer from robustness issues due to the accumulation of errors in speech unit predictions during autoregressive language modeling. In this paper, we propose a phonetic enhanced language modeling method to improve the performance of TTS models. We leverage self-supervised representations that are phonetically rich as the training target for the autoregressive language model. Subsequently, a non-autoregressive model is employed to predict discrete acoustic codecs that contain fine-grained acoustic details. The TTS model focuses solely on linguistic modeling during autoregressive training, thereby reducing the error propagation that occurs in non-autoregressive training. Both objective and subjective evaluations validate the effectiveness of our proposed method.
Submitted 11 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Cosmology with Persistent Homology: a Fisher Forecast
Authors:
Jacky H. T. Yip,
Matteo Biagetti,
Alex Cole,
Karthik Viswanathan,
Gary Shiu
Abstract:
Persistent homology naturally addresses the multi-scale topological characteristics of the large-scale structure as a distribution of clusters, loops, and voids. We apply this tool to the dark matter halo catalogs from the Quijote simulations, and build a summary statistic for comparison with the joint power spectrum and bispectrum statistic regarding their information content on cosmological parameters and primordial non-Gaussianity. Through a Fisher analysis, we find that constraints from persistent homology are tighter for 8 out of the 10 parameters by margins of 13-50%. The complementarity of the two statistics breaks parameter degeneracies, allowing for a further gain in constraining power when combined. We run a series of consistency checks to consolidate our results, and conclude that our findings motivate incorporating persistent homology into inference pipelines for cosmological survey data.
Submitted 20 March, 2024;
originally announced March 2024.
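For readers unfamiliar with Fisher forecasts, the standard textbook form is sketched below. It assumes a Gaussian likelihood for a summary statistic with mean vector μ(θ) and a parameter-independent covariance C; these symbols are illustrative assumptions, and the abstract does not state the exact estimator the authors use.

```latex
F_{ij} = \frac{\partial \boldsymbol{\mu}^{\mathsf{T}}}{\partial \theta_i}\,
         \mathbf{C}^{-1}\,
         \frac{\partial \boldsymbol{\mu}}{\partial \theta_j},
\qquad
\sigma(\theta_i) \ge \sqrt{\left(F^{-1}\right)_{ii}}
```

Under these assumptions, a "tighter constraint" from one statistic means a smaller marginalised bound σ(θ_i) than the other statistic gives.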
-
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation
Authors:
Shengkui Zhao,
Yukun Ma,
Chongjia Ni,
Chong Zhang,
Hao Wang,
Trung Hieu Nguyen,
Kun Zhou,
Jiaqi Yip,
Dianwen Ng,
Bin Ma
Abstract:
Our previously proposed MossFormer has achieved promising performance in monaural speech separation. However, it predominantly adopts a self-attention-based MossFormer module, which tends to emphasize longer-range, coarser-scale dependencies, with a deficiency in effectively modelling finer-scale recurrent patterns. In this paper, we introduce a novel hybrid model that provides the capabilities to model both long-range, coarse-scale dependencies and fine-scale recurrent patterns by integrating a recurrent module into the MossFormer framework. Instead of applying the recurrent neural networks (RNNs) that use traditional recurrent connections, we present a recurrent module based on a feedforward sequential memory network (FSMN), which is considered an "RNN-free" recurrent network due to the ability to capture recurrent patterns without using recurrent connections. Our recurrent module mainly comprises an enhanced dilated FSMN block by using gated convolutional units (GCU) and dense connections. In addition, a bottleneck layer and an output layer are also added for controlling information flow. The recurrent module relies on linear projections and convolutions for seamless, parallel processing of the entire sequence. The integrated MossFormer2 hybrid model demonstrates remarkable enhancements over MossFormer and surpasses other state-of-the-art methods in WSJ0-2/3mix, Libri2Mix, and WHAM!/WHAMR! benchmarks.
Submitted 18 December, 2023;
originally announced December 2023.
-
Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification
Authors:
Duc-Tuan Truong,
Ruijie Tao,
Jia Qi Yip,
Kong Aik Lee,
Eng Siong Chng
Abstract:
Knowledge distillation (KD) is used to enhance automatic speaker verification performance by ensuring consistency between large teacher networks and lightweight student networks at the embedding level or label level. However, the conventional label-level KD overlooks the significant knowledge from non-target speakers, particularly their classification probabilities, which can be crucial for automatic speaker verification. In this paper, we first demonstrate that leveraging a larger number of training non-target speakers improves the performance of automatic speaker verification models. Inspired by this finding about the importance of non-target speakers' knowledge, we modified the conventional label-level KD by disentangling and emphasizing the classification probabilities of non-target speakers during knowledge distillation. The proposed method is applied to three different student model architectures and achieves an average of 13.67% improvement in EER on the VoxCeleb dataset compared to embedding-level and conventional label-level KD methods.
Submitted 14 January, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
SPGM: Prioritizing Local Features for enhanced speech separation performance
Authors:
Jia Qi Yip,
Shengkui Zhao,
Yukun Ma,
Chongjia Ni,
Chong Zhang,
Hao Wang,
Trung Hieu Nguyen,
Kun Zhou,
Dianwen Ng,
Eng Siong Chng,
Bin Ma
Abstract:
Dual-path is a popular architecture for speech separation models (e.g. Sepformer) which splits long sequences into overlapping chunks for its intra- and inter-blocks that separately model intra-chunk local features and inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half a dual-path model's parameters, contribute minimally to performance. Thus, we propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure consisting of a parameter-free global pooling module followed by a modulation module comprising only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and 0.3 dB respectively, and matches the performance of recent SOTA models with up to 8 times fewer parameters. Model and weights are available at huggingface.co/yipjiaqi/spgm
Submitted 10 March, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
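The SI-SDRi figures quoted above refer to the improvement in scale-invariant signal-to-distortion ratio over the unprocessed mixture. A minimal sketch of the usual definition follows; the function names are hypothetical, the mean-removal step that evaluation toolkits normally apply is omitted for brevity, and this is not the authors' evaluation code.

```python
import math


def dot(a, b):
    """Inner product of two equal-length signals."""
    return sum(x * y for x, y in zip(a, b))


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB; assumes the estimate is not a perfect copy."""
    # Project the estimate onto the reference to obtain the scaled target.
    scale = dot(estimate, reference) / dot(reference, reference)
    target = [scale * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    return 10 * math.log10(dot(target, target) / dot(noise, noise))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: gain of the separated estimate over the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


reference = [1.0, 0.0, -1.0, 0.0]
mixture = [1.0, 1.0, -1.0, -1.0]    # reference plus an interfering signal
estimate = [1.0, 0.1, -1.0, -0.1]   # most of the interference removed
gain_db = si_sdr_improvement(estimate, mixture, reference)
```

Rescaling the estimate by any non-zero constant leaves `si_sdr` unchanged, which is the property the "scale-invariant" name refers to.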
-
Codec Data Augmentation for Time-domain Heart Sound Classification
Authors:
Ansh Mishra,
Jia Qi Yip,
Eng Siong Chng
Abstract:
Heart auscultations are a low-cost and effective way of detecting valvular heart diseases early, which can save lives. Nevertheless, it has been difficult to scale this screening method since the effectiveness of auscultations is dependent on the skill of doctors. As such, there has been increasing research interest in the automatic classification of heart sounds using deep learning algorithms. However, it is currently difficult to develop good heart sound classification models due to the limited data available for training. In this work, we propose a simple time-domain approach to the heart sound classification problem with a base classification error rate of 0.8 and show that augmentation of the data through codec simulation can improve the classification error rate to 0.2. With data augmentation, our approach outperforms the existing time-domain CNN-BiLSTM baseline model. Critically, our experiments show that codec data augmentation is effective in getting around the data limitation.
Submitted 14 September, 2023;
originally announced September 2023.
-
Analysis of Speech Separation Performance Degradation on Emotional Speech Mixtures
Authors:
Jia Qi Yip,
Dianwen Ng,
Bin Ma,
Chng Eng Siong
Abstract:
Despite recent strides made in Speech Separation, most models are trained on datasets with neutral emotions. Emotional speech has been known to degrade performance of models in a variety of speech tasks, which reduces the effectiveness of these models when deployed in real-world scenarios. In this paper we perform analysis to differentiate the performance degradation arising from the emotions in speech from the impact of out-of-domain inference. This is measured using a carefully designed test dataset, Emo2Mix, consisting of balanced data across all emotional combinations. We show that even models with strong out-of-domain performance such as Sepformer can still suffer significant degradation of up to 5.1 dB SI-SDRi on mixtures with strong emotions. This demonstrates the importance of accounting for emotions in real-world speech separation applications.
Submitted 14 September, 2023;
originally announced September 2023.
-
Learning from Topology: Cosmological Parameter Estimation from the Large-scale Structure
Authors:
Jacky H. T. Yip,
Adam Rouhiainen,
Gary Shiu
Abstract:
The topology of the large-scale structure of the universe contains valuable information on the underlying cosmological parameters. While persistent homology can extract this topological information, the optimal method for parameter estimation from the tool remains an open question. To address this, we propose a neural network model to map persistence images to cosmological parameters. Through a parameter recovery test, we demonstrate that our model makes accurate and precise estimates, considerably outperforming conventional Bayesian inference approaches.
Submitted 4 August, 2023;
originally announced August 2023.
-
ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention
Authors:
Jia Qi Yip,
Tuan Truong,
Dianwen Ng,
Chong Zhang,
Yukun Ma,
Trung Hieu Nguyen,
Chongjia Ni,
Shengkui Zhao,
Eng Siong Chng,
Bin Ma
Abstract:
In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric Cross Attention (ACA) to replace temporal pooling. ACA is able to distill large, variable-length sequences into small, fixed-sized latents by attending a small query to large key and value matrices. In ACA-Net, we build a Multi-Layer Aggregation (MLA) block using ACA to generate fixed-sized identity vectors from variable-length inputs. Through global attention, ACA-Net acts as an efficient global feature extractor that adapts to temporal variability, unlike existing SV models that apply a fixed function for pooling over the temporal dimension, which may obscure information about the signal's non-stationary temporal variability. Our experiments on the WSJ0-1talker show ACA-Net outperforms a strong baseline by 5% relative improvement in EER using only 1/5 of the parameters.
Submitted 20 May, 2023;
originally announced May 2023.
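The idea of attending a small query to large key and value matrices can be sketched with plain scaled dot-product attention. This is a simplified illustration under stated assumptions, not ACA-Net itself: the learned linear projections are omitted, the input frames serve directly as both keys and values, and the `latents` values and helper names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Attend m query vectors to n key/value vectors; returns m output vectors."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def make_frames(length):
    """Toy variable-length input: one 2-dimensional feature per time step."""
    return [[math.sin(t), math.cos(t)] for t in range(length)]


latents = [[1.0, 0.0], [0.0, 1.0]]  # small fixed query (learned in a real model)
short_out = cross_attention(latents, make_frames(5), make_frames(5))
long_out = cross_attention(latents, make_frames(9), make_frames(9))
```

Both `short_out` and `long_out` contain exactly two vectors, so the output size is set by the query and not by the input length, which is the property that lets this kind of attention stand in for temporal pooling.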
-
Contrastive Speech Mixup for Low-resource Keyword Spotting
Authors:
Dianwen Ng,
Ruixi Zhang,
Jia Qi Yip,
Chong Zhang,
Yukun Ma,
Trung Hieu Nguyen,
Chongjia Ni,
Eng Siong Chng,
Bin Ma
Abstract:
Most of the existing neural-based models for keyword spotting (KWS) in smart devices require thousands of training samples to learn a decent audio representation. However, with the rising demand for smart devices to become more personalized, KWS models need to adapt quickly to smaller user samples. To tackle this challenge, we propose a contrastive speech mixup (CosMix) learning algorithm for low-resource KWS. CosMix introduces an auxiliary contrastive loss to the existing mixup augmentation technique to maximize the relative similarity between the original pre-mixed samples and the augmented samples. The goal is to inject enhancing constraints to guide the model towards simpler but richer content-based speech representations from two augmented views (i.e. noisy mixed and clean pre-mixed utterances). We conduct our experiments on the Google Speech Command dataset, where we trim the size of the training set to as small as 2.5 mins per keyword to simulate a low-resource condition. Our experimental results show a consistent improvement in the performance of multiple models, which exhibits the effectiveness of our method.
Submitted 1 May, 2023;
originally announced May 2023.
-
Toward Cohort Intelligence: A Universal Cohort Representation Learning Framework for Electronic Health Record Analysis
Authors:
Changshuo Liu,
Wenqiao Zhang,
Beng Chin Ooi,
James Wei Luen Yip,
Lingze Zeng,
Kaiping Zheng
Abstract:
Electronic Health Records (EHR) are generated from clinical routine care recording valuable information of broad patient populations, which provide plentiful opportunities for improving patient management and intervention strategies in clinical practice. To exploit the enormous potential of EHR data, a popular EHR data analysis paradigm in machine learning is EHR representation learning, which first leverages the individual patient's EHR data to learn informative representations by a backbone, and supports diverse health-care downstream tasks grounded on the representations. Unfortunately, such a paradigm fails to access the in-depth analysis of patients' relevance, which is generally known as cohort studies in clinical practice. Specifically, patients in the same cohort tend to share similar characteristics, implying their resemblance in medical conditions such as symptoms or diseases. In this paper, we propose a universal COhort Representation lEarning (CORE) framework to augment EHR utilization by leveraging the fine-grained cohort information among patients. In particular, CORE first develops an explicit patient modeling task based on the prior knowledge of patients' diagnosis codes, which measures the latent relevance among patients to adaptively divide the cohorts for each patient. Based on the constructed cohorts, CORE recodes the pre-extracted EHR data representation from intra- and inter-cohort perspectives, yielding augmented EHR data representation learning. CORE is readily applicable to diverse backbone models, serving as a universal plug-in framework to infuse cohort information into healthcare methods for boosted performance. We conduct an extensive experimental evaluation on two real-world datasets, and the experimental results demonstrate the effectiveness and generalizability of CORE.
Submitted 12 April, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
deHuBERT: Disentangling Noise in a Self-supervised Model for Robust Speech Recognition
Authors:
Dianwen Ng,
Ruixi Zhang,
Jia Qi Yip,
Zhao Yang,
Jinjie Ni,
Chong Zhang,
Yukun Ma,
Chongjia Ni,
Eng Siong Chng,
Bin Ma
Abstract:
Existing self-supervised pre-trained speech models have offered an effective way to leverage massive unannotated corpora to build good automatic speech recognition (ASR). However, many current models are trained on a clean corpus from a single source, which tends to do poorly when noise is present during testing. Nonetheless, it is crucial to overcome the adverse influence of noise for real-world applications. In this work, we propose a novel training framework, called deHuBERT, for noise reduction encoding inspired by H. Barlow's redundancy-reduction principle. The new framework improves the HuBERT training algorithm by introducing auxiliary losses that drive the self- and cross-correlation matrix between pairwise noise-distorted embeddings towards identity matrix. This encourages the model to produce noise-agnostic speech representations. With this method, we report improved robustness in noisy environments, including unseen noises, without impairing the performance on the clean set.
Submitted 28 February, 2023;
originally announced February 2023.
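A redundancy-reduction loss that pushes a cross-correlation matrix towards the identity can be sketched as below. This follows the general Barlow Twins-style formulation and is an assumption about the overall shape of such a loss, not the deHuBERT objective itself: the function names, the `off_diag_weight` value, and the toy batches are hypothetical, and only the cross-correlation term is shown.

```python
import math


def standardize(batch):
    """Zero mean, unit variance per embedding dimension (assumes non-zero variance)."""
    n, dim = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dim)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in batch) / n)
            for j in range(dim)]
    return [[(row[j] - means[j]) / stds[j] for j in range(dim)] for row in batch]


def cross_correlation(batch_a, batch_b):
    """dim x dim correlation matrix between two batches of embeddings."""
    za, zb = standardize(batch_a), standardize(batch_b)
    n, dim = len(za), len(za[0])
    return [[sum(za[k][i] * zb[k][j] for k in range(n)) / n for j in range(dim)]
            for i in range(dim)]


def identity_loss(batch_a, batch_b, off_diag_weight=0.5):
    """Penalise diagonal entries away from 1 and off-diagonal entries away from 0."""
    c = cross_correlation(batch_a, batch_b)
    dim = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(dim))
    off_diag = sum(c[i][j] ** 2 for i in range(dim) for j in range(dim) if i != j)
    return on_diag + off_diag_weight * off_diag


view_a = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
view_b = [[row[1], row[0]] for row in view_a]  # same batch with dimensions swapped

aligned_loss = identity_loss(view_a, view_a)
swapped_loss = identity_loss(view_a, view_b)
```

In the toy data the two dimensions of `view_a` are uncorrelated, so comparing the batch with itself gives the identity matrix and zero loss, while swapping the dimensions moves all correlation off the diagonal and is penalised.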
-
From Plate to Prevention: A Dietary Nutrient-aided Platform for Health Promotion in Singapore
Authors:
Kaiping Zheng,
Thao Nguyen,
Jesslyn Hwei Sing Chong,
Charlene Enhui Goh,
Melanie Herschel,
Hee Hoon Lee,
Changshuo Liu,
Beng Chin Ooi,
Wei Wang,
James Yip
Abstract:
Singapore has been striving to improve the provision of healthcare services to her people. In this course, the government has taken note of the deficiency in regulating and supervising people's nutrient intake, which is identified as a contributing factor to the development of chronic diseases. Consequently, this issue has garnered significant attention. In this paper, we share our experience in addressing this issue and attaining medical-grade nutrient intake information to benefit Singaporeans in different aspects. To this end, we develop the FoodSG platform to incubate diverse healthcare-oriented applications as a service in Singapore, taking into account their shared requirements. We further identify the profound meaning of localized food datasets and systematically clean and curate a localized Singaporean food dataset FoodSG-233. To overcome the hurdle in recognition performance brought by Singaporean multifarious food dishes, we propose to integrate supervised contrastive learning into our food recognition model FoodSG-SCL for the intrinsic capability to mine hard positive/negative samples and therefore boost the accuracy. Through a comprehensive evaluation, we present performance results of the proposed model and insights on food-related healthcare applications. The FoodSG-233 dataset has been released in https://foodlg.comp.nus.edu.sg/.
Submitted 28 March, 2023; v1 submitted 10 January, 2023;
originally announced January 2023.
-
I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization
Authors:
Dianwen Ng,
Jia Qi Yip,
Tanmay Surana,
Zhao Yang,
Chong Zhang,
Yukun Ma,
Chongjia Ni,
Eng Siong Chng,
Bin Ma
Abstract:
Noise robustness in keyword spotting remains a challenge as many models fail to overcome the heavy influence of noises, causing the deterioration of the quality of feature embeddings. We proposed a contrastive regularization method called Inter-Intra Contrastive Regularization (I2CR) to improve the feature representations by guiding the model to learn the fundamental speech information specific to the cluster. This involves maximizing the similarity across Intra and Inter samples of the same class. As a result, it pulls the instances closer to more generalized representations that form more prominent clusters and reduces the adverse impact of noises. We show that our method provides consistent improvements in accuracy over different backbone model architectures under different noise environments. We also demonstrate that our proposed framework has improved the accuracy of unseen out-of-domain noises and unseen variant noise SNRs. This indicates the significance of our work with the overall refinement in noise robustness.
Submitted 13 September, 2022;
originally announced September 2022.
-
Amino Acid Classification in 2D NMR Spectra via Acoustic Signal Embeddings
Authors:
Jia Qi Yip,
Dianwen Ng,
Bin Ma,
Konstantin Pervushin,
Eng Siong Chng
Abstract:
Nuclear Magnetic Resonance (NMR) is used in structural biology to experimentally determine the structure of proteins, which is used in many areas of biology and is an important part of drug development. Unfortunately, NMR data can cost thousands of dollars per sample to collect and it can take a specialist weeks to assign the observed resonances to specific chemical groups. There has thus been growing interest in the NMR community to use deep learning to automate NMR data annotation. Due to similarities between NMR and audio data, we propose that methods used in acoustic signal processing can be applied to NMR as well. Using a simulated amino acid dataset, we show that by swapping out filter banks with a trainable convolutional encoder, acoustic signal embeddings from speaker verification models can be used for amino acid classification in 2D NMR spectra by treating each amino acid as a unique speaker. On an NMR dataset comparable in size to 46 hours of audio, we achieve a classification performance of 97.7% on a 20-class problem. We also achieve a 23% relative improvement by using an acoustic embedding model compared to an existing NMR-based model.
Submitted 1 August, 2022;
originally announced August 2022.
-
The Metaverse Data Deluge: What Can We Do About It?
Authors:
Beng Chin Ooi,
Gang Chen,
Mike Zheng Shou,
Kian-Lee Tan,
Anthony Tung,
Xiaokui Xiao,
James Wei Luen Yip,
Meihui Zhang
Abstract:
In the Metaverse, the physical space and the virtual space co-exist, and interact simultaneously. While the physical space is virtually enhanced with information, the virtual space is continuously refreshed with real-time, real-world information. To allow users to process and manipulate information seamlessly between the real and digital spaces, novel technologies must be developed. These include smart interfaces, new augmented realities, efficient storage and data management and dissemination techniques. In this paper, we first discuss some promising co-space applications. These applications offer opportunities that neither of the spaces can realize on its own. We then discuss challenges. Finally, we discuss and envision what are likely to be required from the database and system perspectives.
Submitted 10 November, 2022; v1 submitted 14 June, 2022;
originally announced June 2022.
-
From Dark Matter to Galaxies with Convolutional Neural Networks
Authors:
Jacky H. T. Yip,
Xinyue Zhang,
Yanfang Wang,
Wei Zhang,
Yueqiu Sun,
Gabriella Contardo,
Francisco Villaescusa-Navarro,
Siyu He,
Shy Genel,
Shirley Ho
Abstract:
Cosmological simulations play an important role in the interpretation of astronomical data, in particular in comparing observed data to our theoretical expectations. However, to compare data with these simulations, the simulations in principle need to include gravity, magneto-hydrodynamics, radiative transfer, etc. These ideal large-volume simulations (gravo-magneto-hydrodynamical) are incredibly computationally expensive and can cost tens of millions of CPU hours to run. In this paper, we propose a deep learning approach to map from the dark-matter-only simulation (computationally cheaper) to the galaxy distribution (from the much costlier cosmological simulation). The main challenge of this task is the high sparsity in the target galaxy distribution: space is mainly empty. We propose a cascade architecture composed of a classification filter followed by a regression procedure. We show that our result outperforms a state-of-the-art model used in the astronomical community, and provides a good trade-off between computational cost and prediction accuracy.
Submitted 17 October, 2019;
originally announced October 2019.
-
Strong magnetic coupling in the hexagonal R5Pb3 compounds (R = Gd-Tm)
Authors:
Andrea Marcinkova,
Clarina de la Cruz,
Joshua Yip,
Liang L. Zhao,
Jiakui K. Wang,
E. Svanidze,
E. Morosan
Abstract:
We have synthesized R5Pb3 (R = Gd-Tm) compounds in polycrystalline form and performed structural analysis, magnetization, and neutron scattering measurements. For all R5Pb3 reported here the Weiss temperatures θW are several times smaller than the ordering temperatures TORD, while the latter are remarkably high (TORD up to 275 K for R = Gd) compared to other known R-M binaries (M = Si, Ge, Sn and Sb). The magnetic order changes from ferromagnetic in R = Gd, Tb to antiferromagnetic in R = Dy-Tm. Below TORD, the magnetization measurements together with neutron powder diffraction show complex magnetic behavior and reveal the existence of up to three additional phase transitions. We believe this to be a result of crystal electric field effects responsible for high magnetocrystalline anisotropy. The R5Pb3 magnetic unit cells for R = Tb-Tm can be described with incommensurate magnetic wave vectors with spin modulation either along the c axis in R = Tb, Er and Tm or within the ab-plane in R = Dy and Ho.
Submitted 24 October, 2014;
originally announced October 2014.
-
Graphene Nucleation on Transition Metal Surface: Structure Transformation and Role of the Metal Step Edge
Authors:
Junfeng Gao,
Joanne Yip,
Jijun Zhao,
Boris I. Yakobson,
Feng Ding
Abstract:
The nucleation of graphene on a transition metal (TM) surface, either on a terrace or near a step edge, is systematically explored using density functional theory (DFT) calculations and applying the two-dimensional (2D) crystal nucleation theory. Careful optimization of the supported carbon clusters, CN (with size N ranging from 1 to 24), on the Ni(111) surface indicates a ground state structure transformation from a one-dimensional (1D) C chain to a two-dimensional (2D) sp2 C network at N ~ 10-12. Furthermore, the crucial parameters controlling graphene growth on the metal surface, namely the nucleation barrier, the nucleus size, and the nucleation rate on a terrace or near a step edge, are calculated. In agreement with numerous experimental observations, our analysis shows that graphene nucleation near a metal step edge is superior to that on a terrace. Based on our analysis, we propose the use of seeded graphene to synthesize high-quality graphene over large areas.
Submitted 31 March, 2011;
originally announced April 2011.
-
XML Data Integrity Based on Concatenated Hash Function
Authors:
Baolong Liu,
Joan Lu,
Jim Yip
Abstract:
Data integrity is fundamental to data authentication. A major problem for XML data authentication is that signed XML data can be copied to another document while the signature remains valid. This is caused by the way XML data integrity is protected. Through investigation, the paper finds that, besides data content integrity, XML data integrity should also protect element location information and context referential integrity in fine-grained security situations. The aim of this paper is to propose a model for XML data integrity that takes XML data features into account. The paper presents an XML data integrity model named CSR (content integrity, structure integrity, context referential integrity) based on a concatenated hash function. XML data content integrity is ensured using an iterative hash process, structure integrity is protected by hashing an absolute path string from the root node, and context referential integrity is ensured by protecting context-related elements. The presented XML data integrity model can satisfy integrity requirements in fine-grained security situations and is compatible with XML signature. Evaluation shows that the presented integrity model generates digest values more efficiently than the Merkle hash tree-based integrity model for XML data.
Submitted 19 June, 2009;
originally announced June 2009.