Search | arXiv e-print repository

arXiv:2406.01938 [pdf, other]

Nutrition Estimation for Dietary Management: A Transformer Approach with Depth Sensing

Authors: Zhengyi Kwan, Wei Zhang, Zhengkui Wang, Aik Beng Ng, Simon See

Abstract: Nutrition estimation is crucial for effective dietary management and overall health and well-being. Existing methods often struggle with sub-optimal accuracy and can be time-consuming. In this paper, we propose NuNet, a transformer-based network designed for nutrition estimation that utilizes both RGB and depth information from food images. We have designed and implemented a multi-scale encoder and decoder, along with two types of feature fusion modules, specialized for estimating five nutritional factors. These modules effectively balance the efficiency and effectiveness of feature extraction with flexible usage of our customized attention mechanisms and fusion strategies. Our experimental study shows that NuNet outperforms its variants and existing solutions significantly for nutrition estimation. It achieves an error rate of 15.65%, the lowest known to us, largely due to our multi-scale architecture and fusion modules. This research holds practical value for dietary management with huge potential for transnational research and deployment and could inspire other applications involving multiple data types with varying degrees of importance.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 10 pages

arXiv:2405.12168 [pdf, other]

WiDRa -- Enabling Millimeter-Level Differential Ranging Accuracy in Wi-Fi Using Carrier Phase

Authors: Vishnu V. Ratnam, Bilal Sadiq, Hao Chen, Wei Sun, Shunyao Wu, Boon L. Ng, Jianzhong Zhang

Abstract: Although Wi-Fi is an ideal technology for many ranging applications, the performance of current methods is limited by the system bandwidth, leading to low accuracy of $\sim 1$ m. For many applications, measuring differential range, viz., the change in the range between adjacent measurements, is sufficient. Correspondingly, this work proposes WiDRa - a Wi-Fi based Differential Ranging solution that provides differential range estimates by using the sum-carrier-phase information. The proposed method is not limited by system bandwidth and can track range changes even smaller than the carrier wavelength. The proposed method is first theoretically justified, while taking into consideration the various hardware impairments affecting Wi-Fi chips. In the process, methods to isolate the sum-carrier phase from the hardware impairments are proposed. Extensive simulation results show that WiDRa can achieve a differential range estimation root-mean-square-error (RMSE) of $\approx 1$ mm in channels with a Rician-factor $\geq 7$ (a $100 \times$ improvement to existing methods). The proposed methods are also validated on off-the-shelf Wi-Fi hardware to demonstrate feasibility, where they achieve an RMSE of $< 1$ mm in the differential range. Finally, limitations of current investigation and future directions of exploration are suggested, to further tap into the potential of WiDRa.

Submitted 20 May, 2024; originally announced May 2024.

Comments: Accepted to IEEE JSAC special issue on Positioning and Sensing Over Wireless Networks, 2024

arXiv:2405.11622 [pdf, other]

Continuous Predictive Modeling of Clinical Notes and ICD Codes in Patient Health Records

Authors: Mireia Hernandez Caralt, Clarence Boon Liang Ng, Marek Rei

Abstract: Electronic Health Records (EHR) serve as a valuable source of patient information, offering insights into medical histories, treatments, and outcomes. Previous research has developed systems for detecting applicable ICD codes that should be assigned while writing a given EHR document, mainly focusing on discharge summaries written at the end of a hospital stay. In this work, we investigate the potential of predicting these codes for the whole patient stay at different time points during their stay, even before they are officially assigned by clinicians. The development of methods to predict diagnoses and treatments earlier could open opportunities for predictive medicine, such as identifying disease risks sooner, suggesting treatments, and optimizing resource allocation. Our experiments show that predictions regarding final ICD codes can be made as early as two days after admission, and we propose a custom model that improves performance on this early prediction task.

Submitted 19 May, 2024; originally announced May 2024.

ACM Class: I.2.7; J.3

arXiv:2405.04165 [pdf, other]

LingML: Linguistic-Informed Machine Learning for Enhanced Fake News Detection

Authors: Jasraj Singh, Fang Liu, Hong Xu, Bee Chin Ng, Wei Zhang

Abstract: Nowadays, information spreads at an unprecedented pace in social media, and discerning truth from misinformation and fake news has become an acute societal challenge. Machine learning (ML) models have been employed to identify fake news but are far from perfect, with challenging problems like limited accuracy, interpretability, and generalizability. In this paper, we enhance ML-based solutions with linguistics input and we propose LingML, linguistic-informed ML, for fake news detection. We conducted an experimental study with a popular dataset on fake news during the pandemic. The experimental results show that our proposed solution is highly effective. There are fewer than two errors out of every ten attempts with only linguistic input used in ML, and the knowledge is highly explainable. When linguistics input is integrated with advanced large-scale ML models for natural language processing, our solution outperforms existing ones with a 1.8% average error rate. LingML creates a new path with linguistics to push the frontier of effective and efficient fake news detection. It also sheds light on real-world multi-disciplinary applications requiring both ML and domain expertise to achieve optimal performance.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 7 pages

arXiv:2404.06224 [pdf, other]

Low-Cost Generation and Evaluation of Dictionary Example Sentences

Authors: Bill Cai, Clarence Boon Liang Ng, Daniel Tan, Shelvia Hotama

Abstract: Dictionary example sentences play an important role in illustrating word definitions and usage, but manually creating quality sentences is challenging. Prior works have demonstrated that language models can be trained to generate example sentences. However, they relied on costly customized models and word sense datasets for generation and evaluation of their work. Rapid advancements in foundational models present the opportunity to create low-cost, zero-shot methods for the generation and evaluation of dictionary example sentences. We introduce a new automatic evaluation metric called OxfordEval that measures the win-rate of generated sentences against existing Oxford Dictionary sentences. OxfordEval shows high alignment with human judgments, enabling large-scale automated quality evaluation. We experiment with various LLMs and configurations to generate dictionary sentences across word classes. We complement this with a novel approach of using masked language models to identify and select sentences that best exemplify word meaning. The eventual model, FM-MLM, achieves over 85.1% win rate against Oxford baseline sentences according to OxfordEval, compared to 39.8% win rate for prior model-generated sentences.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.00869 [pdf, other]

Towards Automated Generation of Smart Grid Cyber Range for Cybersecurity Experiments and Training

Authors: Daisuke Mashima, Muhammad M. Roomi, Bennet Ng, Zbigniew Kalbarczyk, S. M. Suhail Hussain, Ee-chien Chang

Abstract: Assurance of cybersecurity is crucial to ensure dependability and resilience of smart power grid systems. In order to evaluate the impact of potential cyber attacks, to assess deployability and effectiveness of cybersecurity measures, and to enable hands-on exercise and training of personnel, an interactive, virtual environment that emulates the behaviour of a smart grid system, namely a smart grid cyber range, has been demanded by industry players as well as academia. A smart grid cyber range is typically implemented as a combination of cyber system emulation, which allows interactivity, and physical system (i.e., power grid) simulation that are tightly coupled for consistent cyber and physical behaviours. However, its design and implementation require intensive expertise and efforts in cyber and physical aspects of smart power systems as well as software/system engineering. While many industry players, including power grid operators, device vendors, research and education sectors are interested, availability of the smart grid cyber range is limited to a small number of research labs. To address this challenge, we have developed a framework for modelling a smart grid cyber range using an XML-based language, called SG-ML, and for "compiling" the model into an operational cyber range with minimal engineering efforts. The modelling language includes standardized schema from IEC 61850 and IEC 61131, which allows industry players to utilize their existing configurations. The SG-ML framework aims at making a smart grid cyber range available to broader user bases to facilitate cybersecurity R&D and hands-on exercises.

Submitted 31 March, 2024; originally announced April 2024.

Comments: Published at DSN 2023 Industry Track

arXiv:2403.18436 [pdf, other]

Collaborative Active Learning in Conditional Trust Environment

Authors: Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng

Abstract: In this paper, we investigate collaborative active learning, a paradigm in which multiple collaborators explore a new domain by leveraging their combined machine learning capabilities without disclosing their existing data and models. Instead, the collaborators share prediction results from the new domain and newly acquired labels. This collaboration offers several advantages: (a) it addresses privacy and security concerns by eliminating the need for direct model and data disclosure; (b) it enables the use of different data sources and insights without direct data exchange; and (c) it promotes cost-effectiveness and resource efficiency through shared labeling costs. To realize these benefits, we introduce a collaborative active learning framework designed to fulfill the aforementioned objectives. We validate the effectiveness of the proposed framework through simulations. The results demonstrate that collaboration leads to higher AUC scores compared to independent efforts, highlighting the framework's ability to overcome the limitations of individual models. These findings support the use of collaborative approaches in active learning, emphasizing their potential to enhance outcomes through collective expertise and shared resources. Our work provides a foundation for further research on collaborative active learning and its practical applications in various domains where data privacy, cost efficiency, and model performance are critical considerations.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 5 pages, 9 figures, conference

arXiv:2403.08989 [pdf, ps, other]

Maximum Channel Coding Rate of Finite Block Length MIMO Faster-Than-Nyquist Signaling

Authors: Zichao Zhang, Melda Yuksel, Halim Yanikomeroglu, Benjamin K. Ng, Chan-Tong Lam

Abstract: The pursuit of higher data rates and efficient spectrum utilization in modern communication technologies necessitates novel solutions. In order to provide insights into improving spectral efficiency and reducing latency, this study investigates the maximum channel coding rate (MCCR) of finite block length (FBL) multiple-input multiple-output (MIMO) faster-than-Nyquist (FTN) channels. By optimizing power allocation, we derive the system's MCCR expression. Simulation results are compared with the existing literature to reveal the benefits of FTN in FBL transmission.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.01346 [pdf, other]

Improve Cost Efficiency of Active Learning over Noisy Dataset

Authors: Zan-Kai Chong, Hiroyuki Ohsaki, Bryan Ng

Abstract: Active learning is a learning strategy whereby the machine learning algorithm actively identifies and labels data points to optimize its learning. This strategy is particularly effective in domains where an abundance of unlabeled data exists, but the cost of labeling these data points is prohibitively expensive. In this paper, we consider cases of binary classification, where acquiring a positive instance incurs a significantly higher cost compared to that of negative instances. For example, in the financial industry, such as in money-lending businesses, a defaulted loan constitutes a positive event leading to substantial financial loss. To address this issue, we propose a shifted normal distribution sampling function that samples from a wider range than typical uncertainty sampling. Our simulation underscores that our proposed sampling function limits both noisy and positive label selection, delivering between 20% and 32% improved cost efficiency over different test datasets.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 6 pages, 9 figures, conference
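
The sampling idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' sampling function: the names `shifted_normal_weights` and `sample_queries`, the parameters `mean` and `std`, and the choice to shift the centre below 0.5 (toward likely negatives, since positives are described as costly) are all assumptions made for illustration. Plain uncertainty sampling would instead favour points whose predicted probability is closest to 0.5.

```python
import math
import random


def shifted_normal_weights(probs, mean=0.4, std=0.15):
    """Weight each predicted positive-class probability by an unnormalised
    normal density centred at `mean` (hypothetical parameters)."""
    return [math.exp(-0.5 * ((p - mean) / std) ** 2) for p in probs]


def sample_queries(probs, k, mean=0.4, std=0.15, seed=0):
    """Draw up to k distinct indices to label, with probability
    proportional to the shifted-normal weights."""
    rng = random.Random(seed)
    weights = shifted_normal_weights(probs, mean, std)
    indices = list(range(len(probs)))
    chosen = []
    for _ in range(min(k, len(indices))):
        # Pick a position among the remaining candidates, then remove it
        # so that no index is selected twice.
        pick = rng.choices(range(len(indices)), weights=weights)[0]
        chosen.append(indices.pop(pick))
        weights.pop(pick)
    return chosen
```

Because the weights fall off smoothly rather than cutting off at a threshold, points away from the centre can still be drawn, which is one way to read "samples from a wider range" in the abstract.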

arXiv:2312.11682 [pdf, other]

doi 10.1109/ACCESS.2022.3190418

Joint Phase-Time Arrays: A Paradigm for Frequency-Dependent Analog Beamforming in 6G

Authors: Vishnu V. Ratnam, Jianhua Mo, Ahmad AlAmmouri, Boon L. Ng, Jianzhong Zhang, Andreas F. Molisch

Abstract: Hybrid beamforming is an attractive solution to build cost-effective and energy-efficient transceivers for millimeter-wave and terahertz systems. However, conventional hybrid beamforming techniques rely on analog components that generate a frequency-flat response, such as phase-shifters and switches, which limits the flexibility of the achievable beam patterns. As a novel alternative, this paper proposes a new class of hybrid beamforming called joint phase-time arrays (JPTA), which additionally use true-time delay elements in the analog beamforming to create frequency-dependent analog beams. Using as an example two important frequency-dependent beam behaviors, the numerous benefits of such flexibility are exemplified. Subsequently, the JPTA beamformer design problem to generate any desired beam behavior is formulated and near-optimal algorithms to the problem are proposed. Simulations show that the proposed algorithms can outperform heuristic solutions for JPTA beamformer update. Furthermore, it is shown that JPTA can achieve the two exemplified beam behaviors with one radio-frequency chain, while conventional hybrid beamforming requires the radio-frequency chains to scale with the number of antennas to achieve similar performance. Finally, a wide range of problems to further tap into the potential of JPTA are also listed as future directions.

Submitted 18 December, 2023; originally announced December 2023.

Comments: The paper is a revised version of the IEEE Access paper, that includes the full operation of Algorithms 1-3 to help curtail incorrect implementations

Journal ref: IEEE Access, vol. 10, pp. 73364-73377, 2022

arXiv:2310.11805 [pdf, other]

doi 10.1109/ROBIO58561.2023.10354990

GMC-Pos: Graph-Based Multi-Robot Coverage Positioning Method

Authors: Khattiya Pongsirijinda, Zhiqiang Cao, Muhammad Shalihan, Benny Kai Kiat Ng, Billy Pik Lik Lau, Chau Yuen, U-Xuan Tan

Abstract: Nowadays, several real-world tasks require adequate environment coverage for maintaining communication between multiple robots, for example, target search tasks, environmental monitoring, and post-disaster rescues. In this study, we look into a situation where there is a human operator and multiple robots, and we assume that each human or robot covers a certain range of areas. We want them to maximize their area of coverage collectively. Therefore, in this paper, we propose the Graph-Based Multi-Robot Coverage Positioning Method (GMC-Pos) to find strategic positions for robots that maximize the area coverage. Our novel approach consists of two main modules: graph generation and node selection. Firstly, graph generation represents the environment using a weighted connected graph. Then, we present a novel generalized graph-based distance and utilize it together with the graph degrees as the conditions for node selection in a recursive manner. Our method is deployed in three environments with different settings. The results show that it outperforms the benchmark method by 15.13% to 24.88% regarding the area coverage percentage.

Submitted 18 October, 2023; originally announced October 2023.

Comments: This paper has been accepted by the 2023 IEEE International Conference on Robotics and Biomimetics (IEEE ROBIO 2023)

arXiv:2310.09609 [pdf, other]

Towards Intelligent Network Management: Leveraging AI for Network Service Detection

Authors: Khuong N. Nguyen, Abhishek Sehgal, Yuming Zhu, Junsu Choi, Guanbo Chen, Hao Chen, Boon Loong Ng, Charlie Zhang

Abstract: As the complexity and scale of modern computer networks continue to increase, there has emerged an urgent need for precise traffic analysis, which plays a pivotal role in cutting-edge wireless connectivity technologies. This study focuses on leveraging Machine Learning methodologies to create an advanced network traffic classification system. We introduce a novel data-driven approach that excels in identifying various network service types in real time by analyzing patterns within the network traffic. Our method organizes similar kinds of network traffic into distinct categories, referred to as network services, based on latency requirement. Furthermore, it decomposes the network traffic stream into multiple, smaller traffic flows, with each flow uniquely carrying a specific service. Our ML models are trained on a dataset comprising labeled examples representing different network service types collected under various Wi-Fi network conditions. Upon evaluation, our system demonstrates a remarkable accuracy in distinguishing the network services. These results emphasize the substantial promise of integrating Artificial Intelligence in wireless technologies. Such an approach encourages more efficient energy consumption, enhances Quality of Service assurance, and optimizes the allocation of network resources, thus laying a solid groundwork for the development of advanced intelligent networks.

Submitted 14 October, 2023; originally announced October 2023.

arXiv:2305.12261 [pdf, ps, other]

MIMO Asynchronous MAC with Faster-than-Nyquist (FTN) Signaling

Authors: Zichao Zhang, Melda Yuksel, Halim Yanikomeroglu, Benjamin K. Ng, Chan-Tong Lam

Abstract: Faster-than-Nyquist (FTN) signaling is a nonorthogonal transmission technique, which brings in intentional inter-symbol interference. This way it can significantly enhance spectral efficiency for practical pulse shapes such as the root raised cosine pulses. This paper proposes an achievable rate region for the multiple antenna (MIMO) asynchronous multiple access channel (aMAC) with FTN signaling. The scheme applies waterfilling in the spatial domain and precoding in time. Waterfilling in space provides better power allocation and precoding helps mitigate inter-symbol interference due to asynchronous transmission and FTN. The results show that the gains due to asynchronous transmission and FTN are more emphasized in MIMO aMAC than in single antenna aMAC. Moreover, FTN improves single-user rates, and asynchronous transmission improves the sum-rate, due to better inter-user interference management.

Submitted 20 May, 2023; originally announced May 2023.

arXiv:2304.05599 [pdf, other]

Bit-Interleaved Multiple Access: Improved Fairness, Reliability, and Latency for Massive IoT Networks

Authors: Ferdi Kara, Hakan Kaya, Halim Yanikomeroglu, Chan-Tong Lam, Ben K. Ng

Abstract: In this paper, we propose bit-interleaved multiple access (BIMA) to enable Internet-of-Things (IoT) networks where a massive connection is required with limited resource blocks. First, by providing a true power allocation (PA) constraint for conventional NOMA with practical constraints, we demonstrate that it cannot support massive connections. To this end, we propose BIMA where there are no strict PA constraints, unlike conventional NOMA, thus allowing a high number of devices. We provide a comprehensive analytical framework for BIMA for all key performance indicators (KPIs) (i.e., ergodic capacity [EC], outage probability [OP], and bit error rate [BER]). We evaluate Jain's fairness index and proportional fairness index in terms of all KPIs. Based on extensive computer simulations, we reveal that BIMA outperforms conventional NOMA significantly, with a performance gain of up to 20-30 dB. This performance gain becomes greater when more devices are supported. BIMA provides a full diversity order and enables the implementation of an arbitrary number of devices and modulation orders, which is crucial for IoT networks in dense areas. BIMA guarantees a fair system in which no device suffers severe performance degradation and the sum-rate is shared fairly among devices by guaranteeing QoS satisfaction. Finally, we provide an in-depth complexity and latency analysis and demonstrate that BIMA provides lower latency compared to conventional NOMA since it allows parallel computing at the receivers and no iterative operations are required. We show that BIMA reduces latency by up to 350% for specific devices and 170% on average.

Submitted 12 April, 2023; originally announced April 2023.

Comments: accepted in IEEE
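
Jain's fairness index, named in the abstract above, has a standard closed form: J = (Σ x_i)² / (n · Σ x_i²), which equals 1 when all n devices receive the same value and 1/n when a single device receives everything. The sketch below only illustrates that formula; the function name `jain_fairness_index` is hypothetical, and which per-device KPI values are fed in (rate, outage, BER) is an assumption, not something the listing specifies.

```python
def jain_fairness_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    For non-negative inputs the result lies in [1/n, 1].
    """
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0:
        raise ValueError("at least one value must be non-zero")
    return total * total / (len(values) * sum_sq)
```

For example, four devices with equal rates give an index of 1.0, while one device taking the entire rate among four gives 0.25.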

arXiv:2304.00338 [pdf, other]

Scientific Computing Algorithms to Learn Enhanced Scalable Surrogates for Mesh Physics

Authors: Brian R. Bartoldson, Yeping Hu, Amar Saini, Jose Cadena, Yucheng Fu, Jie Bao, Zhijie Xu, Brenda Ng, Phan Nguyen

Abstract: Data-driven modeling approaches can produce fast surrogates to study large-scale physics problems. Among them, graph neural networks (GNNs) that operate on mesh-based data are desirable because they possess inductive biases that promote physical faithfulness, but hardware limitations have precluded their application to large computational domains. We show that it is possible to train a class of GNN surrogates on 3D meshes. We scale MeshGraphNets (MGN), a subclass of GNNs for mesh-based physics modeling, via our domain decomposition approach to facilitate training that is mathematically equivalent to training on the whole domain under certain conditions. With this, we were able to train MGN on meshes with millions of nodes to generate computational fluid dynamics (CFD) simulations. Furthermore, we show how to enhance MGN via higher-order numerical integration, which can reduce MGN's error and training time. We validated our methods on an accompanying dataset of 3D $\text{CO}_2$-capture CFD simulations on a 3.1M-node mesh. This work presents a practical path to scaling MGN for real-world applications.

Submitted 1 April, 2023; originally announced April 2023.

Comments: ICLR 2023 Workshop on Physics for Machine Learning

arXiv:2302.12666 [pdf, other]

Modelling Temporal Document Sequences for Clinical ICD Coding

Authors: Clarence Boon Liang Ng, Diogo Santos, Marek Rei

Abstract: Past studies on the ICD coding problem focus on predicting clinical codes primarily based on the discharge summary. This covers only a small fraction of the notes generated during each hospital stay and leaves potential for improving performance by analysing all the available clinical notes. We propose a hierarchical transformer architecture that uses text across the entire sequence of clinical notes in each hospital stay for ICD coding, and incorporates embeddings for text metadata such as their position, time, and type of note. While using all clinical notes increases the quantity of data substantially, superconvergence can be used to reduce training costs. We evaluate the model on the MIMIC-III dataset. Our model exceeds the prior state-of-the-art when using only discharge summaries as input, and achieves further performance improvements when all clinical notes are used as input.

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2301.11272 [pdf, other]

Location-based Activity Behavior Deviation Detection for Nursing Home using IoT Devices

Authors: Billy Pik Lik Lau, Zann Koh, Yuren Zhou, Benny Kai Kiat Ng, Chau Yuen, Mui Lang Low

Abstract: With the advancement of the Internet of Things (IoT) and pervasive computing applications, there is a better opportunity to understand the behavior of the aging population. However, in a nursing home scenario, common sensors and techniques used to track an elderly person living alone are not suitable. In this paper, we design a location-based tracking system for a four-story nursing home - The Salvation Army, Peacehaven Nursing Home in Singapore. The main challenge here is to identify the group activity among the nursing home's residents and to detect whether they have any deviated activity behavior. We propose a location-based deviated activity behavior detection system that detects deviated activity behavior by leveraging a data fusion technique. In order to compute the features for data fusion, an adaptive method is applied to extract the group and individual activity time and to generate a daily hybrid norm for each of the residents. Next, deviated activity behavior detection is executed by considering the difference between daily norm patterns and daily input data for each resident. Lastly, the deviated activity behavior among the residents is classified using a rule-based classification approach. In the implementation, 44.4% of the residents have no deviated activity behavior, while 37% are involved in one deviated activity behavior and 18.6% have two or more deviated activity behaviors.

Submitted 25 January, 2023; originally announced January 2023.

Comments: 12 pages

arXiv:2210.05954 [pdf, other]

doi 10.1016/j.compbiomed.2023.107277

Image Projective Transformation Rectification with Synthetic Data for Smartphone-captured Chest X-ray Photos Classification

Authors: Chak Fong Chong, Yapeng Wang, Benjamin Ng, Wuman Luo, Xu Yang

Abstract: Classification on smartphone-captured chest X-ray (CXR) photos to detect pathologies is challenging due to the projective transformation caused by the non-ideal camera position. Recently, various rectification methods have been proposed for different photo rectification tasks such as document photos, license plate photos, etc. Unfortunately, we found that none of them is suitable for CXR photos, due to their specific transformation type, image appearance, annotation type, etc. In this paper, we propose an innovative deep learning-based Projective Transformation Rectification Network (PTRN) to automatically rectify CXR photos by predicting the projective transformation matrix. To the best of our knowledge, it is the first work to predict the projective transformation matrix as the learning goal for photo rectification. Additionally, to avoid the expensive collection of natural data, synthetic CXR photos are generated under the consideration of natural perturbations, extra screens, etc. We evaluate the proposed approach in the CheXphoto smartphone-captured CXR photos classification competition hosted by the Stanford University Machine Learning Group, where our approach won first place with a huge performance improvement (ours 0.850, second-best 0.762, in AUC). A deeper study demonstrates that the use of PTRN successfully brings the classification performance on the spatially transformed CXR photos to the same level as on the high-quality digital CXR images, indicating PTRN can eliminate all negative impacts of projective transformation on the CXR photos.

Submitted 30 November, 2022; v1 submitted 12 October, 2022; originally announced October 2022.

arXiv:2210.01959 [pdf, other]

Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering

Authors: Tavish McDonald, Brian Tsan, Amar Saini, Juanita Ordonez, Luis Gutierrez, Phan Nguyen, Blake Mason, Brenda Ng

Abstract: Researchers produce thousands of scholarly documents containing valuable technical knowledge. The community faces the laborious task of reading these documents to identify, extract, and synthesize information. To automate information gathering, document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge. Finetuning QA systems requires access to labeled data (tuples of context, question and answer). However, data curation for document QA is uniquely challenging because the context (i.e. answer evidence passage) needs to be retrieved from potentially long, ill-formatted documents. Existing QA datasets sidestep this challenge by providing short, well-defined contexts that are unrealistic in real-world applications. We present a three-stage document QA approach: (1) text extraction from PDF; (2) evidence retrieval from extracted texts to form well-posed contexts; (3) QA to extract knowledge from contexts to return high-quality answers -- extractive, abstractive, or Boolean. Using QASPER for evaluation, our detect-retrieve-comprehend (DRC) system achieves a +7.19 improvement in Answer-F1 over existing baselines while delivering superior context selection. Our results demonstrate that DRC holds tremendous promise as a flexible framework for practical scientific document QA.

Submitted 11 December, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2210.00191 [pdf, other]

Cut-Paste Consistency Learning for Semi-Supervised Lesion Segmentation

Authors: Boon Peng Yap, Beng Koon Ng

Abstract: Semi-supervised learning has the potential to improve the data-efficiency of training data-hungry deep neural networks, which is especially important for medical image analysis tasks where labeled data is scarce. In this work, we present a simple semi-supervised learning method for lesion segmentation tasks based on the ideas of cut-paste augmentation and consistency regularization. By exploiting the mask information available in the labeled data, we synthesize partially labeled samples from the unlabeled images so that the usual supervised learning objective (e.g., binary cross entropy) can be applied. Additionally, we introduce a background consistency term to regularize the training on the unlabeled background regions of the synthetic images. We empirically verify the effectiveness of the proposed method on two public lesion segmentation datasets, including an eye fundus photograph dataset and a brain CT scan dataset. The experiment results indicate that our method achieves consistent and superior performance over other self-training and consistency-based methods without introducing sophisticated network components.

Submitted 1 October, 2022; originally announced October 2022.

Comments: Accepted to appear in WACV 2023

arXiv:2203.04516 [pdf, other]

Update Compression for Deep Neural Networks on the Edge

Authors: Bo Chen, Ali Bakhshi, Gustavo Batista, Brian Ng, Tat-Jun Chin

Abstract: An increasing number of artificial intelligence (AI) applications involve the execution of deep neural networks (DNNs) on edge devices. Many practical reasons motivate the need to update the DNN model on the edge device post-deployment, such as refining the model, concept drift, or outright change in the learning task. In this paper, we consider the scenario where retraining can be done on the server side based on a copy of the DNN model, with only the necessary data transmitted to the edge to update the deployed model. However, due to bandwidth constraints, we want to minimise the transmission required to achieve the update. We develop a simple approach based on matrix factorisation to compress the model update -- this differs from compressing the model itself. The key idea is to preserve existing knowledge in the current model and optimise only small additional parameters for the update which can be used to reconstitute the model on the edge. We compared our method to similar techniques used in federated learning; our method usually requires less than half of the update size of existing methods to achieve the same accuracy.

Submitted 21 April, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: CVPR 2022 Mobile AI Workshop
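
As a rough illustration of why a factorised update can be much smaller than a full weight delta, the sketch below compares transmission sizes. It assumes an additive low-rank form W_new = W_old + U V^T with a hypothetical rank; the abstract only states that the method is based on matrix factorisation, so the exact form, the layer shape, and all function names here are assumptions rather than the authors' method.

```python
# Hypothetical sketch: size of a low-rank update versus a full weight delta.
# The additive form W_new = W_old + U V^T is an assumption for illustration.

def full_update_size(rows: int, cols: int) -> int:
    """Number of values sent when the whole weight delta is transmitted."""
    return rows * cols


def low_rank_update_size(rows: int, cols: int, rank: int) -> int:
    """Number of values sent when only U (rows x rank) and V (cols x rank) are transmitted."""
    return rank * (rows + cols)


def apply_update(w_old, u, v):
    """Reconstitute W_new = W_old + U V^T on the edge device (plain nested lists)."""
    rank = len(u[0])
    return [
        [w_old[i][j] + sum(u[i][k] * v[j][k] for k in range(rank))
         for j in range(len(v))]
        for i in range(len(u))
    ]


if __name__ == "__main__":
    rows, cols, rank = 512, 256, 8  # hypothetical layer shape and rank
    print(full_update_size(rows, cols))            # 131072
    print(low_rank_update_size(rows, cols, rank))  # 6144
```

Under these assumed numbers the factors need about 5% of the values of a full delta; the ratio actually achieved depends on the rank required to reach the target accuracy.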

arXiv:2202.02247 [pdf, other]

Beam Management with Orientation and RSRP using Deep Learning for Beyond 5G Systems

Authors: Khuong N. Nguyen, Anum Ali, Jianhua Mo, Boon Loong Ng, Vutha Va, Jianzhong Charlie Zhang

Abstract: Beam management (BM), i.e., the process of finding and maintaining a suitable transmit and receive beam pair, can be challenging, particularly in highly dynamic scenarios. Side-information, e.g., orientation, from on-board sensors can assist the user equipment (UE) BM. In this work, we use the orientation information coming from the inertial measurement unit (IMU) for effective BM. We use a data-driven strategy that fuses the reference signal received power (RSRP) with orientation information using a recurrent neural network (RNN). Simulation results show that the proposed strategy performs much better than the conventional BM and an orientation-assisted BM strategy from another study that utilizes a particle filter. Specifically, the proposed data-driven strategy improves the beam-prediction accuracy by up to 34% and increases mean RSRP by up to 4.2 dB when the UE orientation changes quickly.

Submitted 4 February, 2022; originally announced February 2022.

arXiv:2112.12296 [pdf, other]

Sub-Chain Beam for mmWave Devices: A Trade-off between Power Saving and Beam Correspondence

Authors: Jianhua Mo, Daehee Park, Boon Loong Ng, Vutha Va, Anum Ali, Chonghwa Seo, Jianzhong Charlie Zhang

Abstract: Beam correspondence, or downlink-uplink (DL-UL) beam reciprocity, refers to the assumption that the best beams in the DL are also the best beams in the UL. This is an important assumption that allows the existing beam management framework in 5G to rely heavily on DL beam sweeping and avoid UL beam sweeping: UL beams are inferred from the measurements of the DL reference signals. Beam correspondence holds when the radio configurations are symmetric in the DL and UL. However, as mmWave technology matures, the DL and the UL face different constraints often breaking the beam correspondence. For example, power constraints may require a UE to activate only a portion of its antenna array for UL transmission, while still activating the full array for DL reception. Meanwhile, if the UL beam with sub-array, named as sub-chain beam in this paper, has a similar radiation pattern as the DL beam, the beam correspondence can still hold. This paper proposes methods for sub-chain beam codebook design to achieve a trade-off between the power saving and beam correspondence.

Submitted 22 December, 2021; originally announced December 2021.

Comments: 6 pages, 7 figures, accepted by Asilomar conference 2021

arXiv:2112.11656 [pdf, other]

Latent Space Simulation for Carbon Capture Design Optimization

Authors: Brian Bartoldson, Rui Wang, Yucheng Fu, David Widemann, Sam Nguyen, Jie Bao, Zhijie Xu, Brenda Ng

Abstract: The CO2 capture efficiency in solvent-based carbon capture systems (CCSs) critically depends on the gas-solvent interfacial area (IA), making maximization of IA a foundational challenge in CCS design. While the IA associated with a particular CCS design can be estimated via a computational fluid dynamics (CFD) simulation, using CFD to derive the IAs associated with numerous CCS designs is prohibitively costly. Fortunately, previous works such as Deep Fluids (DF) (Kim et al., 2019) show that large simulation speedups are achievable by replacing CFD simulators with neural network (NN) surrogates that faithfully mimic the CFD simulation process. This raises the possibility of a fast, accurate replacement for a CFD simulator and therefore efficient approximation of the IAs required by CCS design optimization. Thus, here, we build on the DF approach to develop surrogates that can successfully be applied to our complex carbon-capture CFD simulations. Our optimized DF-style surrogates produce large speedups (4000x) while obtaining IA relative errors as low as 4% on unseen CCS configurations that lie within the range of training configurations. This hints at the promise of NN surrogates for our CCS design optimization problem. Nonetheless, DF has inherent limitations with respect to CCS design (e.g., limited transferability of trained models to new CCS packings). We conclude with ideas to address these challenges.

Submitted 21 December, 2021; originally announced December 2021.

Comments: Extended version of a paper appearing in the Proceedings of the 34th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-22)

arXiv:2108.02122 [pdf, other]

Semi-weakly Supervised Contrastive Representation Learning for Retinal Fundus Images

Authors: Boon Peng Yap, Beng Koon Ng

Abstract: We explore the value of weak labels in learning transferable representations for medical images. Compared to hand-labeled datasets, weak or inexact labels can be acquired in large quantities at significantly lower cost and can provide useful training signals for data-hungry models such as deep neural networks. We consider weak labels in the form of pseudo-labels and propose a semi-weakly supervised contrastive learning (SWCL) framework for representation learning using semi-weakly annotated images. Specifically, we train a semi-supervised model to propagate labels from a small dataset consisting of diverse image-level annotations to a large unlabeled dataset. Using the propagated labels, we generate a patch-level dataset for pretraining and formulate a multi-label contrastive learning objective to capture position-specific features encoded in each patch. We empirically validate the transfer learning performance of SWCL on seven public retinal fundus datasets, covering three disease classification tasks and two anatomical structure segmentation tasks. Our experiment results suggest that, under a very low data regime, large-scale ImageNet pretraining on an improved architecture remains a very strong baseline, and recently proposed self-supervised methods falter in segmentation tasks, possibly due to the strong invariant constraint imposed. Our method surpasses all prior self-supervised methods and standard cross-entropy training, while closing the gaps with ImageNet pretraining.

Submitted 4 August, 2021; originally announced August 2021.

arXiv:2107.08842 [pdf, other]

Relative Localization of Mobile Robots with Multiple Ultra-WideBand Ranging Measurements

Authors: Zhiqiang Cao, Ran Liu, Chau Yuen, Achala Athukorala, Benny Kai Kiat Ng, Muraleetharan Mathanraj, U-Xuan Tan

Abstract: Relative localization between autonomous robots without infrastructure is crucial to achieve their navigation, path planning, and formation in many applications, such as emergency response, where acquiring prior knowledge of the environment is not possible. The traditional Ultra-WideBand (UWB)-based approach provides a good estimation of the distance between the robots, but obtaining the relative pose (including the displacement and orientation) remains challenging. We propose an approach to estimate the relative pose between a group of robots by equipping each robot with multiple UWB ranging nodes. We determine the pose between two robots by minimizing the residual error of the ranging measurements from all UWB nodes. To improve the localization accuracy, we propose to utilize the odometry constraints through a sliding window-based optimization. The optimized pose is then fused with the odometry in a particle filter for pose tracking among a group of mobile robots. We have conducted extensive experiments to validate the effectiveness of the proposed approach.

Submitted 30 July, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

Comments: Accepted by the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021), Prague, Czech Republic
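
The idea of scoring a candidate relative pose by the residual of all inter-node ranges can be sketched as below. This is a minimal 2D illustration under assumed conventions: the pose parameterisation (x, y, theta), the node offsets, and the function names are all hypothetical, and the abstract does not specify the authors' cost function or solver.

```python
import math

# Hypothetical sketch: ranging residual for a candidate 2D relative pose.
# pose = (x, y, theta) of robot B expressed in robot A's frame (an assumption).


def predicted_range(pose, node_a, node_b):
    """Distance between node_a (in A's frame) and node_b (in B's frame) for the given pose."""
    x, y, theta = pose
    # Transform node_b from B's frame into A's frame.
    bx = x + math.cos(theta) * node_b[0] - math.sin(theta) * node_b[1]
    by = y + math.sin(theta) * node_b[0] + math.cos(theta) * node_b[1]
    return math.hypot(bx - node_a[0], by - node_a[1])


def ranging_residual(pose, nodes_a, nodes_b, measured):
    """Sum of squared differences between measured and predicted ranges over all node pairs."""
    total = 0.0
    for i, na in enumerate(nodes_a):
        for j, nb in enumerate(nodes_b):
            total += (measured[i][j] - predicted_range(pose, na, nb)) ** 2
    return total


if __name__ == "__main__":
    nodes = [(0.2, 0.0), (-0.2, 0.0)]  # hypothetical node offsets on each robot, in metres
    true_pose = (2.0, 1.0, 0.5)
    # Noise-free synthetic measurements generated from the true pose.
    measured = [[predicted_range(true_pose, a, b) for b in nodes] for a in nodes]
    print(ranging_residual(true_pose, nodes, nodes, measured))
    print(ranging_residual((2.5, 1.0, 0.0), nodes, nodes, measured))
```

A pose estimate would then be whichever candidate minimises this residual; any nonlinear least-squares solver could play that role.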

arXiv:2107.08579 [pdf, other]

Action Forecasting with Feature-wise Self-Attention

Authors: Yan Bin Ng, Basura Fernando

Abstract: We present a new architecture for human action forecasting from videos. A temporal recurrent encoder captures temporal information of input videos while a self-attention model is used to attend to relevant feature dimensions of the input space. To handle temporal variations in observed video data, a feature masking technique is employed. We classify observed actions accurately using an auxiliary classifier which helps to understand what has happened so far. Then the decoder generates actions for the future based on the output of the recurrent encoder and the self-attention model. Experimentally, we validate each component of our architecture, where we see the impact of self-attention in identifying relevant feature dimensions, of temporal masking, and of the observed auxiliary classifier. We evaluate our method on two standard action forecasting benchmarks and obtain state-of-the-art results.

Submitted 18 July, 2021; originally announced July 2021.

arXiv:2101.03725 [pdf, other]

doi 10.1109/JIOT.2021.3051343

The Study of Urban Residential's Public Space Activeness using Space-centric Approach

Authors: Billy Pik Lik Lau, Benny Kai Kiat Ng, Chau Yuen, Bige Tuncer, Keng Hua Chong

Abstract: With the advancement of the Internet of Things (IoT) and communication platforms, large scale sensor deployment can be easily implemented in an urban city to collect various information. To date, there are only a handful of research studies about understanding the usage of urban public spaces. Leveraging IoT, various sensors have been deployed in an urban residential area to monitor and study public space utilization patterns. In this paper, we propose a data processing system to generate space-centric insights about the utilization of an urban residential region with multiple points of interest (PoIs) that consists of 190,000 m$^2$ of real estate. We identify the activeness of each PoI based on spectral clustering, and then study their corresponding static features, which are composed of transportation, commercial facilities, population density, along with other characteristics. Through heuristic feature inference, we find that residential density and commercial facilities are the most significant factors affecting public place utilization.

Submitted 11 January, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

Comments: Accepted at IEEE Internet of Things Journal 2021

arXiv:2012.05488 [pdf, other]

doi 10.1109/JSYST.2020.3044325

Urban Space Insights Extraction using Acoustic Histogram Information

Authors: Nipun Wijerathne, Billy Pik Lik Lau, Benny Kai Kiat Ng, Chau Yuen

Abstract: Urban data mining can be identified as a highly potential area that can enhance the smart city services towards better sustainable development, especially in urban residential activity tracking. While existing human activity tracking systems have demonstrated the capability to unveil the hidden aspects of citizens' behavior, they often come with a high implementation cost and require a large communication bandwidth. In this paper, we study the implementation of low-cost analogue sound sensors to detect outdoor activities and estimate the raining period in an urban residential area. The analogue sound sensor readings are transmitted to the cloud every 5 minutes in histogram format, which consists of sound data sampled every 100 ms (10 Hz). We then use wavelet transformation (WT) and principal component analysis (PCA) to generate a more robust and consistent feature set from the histogram. After that, we perform unsupervised clustering and attempt to understand the individual characteristics of each cluster to identify outdoor residential activities. In addition, on-site validation has been conducted to show the effectiveness of our approach.

Submitted 14 December, 2020; v1 submitted 10 December, 2020; originally announced December 2020.

Comments: Accepted at IEEE Systems Journal
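
The reporting scheme in this abstract (10 Hz sampling, one histogram per 5-minute window) implies 3000 samples per report. A minimal sketch of building such a histogram follows; the bin count, the sensor value range, and the function name are hypothetical, since the abstract does not specify them.

```python
# Hypothetical sketch of the 5-minute histogram report described in the abstract.
# Bin count and sensor value range are assumptions, not taken from the paper.

SAMPLE_PERIOD_MS = 100  # one sample every 100 ms (10 Hz), from the abstract
WINDOW_MINUTES = 5      # one report every 5 minutes, from the abstract
SAMPLES_PER_WINDOW = WINDOW_MINUTES * 60 * 1000 // SAMPLE_PERIOD_MS  # 3000


def build_histogram(samples, num_bins=10, lo=0.0, hi=1024.0):
    """Count samples into equal-width bins over [lo, hi); out-of-range values are clamped."""
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for s in samples:
        idx = int((s - lo) / width)
        idx = min(max(idx, 0), num_bins - 1)  # clamp to the first or last bin
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Synthetic readings standing in for one window of sensor data.
    window = [(i * 37) % 1024 for i in range(SAMPLES_PER_WINDOW)]
    hist = build_histogram(window)
    print(SAMPLES_PER_WINDOW, sum(hist))
```

Sending a handful of bin counts instead of 3000 raw samples is one plausible way such a scheme would reduce the communication bandwidth mentioned in the abstract.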

arXiv:2004.05234 [pdf, other]

Attend and Decode: 4D fMRI Task State Decoding Using Attention Models

Authors: Sam Nguyen, Brenda Ng, Alan D. Kaplan, Priyadip Ray

Abstract: Functional magnetic resonance imaging (fMRI) is a neuroimaging modality that captures the blood oxygen level in a subject's brain while the subject either rests or performs a variety of functional tasks under different conditions. Given fMRI data, the problem of inferring the task, known as task state decoding, is challenging due to the high dimensionality (hundreds of millions of sampling points per datum) and complex spatio-temporal blood flow patterns inherent in the data. In this work, we propose to tackle the fMRI task state decoding problem by casting it as a 4D spatio-temporal classification problem. We present a novel architecture called Brain Attend and Decode (BAnD), that uses residual convolutional neural networks for spatial feature extraction and self-attention mechanisms for temporal modeling. We achieve significant performance gain compared to previous works on a 7-task benchmark from the large-scale Human Connectome Project-Young Adult (HCP-YA) dataset. We also investigate the transferability of BAnD's extracted features on unseen HCP tasks, either by freezing the spatial feature extraction layers and retraining the temporal model, or finetuning the entire model. The pre-trained features from BAnD are useful on similar tasks while finetuning them yields competitive results on unseen tasks/conditions.

Submitted 19 January, 2021; v1 submitted 10 April, 2020; originally announced April 2020.

Journal ref: Proceedings of the Machine Learning for Health NeurIPS Workshop, PMLR 136:267-279, 2020

arXiv:2003.03785 [pdf, ps, other]

Dependently Typed Knowledge Graphs

Authors: Zhangsheng Lai, Aik Beng Ng, Liang Ze Wong, Simon See, Shaowei Lin

Abstract: Reasoning over knowledge graphs is traditionally built upon a hierarchy of languages in the Semantic Web Stack. Starting from the Resource Description Framework (RDF) for knowledge graphs, more advanced constructs have been introduced through various syntax extensions to add reasoning capabilities to knowledge graphs. In this paper, we show how standardized semantic web technologies (RDF and its query language SPARQL) can be reproduced in a unified manner with dependent type theory. In addition to providing the basic functionalities of knowledge graphs, dependent types add expressiveness in encoding both entities and queries, explainability in answers to queries through witnesses, and compositionality and automation in the construction of witnesses. Using the Coq proof assistant, we demonstrate how to build and query dependently typed knowledge graphs as a proof of concept for future works in this direction.

Submitted 8 March, 2020; originally announced March 2020.

arXiv:2003.02449 [pdf, ps, other]

doi 10.1109/JSTSP.2020.2971418

Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications

Authors: Chinthaka Gamanayake, Lahiru Jayasinghe, Benny Ng, Chau Yuen

Abstract: Even though Convolutional Neural Networks (CNNs) have shown superior results in the field of computer vision, it is still a challenging task to implement computer vision algorithms in real-time at the edge, especially using a low-cost IoT device, due to the high memory consumption and computational complexity of a CNN. Network compression methodologies such as weight pruning, filter pruning, and quantization are used to overcome the above-mentioned problem. Even though the filter pruning methodology has shown better performance compared to other techniques, irregularity in the number of filters pruned across different layers of a CNN might not comply with the majority of neural computing hardware architectures. In this paper, a novel greedy approach called cluster pruning is proposed, which provides a structured way of removing filters in a CNN by considering the importance of filters and the underlying hardware architecture. The proposed methodology is compared with the conventional filter pruning algorithm on the Pascal-VOC open dataset and the Head-Counting dataset, which is our own dataset developed to detect and count people entering a room. We benchmark our proposed method on three hardware architectures, namely CPU, GPU, and Intel Movidius Neural Compute Stick (NCS), using the popular SSD-MobileNet and SSD-SqueezeNet neural network architectures used for edge-AI vision applications. Results demonstrate that our method outperforms the conventional filter pruning methodology, using both datasets on the above-mentioned hardware architectures. Furthermore, a low-cost IoT hardware setup consisting of an Intel Movidius-NCS is proposed to deploy an edge-AI application using our proposed pruning methodology.

Submitted 5 March, 2020; originally announced March 2020.

Journal ref: J-STSP-CDNN-00206-2019

arXiv:2002.04401 [pdf, other]

Understanding Crowd Behaviors in a Social Event by Passive WiFi Sensing and Data Mining

Authors: Yuren Zhou, Billy Pik Lik Lau, Zann Koh, Chau Yuen, Benny Kai Kiat Ng

Abstract: Understanding crowd behaviors in a large social event is crucial for event management. Passive WiFi sensing, by collecting WiFi probe requests sent from mobile devices, provides a better way to monitor crowds compared with people counters and cameras in terms of free interference, larger coverage, lower cost, and more information on people's movement. In existing studies, however, not enough attention has been paid to the thorough analysis and mining of collected data. In particular, the power of machine learning has not been fully exploited. In this paper, therefore, we propose a comprehensive data analysis framework to fully analyze the collected probe requests to extract three types of patterns related to crowd behaviors in a large social event, with the help of statistics, visualization, and unsupervised machine learning. First, trajectories of the mobile devices are extracted from probe requests and analyzed to reveal the spatial patterns of the crowds' movement. Hierarchical agglomerative clustering is adopted to find the interconnections between different locations. Next, k-means and k-shape clustering algorithms are applied to extract temporal visiting patterns of the crowds by days and locations, respectively. Finally, by combining with time, trajectories are transformed into spatiotemporal patterns, which reveal how trajectory duration changes over the length and how the overall trends of crowd movement change over time. The proposed data analysis framework is fully demonstrated using real-world data collected in a large social event. Results show that one can extract comprehensive patterns from data collected by a network of passive WiFi sensors.

Submitted 4 February, 2020; originally announced February 2020.

Comments: This manuscript has been accepted by IEEE Internet of Things journal. Copyright (c) 2020 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]

arXiv:1912.04608 [pdf, other]

doi 10.1109/TIP.2020.3021497

Forecasting future action sequences with attention: a new approach to weakly supervised action forecasting

Authors: Yan Bin Ng, Basura Fernando

Abstract: Future human action forecasting from partial observations of activities is an important problem in many practical applications such as assistive robotics, video surveillance and security. We present a method to forecast actions for the unseen future of the video using a neural machine translation technique that uses an encoder-decoder architecture. The input to this model is the observed RGB video, and the objective is to forecast the correct future symbolic action sequence. Unlike prior methods that make action predictions for some unseen percentage of video one for each frame, we predict the complete action sequence that is required to accomplish the activity. We coin this task action sequence forecasting. To cater for two types of uncertainty in the future predictions, we propose a novel loss function. We show that a combination of optimal transport and future uncertainty losses helps to improve results. We extend our action sequence forecasting model to perform weakly supervised action forecasting on two challenging datasets, the Breakfast and the 50Salads. Specifically, we propose a model to predict actions of future unseen frames without using frame level annotations during training. Using Fisher vector features, our supervised model outperforms the state-of-the-art action forecasting model by 0.83% and 7.09% on the Breakfast and the 50Salads datasets respectively. Our weakly supervised model is only 0.6% behind the most recent state-of-the-art supervised model and obtains comparable results to other published fully supervised methods, and sometimes even outperforms them on the Breakfast dataset. Most interestingly, our weakly supervised model outperforms prior models by 1.04% by leveraging the proposed weakly supervised architecture and effective use of the attention mechanism and loss functions.

Submitted 3 February, 2022; v1 submitted 10 December, 2019; originally announced December 2019.

Journal ref: in IEEE Transactions on Image Processing, vol. 29, pp. 8880-8891, 2020

arXiv:1910.02602 [pdf, other]

Human Action Sequence Classification

Authors: Yan Bin Ng, Basura Fernando

Abstract: This paper classifies human action sequences from videos using a machine translation model. In contrast to classical human action classification, which outputs a set of actions, our method outputs a sequence of actions in the chronological order in which they are performed by the human. Therefore our method is evaluated using sequential performance measures such as Bilingual Evaluation Understudy (BLEU) scores. Action sequence classification has many applications such as learning from demonstration, action segmentation, detection, localization and video captioning. Furthermore, we use our model that is trained to output action sequences to solve downstream tasks such as video captioning and action localization. We obtain state-of-the-art results for video captioning on the challenging Charades dataset, obtaining a BLEU-4 score of 34.8 and a METEOR score of 33.6, outperforming the previous state-of-the-art of 18.8 and 19.5 respectively. Similarly, on ActivityNet captioning, we obtain excellent results in terms of ROUGE (20.24) and CIDER (37.58) scores. For action localization, without using any explicit start/end action annotations, our method obtains localization performance of 22.2 mAP, outperforming prior fully supervised methods.

Submitted 7 October, 2019; originally announced October 2019.

arXiv:1908.01004 [pdf, other]

doi 10.1109/ACCESS.2019.2930224

Beam Codebook Design for 5G mmWave Terminals

Authors: Jianhua Mo, Boon Loong Ng, Sanghyun Chang, Pengda Huang, Mandar Kulkarni, Ahmad AlAmmouri, Jianzhong Charlie Zhang, Jeongheum Lee, Won-Joon Choi

Abstract: A beam codebook of 5G millimeter wave (mmWave) for data communication consists of multiple high-peak-gain beams to compensate for the high pathloss at the mmWave bands. These beams also have to point to different angular directions, such that by performing beam searching over the codebook, a good mmWave signal coverage over the full sphere around the terminal (spherical coverage) can be achieved. A model-based beam codebook design that assumes an ideal omni-directional antenna pattern, and neglects the impact of terminal housing around the antenna, does not work well because the radiation pattern of a practical mmWave antenna combined with the impact of terminal housing is highly irregular. In this paper, we propose a novel and efficient data-driven method to generate a beam codebook to boost the spherical coverage of mmWave terminals. The method takes as inputs the measured or simulated electric field response data of each antenna and provides the codebook according to the requirements on the codebook size, spherical coverage, etc. The method can be applied in a straightforward manner to different antenna types, antenna array configurations, placements and terminal housing designs. Our simulation results show that the proposed method generates a codebook better than the benchmark and 802.15.3c codebooks in terms of the spherical coverage.

Submitted 2 August, 2019; originally announced August 2019.

Comments: 17 pages, 12 figures. Published by IEEE Access

arXiv:1908.00850 [pdf, other]

Grip-Aware Analog mmWave Beam Codebook Adaptation for 5G Mobile Handsets

Authors: Ahmad AlAmmouri, Jianhua Mo, Boon Loong Ng, Jianzhong Charlie Zhang, Jeffrey G. Andrews

Abstract: This paper studies the effect of the user hand grip on the design of beamforming codebooks for 5G millimeter-wave (mmWave) mobile handsets. The high-frequency structure simulator (HFSS) is used to characterize the radiation fields for fourteen possible handgrip profiles based on experiments we conducted. The loss from hand blockage on the antenna gains can be up to 20-25 dB, which implies that the possible hand grip profiles need to be taken into account while designing beam codebooks. Specifically, we consider three different codebook adaptation schemes: a grip-aware scheme, where perfect knowledge of the hand grip is available; a semi-aware scheme, where just the application (voice call, messaging, etc.) and the orientation of the mobile handset are known; and a grip-agnostic scheme, where the codebook ignores hand blockage. Our results show that the ideal grip-aware scheme can provide more than 50% gain in terms of the spherical coverage over the agnostic scheme, depending on the grip and orientation. Encouragingly, the more practical semi-aware scheme we propose provides performance approaching the fully grip-aware scheme. Overall, we demonstrate that 5G mmWave handsets are different from pre-5G handsets: the user grip needs to be explicitly factored into the codebook design.

Submitted 2 August, 2019; originally announced August 2019.

Comments: GLOBECOM 2019

arXiv:1710.01581 [pdf, other]

doi 10.1109/JIOT.2017.2748987

Sensor Fusion for Public Space Utilization Monitoring in a Smart City

Authors: Billy Pik Lik Lau, Nipun Wijerathne, Benny Kai Kiat Ng, Chau Yuen

Abstract: Public space utilization is crucial for urban developers to understand how efficiently a place is being occupied in order to improve existing or future infrastructures. In a smart cities approach, implementing public space monitoring with Internet-of-Things (IoT) sensors appears to be a viable solution. However, the choice of sensors is often a challenging problem and is often linked with scalability, coverage, energy consumption, accuracy, and privacy. To get the most from low-cost sensors with the aforementioned design considerations in mind, we propose data processing modules for capturing public space utilization with a Renewable Wireless Sensor Network (RWSN) platform using pyroelectric infrared (PIR) and analog sound sensors. We first propose a calibration process to remove false alarms of the PIR sensor due to the impact of weather and environment. We then demonstrate how the sound sensor data can be processed to provide various insights into a public space. Lastly, we fuse both sensors and study the utilization of a particular public space based on one month of data to unveil its usage.

Submitted 5 October, 2017; v1 submitted 14 September, 2017; originally announced October 2017.

arXiv:1704.00647 [pdf]

Distributed FD-MIMO: Cellular Evolution for 5G and Beyond

Authors: Yeqing Hu, Boon Loong Ng, Young-Han Nam, ** Yuan, Gary Xu, Ji-Yun Seol, Jianzhong Zhang

Abstract: This paper presents the next evolution of FD-MIMO technology for beyond 5G, where antennas of the FD-MIMO system are placed in a distributed manner throughout the cell in a multi-cell deployment scenario. This system, referred to as the Distributed FD-MIMO (D-FD-MIMO) system, is capable of providing higher cell average throughput as well as a more uniform user experience compared to the conventional FD-MIMO system. System level simulations are performed to evaluate performance. Our results show that the proposed D-FD-MIMO system achieves 1.4-2 times cell average throughput gain compared to the FD-MIMO system. Insights into the performance gain are provided. Hardware implementation challenges and potential standards impact are also presented.

Submitted 3 April, 2017; originally announced April 2017.

arXiv:1604.08632 [pdf]

Licensed-Assisted Access to Unlicensed Spectrum in LTE Release 13

Authors: Hwan-Joon Kwon, Jeongho Jeon, Abhijeet Bhorkar, Qiaoyang Ye, Hiroki Harada, Yu Jiang, Liu Liu, Satoshi Nagata, Boon Loong Ng, Thomas Novlan, **young Oh, Wang Yi

Abstract: Exploiting the unlicensed spectrum is considered by 3GPP as one promising solution to meet the ever-increasing traffic growth. As a result, one major enhancement for LTE in Release 13 has been to enable its operation in the unlicensed spectrum via Licensed-Assisted Access (LAA). In this article, we provide an overview of the Release 13 LAA technology including motivation, use cases, LTE enhancements for enabling the unlicensed band operation, and the coexistence evaluation results contributed by 3GPP participants.

Submitted 28 April, 2016; originally announced April 2016.

arXiv:1602.05312 [pdf, other]

Density-based Denoising of Point Cloud

Authors: Faisal Zaman, Ya ** Wong, Boon Yian Ng

Abstract: Point cloud source data for surface reconstruction is usually contaminated with noise and outliers. To overcome this deficiency, a density-based point cloud denoising method is presented to remove outliers and noisy points. First, a particle-swarm optimization technique is employed to automatically approximate the optimal bandwidth of multivariate kernel density estimation, ensuring the robust performance of density estimation. Then, a mean-shift based clustering technique is used to remove outliers through a thresholding scheme. After removing outliers from the point cloud, bilateral mesh filtering is applied to smooth the remaining points. The experimental results show that this approach is comparably robust and efficient.

Submitted 17 February, 2016; originally announced February 2016.

Comments: 9 pages, 5 figures, to appear in the Proceedings of the 9th International Conference on Robotics, Vision, Signal Processing & Power Applications (ROVISP), 2-3 Feb 2016, Penang, Malaysia
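
The density-thresholding idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it covers only the "estimate density, then threshold" step, uses a hand-picked bandwidth instead of particle-swarm optimization, and omits mean-shift clustering and bilateral filtering. The function names, the `ratio` parameter, and the toy point cloud are assumptions made for illustration.

```python
import math


def kde_density(points, bandwidth):
    """Unnormalized Gaussian kernel density estimate evaluated at every point."""
    scale = 2.0 * bandwidth ** 2
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            dist2 = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-dist2 / scale)
        densities.append(total / len(points))
    return densities


def remove_outliers(points, bandwidth, ratio=0.5):
    """Keep points whose density is at least `ratio` times the mean density.

    `ratio` is a hypothetical threshold parameter, not taken from the paper.
    """
    densities = kde_density(points, bandwidth)
    threshold = ratio * sum(densities) / len(densities)
    return [p for p, d in zip(points, densities) if d >= threshold]


# Toy cloud: a 3x3x3 grid with 0.1 spacing plus one far-away outlier.
grid = [(0.1 * i, 0.1 * j, 0.1 * k)
        for i in range(3) for j in range(3) for k in range(3)]
cloud = grid + [(5.0, 5.0, 5.0)]
cleaned = remove_outliers(cloud, bandwidth=0.2)
print(len(cloud), "points in,", len(cleaned), "points kept")
```

The quadratic loop is fine for a toy example; a real point cloud would likely need a spatial index or a library KDE.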

arXiv:1510.04940 [pdf, other]

A Novel and Efficient Vector Quantization Based CPRI Compression Algorithm

Authors: Hongbo Si, Boon Loong Ng, Md. Saifur Rahman, Jianzhong Zhang

Abstract: Future wireless networks, such as the Centralized Radio Access Network (C-RAN), will need to deliver data rates about 100 to 1000 times those of current 4G technology. For C-RAN based network architecture, there is a pressing need for tremendous enhancement of the effective data rate of the Common Public Radio Interface (CPRI). Compression of CPRI data is one of the potential enhancements. In this paper, we introduce a vector quantization based compression algorithm for CPRI links, utilizing the Lloyd algorithm. Methods to vectorize the I/Q samples and enhanced initialization of the Lloyd algorithm for codebook training are investigated for improved performance. Multi-stage vector quantization and unequally protected multi-group quantization are considered to reduce codebook search complexity and codebook size. Simulation results show that our solution can achieve compression of 4 times for uplink and 4.5 times for downlink, within 2% Error Vector Magnitude (EVM) distortion. Remarkably, the vector quantization codebook proves to be quite robust against data modulation mismatch, fading, signal-to-noise ratio (SNR) and Doppler spread.

Submitted 16 October, 2015; originally announced October 2015.

Comments: 25 pages, 15 figures, 6 tables, journal paper
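
For readers unfamiliar with the terms in the abstract above, the following is a minimal sketch of plain Lloyd-algorithm codebook training on I/Q pairs, followed by an EVM measurement of the quantization error. It is not the paper's algorithm: the vectorization, enhanced initialization, multi-stage and multi-group schemes are omitted, initialization here is simple random sampling, and the Gaussian test samples, function names, and codebook size are assumptions made for illustration.

```python
import math
import random


def nearest(codebook, vector):
    """Index of the codeword closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], vector)),
    )


def lloyd_codebook(vectors, size, iterations=20, seed=0):
    """Train a codebook by alternating nearest-codeword assignment and centroid update."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        cells = [[] for _ in range(size)]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(axis) / len(cell) for axis in zip(*cell)]
    return codebook


def evm_percent(original, reconstructed):
    """Error Vector Magnitude: RMS error relative to RMS signal, in percent."""
    error = sum((a - b) ** 2
                for u, v in zip(original, reconstructed)
                for a, b in zip(u, v))
    reference = sum(a * a for u in original for a in u)
    return 100.0 * math.sqrt(error / reference)


# Synthetic (I, Q) pairs stand in for real CPRI samples.
rng = random.Random(1)
samples = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
codebook = lloyd_codebook(samples, size=16)
quantized = [codebook[nearest(codebook, s)] for s in samples]
print(f"EVM with 16 codewords: {evm_percent(samples, quantized):.1f}%")
```

With 16 codewords, each I/Q pair is represented by a 4-bit index. A larger codebook lowers EVM at the cost of more bits per vector and a longer codebook search, which is the trade-off the abstract's multi-stage and multi-group variants aim to ease.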

arXiv:1301.7404 [pdf]

Resolving Conflicting Arguments under Uncertainties

Authors: Benson Hin Kwong Ng, Kam-Fai Wong, Boon-Toh Low

Abstract: Distributed knowledge based applications in open domain rely on common sense information which is bound to be uncertain and incomplete. To draw useful conclusions from ambiguous data, one must address uncertainties and conflicts incurred in a holistic view. No integrated frameworks are viable without an in-depth analysis of conflicts incurred by uncertainties. In this paper, we give such an analysis and, based on the result, propose an integrated framework. Our framework extends definite argumentation theory to model uncertainty. It supports three views over conflicting and uncertain knowledge. Thus, knowledge engineers can draw different conclusions depending on the application context (i.e. view). We also give an illustrative example on strategic decision support to show the practical usefulness of our framework.

Submitted 30 January, 2013; originally announced January 2013.

Comments: Appears in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI1998)

Report number: UAI-P-1998-PG-414-421

arXiv:1301.0590 [pdf]

Factored Particles for Scalable Monitoring

Authors: Brenda Ng, Leonid Peshkin, Avi Pfeffer

Abstract: Exact monitoring in dynamic Bayesian networks is intractable, so approximate algorithms are necessary. This paper presents a new family of approximate monitoring algorithms that combine the best qualities of the particle filtering and Boyen-Koller methods. Our algorithms maintain an approximate representation of the belief state in the form of sets of factored particles that correspond to samples of clusters of state variables. Empirical results show that our algorithms outperform both ordinary particle filtering and the Boyen-Koller algorithm on large systems.

Submitted 12 December, 2012; originally announced January 2013.

Comments: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002)

Report number: UAI-P-2002-PG-370-377

arXiv:1204.1611 [pdf]

Vision-based Human Gender Recognition: A Survey

Authors: Choon Boon Ng, Yong Haur Tay, Bok Min Goi

Abstract: Gender is an important demographic attribute of people. This paper provides a survey of human gender recognition in computer vision. A review of approaches exploiting information from face and whole body (either from a still image or gait sequence) is presented. We highlight the challenges faced and survey the representative methods of these approaches. Based on the results, good performance has been achieved for datasets captured under controlled environments, but there is still much work that can be done to improve the robustness of gender recognition under real-life environments.

Submitted 7 April, 2012; originally announced April 2012.

Comments: 30 pages

arXiv:1105.1562 [pdf, ps, other]

doi 10.1109/GLOCOM.2009.5426134

A New Class of MDS Erasure Codes Based on Graphs

Authors: Nattakan Puttarak, Phisan Kaewprapha, Boon Chong Ng, **g, Li

Abstract: Maximum distance separable (MDS) array codes are XOR-based optimal erasure codes that are particularly suitable for use in disk arrays. This paper develops an innovative method to build MDS array codes from an elegant class of nested graphs, termed complete-graph-of-rings (CGR). We discuss a systematic and concrete way to transfer these graphs to array codes, unveil an interesting relation between the proposed map and the renowned perfect 1-factorization, and show that the proposed CGR codes subsume B-codes as their "contracted" codes. These new codes, termed CGR codes, and their dual codes are simple to describe, and require minimal encoding and decoding complexity.

Submitted 8 May, 2011; originally announced May 2011.

Comments: In Proceedings of the IEEE Global Communications Conference (GLOBECOM)

Showing 1–46 of 46 results for author: Ng, B