-
3M: Multi-modal Multi-task Multi-teacher Learning for Game Event Detection
Authors:
Thye Shan Ng,
Feiqi Cao,
Soyeon Caren Han
Abstract:
Esports has rapidly emerged as a global phenomenon with an ever-expanding audience via platforms like YouTube. Due to the inherently complex nature of the game, it is challenging for newcomers to comprehend what an event entails. The chaotic nature of online chat, the fast-paced speech of the game commentator, and the game-specific user interface further compound the difficulty for users in comprehending the gameplay. To overcome these challenges, it is crucial to integrate the Multi-Modal (MM) information from the platform and understand the event. The paper introduces a new MM multi-teacher-based game event detection framework, with the ultimate goal of constructing a comprehensive framework that enhances the comprehension of the ongoing game situation. While conventional MM models typically prioritise aligning MM data through concurrent training towards a unified objective, our framework leverages multiple teachers trained independently on different tasks to accomplish Game Event Detection. The experiment clearly shows the effectiveness of the proposed MM multi-teacher framework.
Submitted 13 June, 2024;
originally announced June 2024.
-
Battling Botpoop using GenAI for Higher Education: A Study of a Retrieval Augmented Generation Chatbots Impact on Learning
Authors:
Maung Thway,
Jose Recatala-Gomez,
Fun Siong Lim,
Kedar Hippalgaonkar,
Leonard W. T. Ng
Abstract:
Generative artificial intelligence (GenAI) and large language models (LLMs) have simultaneously opened new avenues for enhancing human learning and increased the prevalence of poor-quality information in student responses - termed Botpoop. This study introduces Professor Leodar, a custom-built, Singlish-speaking Retrieval Augmented Generation (RAG) chatbot designed to enhance education while reducing Botpoop. Deployed at Nanyang Technological University, Singapore, Professor Leodar offers a glimpse into the future of AI-assisted learning, offering personalized guidance, 24/7 availability, and contextually relevant information. Through a mixed-methods approach, we examine the impact of Professor Leodar on learning, engagement, and exam preparedness, with 97.1% of participants reporting positive experiences. These findings help define possible roles of AI in education and highlight the potential of custom GenAI chatbots. Our combination of chatbot development, in-class deployment and outcomes study offers a benchmark for GenAI educational tools and is a stepping stone for redefining the interplay between AI and human learning.
Submitted 21 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Claw-free minimal matching covered graphs
Authors:
Yipei Zhang,
Xiumei Wang,
Jinjiang Yuan,
C. T. Ng,
T. C. E. Cheng
Abstract:
A matching covered graph $G$ is minimal if for each edge $e$ of $G$, $G-e$ is not matching covered. An edge $e$ of a matching covered graph $G$ is removable if $G-e$ is also matching covered. Thus a matching covered graph is minimal if and only if it is free of removable edges. For bipartite graphs, Lovász and Plummer gave a characterization of bipartite minimal matching covered graphs. For bricks, Lovász showed that the only bricks that are minimal matching covered are $K_4$ and $\overline{C_6}$. In this paper, we present a complete characterization of minimal matching covered graphs that are claw-free. Moreover, for cubic claw-free matching covered graphs that are not minimal matching covered, we obtain the number of their removable edges (with respect to their bricks), and then prove that they have at least 12 removable edges (the bound is sharp).
Submitted 27 May, 2024;
originally announced May 2024.
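The definitions in the abstract above can be checked by brute force on very small graphs. The following is a minimal sketch, not the paper's method: it assumes the usual convention that a matching covered graph is connected, has at least two vertices, and has every edge in some perfect matching. All function names are hypothetical, and the enumeration is exponential, so it is only suitable for toy examples.

```python
from itertools import combinations


def perfect_matchings(vertices, edges):
    """Yield every perfect matching as a frozenset of edges (edges are 2-element frozensets)."""
    verts = sorted(vertices)
    if not verts:
        yield frozenset()
        return
    v = verts[0]  # the smallest remaining vertex must be matched by some edge
    for e in edges:
        if v in e:
            rest = [x for x in verts if x not in e]
            rest_edges = [f for f in edges if not (f & e)]
            for m in perfect_matchings(rest, rest_edges):
                yield m | {e}


def is_connected(vertices, edges):
    """Depth-first search connectivity check."""
    verts = set(vertices)
    if not verts:
        return False
    seen, stack = set(), [next(iter(verts))]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        for e in edges:
            if v in e:
                stack.extend(e - {v})
    return seen == verts


def is_matching_covered(vertices, edges):
    """Connected, at least two vertices, and every edge lies in some perfect matching."""
    edges = set(edges)
    if len(set(vertices)) < 2 or not is_connected(vertices, edges):
        return False
    covered = set()
    for m in perfect_matchings(vertices, list(edges)):
        covered |= m
    return covered == edges


def is_minimal_matching_covered(vertices, edges):
    """Matching covered, and deleting any single edge destroys that property."""
    edges = set(edges)
    return is_matching_covered(vertices, edges) and all(
        not is_matching_covered(vertices, edges - {e}) for e in edges
    )


def complete_graph(n):
    """Vertices 0..n-1 with every pair joined by an edge."""
    vertices = list(range(n))
    return vertices, {frozenset(p) for p in combinations(vertices, 2)}
```

For example, `is_minimal_matching_covered(*complete_graph(4))` returns `True`, in line with the abstract's statement that $K_4$ is a minimal matching covered brick.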
-
How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO
Authors:
Man Tik Ng,
Hui Tung Tse,
Jen-tse Huang,
Jingjing Li,
Wenxuan Wang,
Michael R. Lyu
Abstract:
The role-play ability of Large Language Models (LLMs) has emerged as a popular research direction. However, existing studies focus on imitating well-known public figures or fictional characters, overlooking the potential for simulating ordinary individuals. Such an oversight limits the potential for advancements in digital human clones and non-player characters in video games. To bridge this gap, we introduce ECHO, an evaluative framework inspired by the Turing test. This framework engages the acquaintances of the target individuals to distinguish between human and machine-generated responses. Notably, our framework focuses on emulating average individuals rather than historical or fictional figures, presenting a unique advantage to apply the Turing Test. We evaluated three role-playing LLMs using ECHO, with GPT-3.5 and GPT-4 serving as foundational models, alongside the online application GPTs from OpenAI. Our results demonstrate that GPT-4 more effectively deceives human evaluators, and GPTs achieves a leading success rate of 48.3%. Furthermore, we investigated whether LLMs could discern between human-generated and machine-generated texts. While GPT-4 can identify differences, it could not determine which texts were human-produced. Our code and results of reproducing the role-playing LLMs are made publicly available via https://github.com/CUHK-ARISE/ECHO.
Submitted 22 April, 2024;
originally announced April 2024.
-
Semi-on-Demand Hybrid Transit Route Design with Shared Autonomous Mobility Services
Authors:
Max T. M. Ng,
Florian Dandl,
Hani S. Mahmassani,
Klaus Bogenberger
Abstract:
This study examines the route design of a semi-on-demand hybrid route directional service in the public transit network, offering on-demand flexible route service in low-density areas and fixed route service in higher-density areas with Shared Autonomous Mobility Service (SAMS). The study develops analytically tractable cost expressions that capture access, waiting, and riding costs for users, and distance-based operating and time-based vehicle costs for operators. Two formulations are presented for strategic and tactical decisions in flexible route portion, fleet size, headway, and vehicle size optimization, enabling the determination of route types between fixed, hybrid, and flexible routes based on demand, cost, and operational parameters. The practical applications and benefits of semi-on-demand feeders are demonstrated with numerical examples and a large-scale case study in the Chicago metropolitan area. Findings reveal scenarios in which flexible route portions serving passengers located further away reduce total costs, particularly user costs. Lower operating costs in lower-demand areas favor more flexible routes, whereas higher demand densities favor more traditional line-based operations. On two studied lines, a current cost forecast favors smaller vehicles with flexible routes, but operating constraints and higher operating costs would favor bigger vehicles with hybrid routes. The study provides an analytical tool to design SAMS as directional services and transit feeders, and tractable continuous approximation formulations for future research in transit network design.
Submitted 23 March, 2024;
originally announced March 2024.
-
Lightweight, error-tolerant edge detection using memristor-enabled stochastic logics
Authors:
Lekai Song,
Pengyu Liu,
Jingfang Pei,
Yang Liu,
Songwei Liu,
Shengbo Wang,
Leonard W. T. Ng,
Tawfique Hasan,
Kong-Pang Pun,
Shuo Gao,
Guohua Hu
Abstract:
The demand for efficient edge vision has spurred the interest in developing stochastic computing approaches for performing image processing tasks. Memristors with inherent stochasticity readily introduce probability into the computations and thus enable stochastic image processing computations. Here, we present a stochastic computing approach for edge detection, a fundamental image processing technique, facilitated with memristor-enabled stochastic logics. Specifically, we integrate the memristors with logic circuits and harness the stochasticity from the memristors to realize compact stochastic logics for stochastic number encoding and processing. The stochastic numbers, exhibiting well-regulated probabilities and correlations, can be processed to perform logic operations with statistical probabilities. This can facilitate lightweight stochastic edge detection for edge visual scenarios characterized with high-level noise errors. As a practical demonstration, we implement a hardware stochastic Roberts cross operator using the stochastic logics, and prove its exceptional edge detection performance, remarkably, with 95% less computational cost while withstanding 50% bit-flip errors. The results underscore the great potential of our stochastic edge detection approach in developing lightweight, error-tolerant edge vision hardware and systems for autonomous driving, virtual/augmented reality, medical imaging diagnosis, industrial automation, and beyond.
Submitted 20 March, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
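For readers unfamiliar with the Roberts cross operator named in the abstract above, here is a minimal sketch of the conventional deterministic version in software. It is not the memristor-based stochastic hardware implementation the paper describes; the function name and the choice of the |Gx| + |Gy| magnitude are assumptions made for illustration.

```python
def roberts_cross(image):
    """Return the Roberts cross edge map (|Gx| + |Gy|) of a 2D grayscale image.

    `image` is a list of equal-length rows of numbers. The result has one
    fewer row and one fewer column, because each output pixel is computed
    from a 2x2 neighbourhood.
    """
    height, width = len(image), len(image[0])
    edges = []
    for r in range(height - 1):
        row = []
        for c in range(width - 1):
            # Differences along the two diagonals of the 2x2 window.
            gx = image[r][c] - image[r + 1][c + 1]
            gy = image[r][c + 1] - image[r + 1][c]
            row.append(abs(gx) + abs(gy))
        edges.append(row)
    return edges
```

Applied to an image with a vertical step edge, such as three rows of `[0, 0, 1, 1]`, the response is nonzero only in the column where the intensity changes.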
-
Syllable based DNN-HMM Cantonese Speech to Text System
Authors:
Timothy Wong,
Claire Li,
Sam Lam,
Billy Chiu,
Qin Lu,
Minglei Li,
Dan Xiong,
Roy Shing Yu,
Vincent T. Y. Ng
Abstract:
This paper reports our work on building up a Cantonese Speech-to-Text (STT) system with a syllable based acoustic model. This is a part of an effort in building a STT system to aid dyslexic students who have cognitive deficiency in writing skills but have no problem expressing their ideas through speech. For Cantonese speech recognition, the basic unit of acoustic models can either be the conventional Initial-Final (IF) syllables, or the Onset-Nucleus-Coda (ONC) syllables where finals are further split into nucleus and coda to reflect the intra-syllable variations in Cantonese. By using the Kaldi toolkit, our system is trained using the stochastic gradient descent optimization model with the aid of GPUs for the hybrid Deep Neural Network and Hidden Markov Model (DNN-HMM) with and without I-vector based speaker adaptive training technique. The input features of the same Gaussian Mixture Model with speaker adaptive training (GMM-SAT) to DNN are used in all cases. Experiments show that the ONC-based syllable acoustic modeling with I-vector based DNN-HMM achieves the best performance with the word error rate (WER) of 9.66% and the real time factor (RTF) of 1.38812.
Submitted 13 February, 2024;
originally announced February 2024.
-
Eigenmode Decomposition Method for Full-Wave Modeling of Microring Resonators
Authors:
Yuriy Akimov,
Aswin Alexander Eapen,
Shiyang Zhu,
Doris K. T. Ng,
Nanxi Li,
Woon Leng Loh,
Lennon Y. T. Lee,
Alagappan Gandhi,
Aravind P. Anthur
Abstract:
We develop a theoretical predictive model for an all-pass ring resonator that enables the most complete description of linear coupling regimes. The model is based on eigenmode decomposition of Maxwell's equations with full account of the confined and leaky modes, as opposed to the existing phenomenological methods restricted to the confined modes only. This model enables quantitative description of all-pass ring resonators and provides insights into the physics underlying microring-waveguide coupling. We experimentally validate the model using transmission measurements in the linear regime of aluminium nitride resonators. The developed model is then used to explore the field enhancement in microrings crucial for nonlinear photonic applications.
Submitted 6 February, 2024;
originally announced February 2024.
-
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
Authors:
Mingbin Xu,
Alex Jin,
Sicheng Wang,
Mu Su,
Tim Ng,
Henry Mason,
Shiyi Han,
Zhihong Lei,
Yaqiao Deng,
Zhen Huang,
Mahesh Krishnamoorthy
Abstract:
With increasingly more powerful compute capabilities and resources in today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it is still challenging to implement on-device ASR on resource-constrained devices, such as smartphones, smart wearables, and other smart home automation devices. In this paper, we propose a series of model architecture adaptions, neural network graph transformations, and numerical optimizations to fit an advanced Conformer based end-to-end streaming ASR system on resource-constrained devices without accuracy degradation. We achieve over 5.26 times faster than realtime (0.19 RTF) speech recognition on smart wearables while minimizing energy consumption and achieving state-of-the-art accuracy. The proposed methods are widely applicable to other transformer-based server-free AI applications. In addition, we provide a complete theory on optimal pre-normalizers that numerically stabilize layer normalization in any Lp-norm using any floating point precision.
Submitted 13 May, 2024; v1 submitted 16 December, 2023;
originally announced December 2023.
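The two speed figures quoted in the abstract above are related by a simple reciprocal, assuming the common definition of real-time factor (RTF) as processing time divided by audio duration. A small sketch with hypothetical helper names:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent recognising / duration of the audio (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def speedup_over_realtime(rtf: float) -> float:
    """How many times faster than realtime a system with the given RTF runs."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf
```

With an RTF of 0.19, `speedup_over_realtime(0.19)` is about 5.26, which matches the "5.26 times faster than realtime" figure in the abstract.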
-
Band alignment of grafted monocrystalline Si (001)/$β$-Ga$_2$O$_3$ (010) p-n heterojunction determined by X-ray photoelectron spectroscopy
Authors:
Jiarui Gong,
Jie Zhou,
Ashok Dheenan,
Moheb Sheikhi,
Fikadu Alema,
Tien Khee Ng,
Shubhra S. Pasayat,
Qiaoqiang Gan,
Andrei Osinsky,
Vincent Gambin,
Chirag Gupta,
Siddharth Rajan,
Boon S. Ooi,
Zhenqiang Ma
Abstract:
Beta-phase gallium oxide ($β$-Ga$_2$O$_3$) research has gained accelerated pace due to its superiorly large bandgap and commercial availability of large-diameter native substrates. However, the high acceptor activation energy obstructs the development of homojunction bipolar devices employing $β$-Ga$_2$O$_3$. The recently demonstrated semiconductor grafting technique provides an alternative and viable approach towards lattice-mismatched $β$-Ga$_2$O$_3$-based p-n heterojunctions with high quality interfaces. Understanding and quantitatively characterizing the band alignment of the grafted heterojunctions is crucial for future bipolar device development employing the grafting method. In this work, we present a systematic study of the band alignment in the grafted monocrystalline Si/$β$-Ga$_2$O$_3$ heterostructure by employing X-ray photoelectron spectroscopy (XPS). The core level peaks and valence band spectra of the Si, $β$-Ga$_2$O$_3$, and the grafted heterojunction were carefully obtained and analyzed. The band diagrams of the Si/$β$-Ga$_2$O$_3$ heterostructure were constructed using two individual methods, the core level peak method and the valence band spectrum method, by utilizing the different portions of the measured data. The reconstructed band alignments of the Si/$β$-Ga$_2$O$_3$ heterostructure using the two different methods are identical within the error range. The band alignment is also consistent with the prediction from the electron affinity values of Si and $β$-Ga$_2$O$_3$. The study suggests that the interface defect density in grafted Si/$β$-Ga$_2$O$_3$ heterostructure is at a sufficiently low level such that Fermi level pinning at the interface has been completely avoided and the universal electron affinity rule can be safely employed to construct the band diagrams of grafted monocrystalline Si/$β$-Ga$_2$O$_3$ heterostructures.
Submitted 1 December, 2023;
originally announced December 2023.
-
Observer study-based evaluation of TGAN architecture used to generate oncological PET images
Authors:
Roberto Fedrigo,
Fereshteh Yousefirizi,
Ziping Liu,
Abhinav K. Jha,
Robert V. Bergen,
Jean-Francois Rajotte,
Raymond T. Ng,
Ingrid Bloise,
Sara Harsini,
Dan J. Kadrmas,
Carlos Uribe,
Arman Rahmim
Abstract:
The application of computer-vision algorithms in medical imaging has increased rapidly in recent years. However, algorithm training is challenging due to limited sample sizes, lack of labeled samples, as well as privacy concerns regarding data sharing. To address these issues, we previously developed (Bergen et al. 2022) a synthetic PET dataset for Head and Neck (H&N) cancer using the temporal generative adversarial network (TGAN) architecture and evaluated its performance segmenting lesions and identifying radiomics features in synthesized images. In this work, a two-alternative forced-choice (2AFC) observer study was performed to quantitatively evaluate the ability of human observers to distinguish between real and synthesized oncological PET images. In the study, eight trained readers, including two board-certified nuclear medicine physicians, read 170 real/synthetic image pairs presented as 2D-transaxial using a dedicated web app. For each image pair, the observer was asked to identify the real image and input their confidence level with a 5-point Likert scale. P-values were computed using the binomial test and Wilcoxon signed-rank test. A heat map was used to compare the response accuracy distribution for the signed-rank test. Response accuracy for all observers ranged from 36.2% [27.9-44.4] to 63.1% [54.8-71.3]. Six out of eight observers did not identify the real image with statistical significance, indicating that the synthetic dataset was reasonably representative of oncological PET images. Overall, this study adds validity to the realism of our simulated H&N cancer dataset, which may be implemented in the future to train AI algorithms while favoring patient confidentiality and privacy protection.
Submitted 27 November, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
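The binomial test mentioned in the abstract above asks whether an observer's number of correct choices differs from the 50% expected by guessing in a 2AFC task. The following is a minimal sketch of an exact two-sided test in pure Python; it is an illustration rather than the study's analysis code, the function name is hypothetical, and the counts in the usage note are invented examples, not the study's data.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null hypothesis (success probability `p`).
    """
    probs = [
        comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(trials + 1)
    ]
    observed = probs[successes]
    # Small relative tolerance so that ties are not lost to rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)
```

For instance, with 170 image pairs a hypothetical observer who picks the real image 90 times is consistent with guessing, while one who picks it 120 times is not, at the conventional 0.05 level.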
-
A Replica-BCS theory for dirty superconductors
Authors:
Yat Fan Lau,
Tai Kai Ng
Abstract:
Motivated by the discovery of the anomalous metal state in superconductor thin films, we revisit in this paper the problem of dirty superconductors using a replica-symmetric BCS (RS-BCS) theory for dirty metals with net attractive interactions. Within the RS-BCS mean field theory, we show that the (dirty) superconductor transits to a Cooper-pair-glass state beyond a critical strength of disorder. The single particle tunneling density of states and the superfluid density are computed within the RS-BCS theory for different strengths of disorder. We find that the single-particle spectral gap is strongly enhanced by disorder and the superfluid density reduces rapidly from the corresponding clean superconducting limit with increasing strength of disorder but remains finite in the Cooper-pair-glass state. The nature of the Cooper-pair-glass state and relevance of our result to the anomalous metal state are briefly discussed.
Submitted 22 April, 2024; v1 submitted 24 November, 2023;
originally announced November 2023.
-
Mean Field Analysis of Two-Party Governance: Competition versus Cooperation among Leaders
Authors:
Dantong Chu,
Kenneth Tsz Hin Ng,
Sheung Chi Phillip Yam,
Harry Zheng
Abstract:
This article studies linear-quadratic Stackelberg games between two dominating players (or equivalently, leaders) and a large group of followers, each of whom interacts under a mean field game (MFG) framework. Unlike the conventional major-minor player game, the mean field term herein is endogenously affected by the two leaders simultaneously. These homogeneous followers are non-cooperative, whereas the two leaders can either compete or cooperate with each other, which are respectively formulated as a Nash and a Pareto game. The complete solutions of the leader-follower game can be expressed in terms of the solutions of some non-symmetric Riccati equations. Notably, our analysis suggests that both modes of interaction between leaders have their own merits and neither of them is always more favourable to the community of followers. To our knowledge, a comparative study of the effect of different modes of governance on the society is relatively rare in the existing literature; we here provide its first preliminary quantitative analysis. Under a broad class of practically relevant models, we provide sufficient conditions to decide whether cooperation or competition between leaders is more favourable to the followers. In common with modern folklore, the relative merits of the two Stackelberg games depend on whether the interests between the two leaders and the followers align among themselves. Representative numerical examples are also supplemented.
Submitted 1 July, 2024; v1 submitted 17 November, 2023;
originally announced November 2023.
-
Towards Robust Temporal Reasoning of Large Language Models via a Multi-Hop QA Dataset and Pseudo-Instruction Tuning
Authors:
Qingyu Tan,
Hwee Tou Ng,
Lidong Bing
Abstract:
Knowledge in the real world is being updated constantly. However, it is costly to frequently update large language models (LLMs). Therefore, it is crucial for LLMs to understand the concept of temporal knowledge. However, prior works on temporal question answering did not emphasize multi-answer and multi-hop types of temporal reasoning. In this paper, we propose a complex temporal question-answering (QA) dataset Complex-TR that focuses on multi-answer and multi-hop temporal reasoning. Besides, we also propose a novel data augmentation strategy to improve the complex temporal reasoning capability and robustness of LLMs. We conducted experiments on multiple temporal QA datasets. Experimental results show that our method is able to improve LLMs' performance on temporal QA benchmarks by significant margins.
Submitted 16 November, 2023;
originally announced November 2023.
-
On the Robustness of Question Rewriting Systems to Questions of Varying Hardness
Authors:
Hai Ye,
Hwee Tou Ng,
Wenjuan Han
Abstract:
In conversational question answering (CQA), the task of question rewriting (QR) in context aims to rewrite a context-dependent question into an equivalent self-contained question that gives the same answer. In this paper, we are interested in the robustness of a QR system to questions varying in rewriting hardness or difficulty. Since there is a lack of questions classified based on their rewriting hardness, we first propose a heuristic method to automatically classify questions into subsets of varying hardness, by measuring the discrepancy between a question and its rewrite. To find out what makes questions hard or easy for rewriting, we then conduct a human evaluation to annotate the rewriting hardness of questions. Finally, to enhance the robustness of QR systems to questions of varying hardness, we propose a novel learning framework for QR that first trains a QR model independently on each subset of questions of a certain level of hardness, then combines these QR models as one joint model for inference. Experimental results on two datasets show that our framework improves the overall performance compared to the baselines.
Submitted 12 November, 2023;
originally announced November 2023.
-
Local Statistics for Generative Image Detection
Authors:
Yung Jer Wong,
Teck Khim Ng
Abstract:
Diffusion models (DMs) are generative models that learn to synthesize images from Gaussian noise. DMs can be trained to do a variety of tasks such as image generation and image super-resolution. Researchers have made significant improvement in the capability of synthesizing photorealistic images in the past few years. These successes also hasten the need to address the potential misuse of synthesized images. In this paper, we highlight the effectiveness of computing local statistics, as opposed to global statistics, in distinguishing digital camera images from DM-generated images. We hypothesized that local statistics should be used to address the spatial non-stationarity problem in images. We show that our approach produced promising results and it is also robust to various perturbations such as image resizing and JPEG compression.
Submitted 25 October, 2023;
originally announced October 2023.
-
System Combination via Quality Estimation for Grammatical Error Correction
Authors:
Muhammad Reza Qorib,
Hwee Tou Ng
Abstract:
Quality estimation models have been developed to assess the corrections made by grammatical error correction (GEC) models when the reference or gold-standard corrections are not available. An ideal quality estimator can be utilized to combine the outputs of multiple GEC systems by choosing the best subset of edits from the union of all edits proposed by the GEC base systems. However, we found that existing GEC quality estimation models are not good enough in differentiating good corrections from bad ones, resulting in a low F0.5 score when used for system combination. In this paper, we propose GRECO, a new state-of-the-art quality estimation model that gives a better estimate of the quality of a corrected sentence, as indicated by having a higher correlation to the F0.5 score of a corrected sentence. It results in a combined GEC system with a higher F0.5 score. We also propose three methods for utilizing GEC quality estimation models for system combination with varying generality: model-agnostic, model-agnostic with voting bias, and model-dependent method. The combined GEC system outperforms the state of the art on the CoNLL-2014 test set and the BEA-2019 test set, achieving the highest F0.5 scores published to date.
Submitted 23 October, 2023;
originally announced October 2023.
-
Towards Real-World Streaming Speech Translation for Code-Switched Speech
Authors:
Belen Alastruey,
Matthias Sperber,
Christian Gollan,
Dominic Telaar,
Tim Ng,
Aashish Agarwal
Abstract:
Code-switching (CS), i.e. mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but have been limited to offline scenarios and to translation to one of the languages present in the source (monolingual transcription).
In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings, and translation to a third language (i.e., a language not included in the source). To this end, we extend the Fisher and Miami test and validation datasets to include new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST and we establish baseline results for the two settings mentioned earlier.
Submitted 23 October, 2023; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization
Authors:
Zhihong Lei,
Ernest Pusateri,
Shiyi Han,
Leo Liu,
Mingbin Xu,
Tim Ng,
Ruchir Travadi,
Youyuan Zhang,
Mirko Hannemann,
Man-Hung Siu,
Zhen Huang
Abstract:
Recent advances in deep learning and automatic speech recognition have improved the accuracy of end-to-end speech recognition systems, but recognition of personal content such as contact names remains a challenge. In this work, we describe our personalization solution for an end-to-end speech recognition system based on connectionist temporal classification. Building on previous work, we present a novel method for generating additional subword tokenizations for personal entities from their pronunciations. We show that using this technique in combination with two established techniques, contextual biasing and wordpiece prior normalization, we are able to achieve personal named entity accuracy on par with a competitive hybrid system.
Submitted 15 October, 2023;
originally announced October 2023.
-
Acoustic Model Fusion for End-to-end Speech Recognition
Authors:
Zhihong Lei,
Mingbin Xu,
Shiyi Han,
Leo Liu,
Zhen Huang,
Tim Ng,
Yuanyuan Zhang,
Ernest Pusateri,
Mirko Hannemann,
Yaqiao Deng,
Man-Hung Siu
Abstract:
Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, the application of LM fusion presents certain drawbacks, such as its inability to address the domain mismatch issue inherent to the internal AM. Drawing inspiration from the concept of LM fusion, we propose the integration of an external AM into the E2E system to better address the domain mismatch. By implementing this novel approach, we have achieved a significant reduction in the word error rate, with an impressive drop of up to 14.3% across varied test sets. We also discovered that this AM fusion approach is particularly beneficial in enhancing named entity recognition.
Submitted 10 October, 2023;
originally announced October 2023.
-
Properly colored even cycles in edge-colored complete balanced bipartite graphs
Authors:
Shanshan Guo,
Fei Huang,
Jinjiang Yuan,
C. T. Ng,
T. C. E. Cheng
Abstract:
Consider a complete balanced bipartite graph $K_{n,n}$ and let $K^c_{n,n}$ be an edge-colored version of $K_{n,n}$ that is obtained from $K_{n,n}$ by having each edge assigned a certain color. A subgraph $H$ of $K^c_{n,n}$ is called properly colored (PC) if every two adjacent edges of $H$ have distinct colors. $K_{n,n}^c$ is called properly vertex-even-pancyclic if for every vertex $u\in V(K_{n,n}^c)$ and for every even integer $k$ with $4 \leq k \leq 2n$, there exists a PC $k$-cycle containing $u$. The minimum color degree $δ^c(K^c_{n,n})$ of $K^c_{n,n}$ is the largest integer $k$ such that for every vertex $v$, there are at least $k$ distinct colors on the edges incident to $v$. In this paper we study the existence of PC even cycles in $K_{n,n}^c$. We first show that, for every integer $t\geq 3$, every $K^c_{n,n}$ with $δ^c(K^c_{n,n})\geq \frac{2n}{3}+t$ contains a PC 2-factor $H$ such that every cycle of $H$ has a length of at least $t$. By using the probabilistic method and absorbing technique, we use the above result to further show that, for every $\varepsilon>0$, there exists an integer $n_0(\varepsilon)$ such that every $K^c_{n,n}$ with $n\geq n_0(\varepsilon)$ is properly vertex-even-pancyclic, provided that $δ^c(K^c_{n,n})\geq (\frac{2}{3}+\varepsilon)n$.
Submitted 7 October, 2023;
originally announced October 2023.
-
Demonstration of a monocrystalline GaAs-$β$-Ga$_2$O$_3$ p-n heterojunction
Authors:
Jie Zhou,
Moheb Sheikhi,
Ashok Dheenan,
Haris Abbasi,
Jiarui Gong,
Yang Liu,
Carolina Adamo,
Patrick Marshall,
Nathan Wriedt,
Clincy Cheung,
Shuoyang Qiu,
Tien Khee Ng,
Qiaoqiang Gan,
Vincent Gambin,
Boon S. Ooi,
Siddharth Rajan,
Zhenqiang Ma
Abstract:
In this work, we report the fabrication and characterization of a monocrystalline GaAs/$β$-Ga$_2$O$_3$ p-n heterojunction by employing semiconductor grafting technology. The heterojunction was created by lifting off and transfer printing a p-type GaAs single crystal nanomembrane to an Al$_2$O$_3$-coated n-type $β$-Ga$_2$O$_3$ epitaxial substrate. The resultant heterojunction diodes exhibit remarkable performance metrics, including an ideality factor of 1.23, a high rectification ratio of 8.04E9 at +/- 4 V, and a turn-on voltage of 2.35 V. Furthermore, at +5 V, the diode displays a large current density of 2500 A/cm$^2$ along with a low ON resistance of 2 m$Ω\cdot$cm$^2$.
Submitted 5 October, 2023;
originally announced October 2023.
-
Zen: Near-Optimal Sparse Tensor Synchronization for Distributed DNN Training
Authors:
Zhuang Wang,
Zhaozhuo Xu,
Anshumali Shrivastava,
T. S. Eugene Ng
Abstract:
Distributed training is the de facto standard to scale up the training of Deep Neural Networks (DNNs) with multiple GPUs. The performance bottleneck of distributed training lies in communications for gradient synchronization. Recently, practitioners have observed sparsity in gradient tensors, suggesting the potential to reduce the traffic volume in communication and improve end-to-end training efficiency. Yet, the optimal communication scheme to fully leverage sparsity is still missing. This paper aims to address this gap. We first analyze the characteristics of sparse tensors in popular DNN models to understand the fundamentals of sparsity. We then systematically explore the design space of communication schemes for sparse tensors and find the optimal one. We also develop a gradient synchronization system called Zen that approximately realizes it for sparse tensors. We demonstrate that Zen can achieve up to 5.09x speedup in communication time and up to 2.48x speedup in training throughput compared to the state-of-the-art methods.
Submitted 23 September, 2023;
originally announced September 2023.
-
Sufficient conditions for $k$-factors and spanning trees of graphs
Authors:
Guoyan Ao,
Ruifang Liu,
Jinjiang Yuan,
C. T. Ng,
T. C. E. Cheng
Abstract:
For any integer $k\geq1,$ a graph $G$ has a $k$-factor if it contains a $k$-regular spanning subgraph. In this paper we prove a sufficient condition in terms of the number of $r$-cliques to guarantee the existence of a $k$-factor in a graph with minimum degree at least $δ$, which improves the sufficient condition of O \cite{O2021} based on the number of edges. For any integer $k\geq2,$ a spanning $k$-tree of a connected graph $G$ is a spanning tree in which every vertex has degree at most $k$. Motivated by the technique of Li and Ning \cite{Li2016}, we present a tight spectral condition for an $m$-connected graph to have a spanning $k$-tree, which extends the result of Fan, Goryainov, Huang and Lin \cite{Fan2021} from $m=1$ to general $m$. Let $T$ be a spanning tree of a connected graph. The leaf degree of $T$ is the maximum number of leaves adjacent to $v$ in $T$ for any $v\in V(T)$. We provide a tight spectral condition for the existence of a spanning tree with leaf degree at most $k$ in a connected graph with minimum degree $δ$, where $k\geq1$ is an integer.
Submitted 26 August, 2023;
originally announced August 2023.
-
Snapp: An Agile Robotic Fish with 3-D Maneuverability for Open Water Swim
Authors:
Timothy J. K. Ng,
Nan Chen,
Fu Zhang
Abstract:
Fish exhibit impressive locomotive performance and agility in complex underwater environments, using their undulating tails and pectoral fins for propulsion and maneuverability. Replicating these abilities in robotic fish is challenging; existing designs focus on either fast swimming or directional control at limited speeds, mainly within a confined environment. To address these limitations, we designed Snapp, an integrated robotic fish capable of swimming in open water with high speeds and full 3-dimensional maneuverability. A novel cyclic-differential method is layered on the mechanism. It integrates propulsion and yaw-steering for fast course corrections. Two independent pectoral fins provide pitch and roll control. We evaluated Snapp in open water environments. We demonstrated significant improvements in speed and maneuverability, achieving swimming speeds of 1.5 m/s (1.7 Body Lengths per second) and performing complex maneuvers, such as a figure-8 and S-shape trajectory. Instantaneous yaw changes of 15$^{\circ}$ in 0.4 s, a minimum turn radius of 0.85 m, and maximum pitch and roll rates of 3.5 rad/s and 1 rad/s, respectively, were recorded. Our results suggest that Snapp's swimming capabilities have excellent practical prospects for open seas and contribute significantly to developing agile robotic fishes.
Submitted 24 August, 2023;
originally announced August 2023.
-
Initial demonstration of AlGaAs-GaAsP-beta-Ga2O3 n-p-n double heterojunctions
Authors:
Jie Zhou,
Ashok Dheenan,
Jiarui Gong,
Carolina Adamo,
Patrick Marshall,
Moheb Sheikhi,
Tsung-Han Tsai,
Nathan Wriedt,
Clincy Cheung,
Shuoyang Qiu,
Tien Khee Ng,
Qiaoqiang Gan,
Gambin Vincent,
Boon S. Ooi,
Siddharth Rajan,
Zhenqiang Ma
Abstract:
Beta-phase gallium oxide ($β$-Ga$_2$O$_3$), an ultrawide-bandgap semiconductor, has great potential for future power and RF electronics applications but faces challenges in bipolar device applications due to the lack of p-type dopants. In this work, we demonstrate monocrystalline AlGaAs/GaAsP/$β$-Ga$_2$O$_3$ n-p-n double heterojunctions, synthesized using semiconductor grafting technology. By transfer printing an n-AlGaAs/p-GaAsP nanomembrane to the n-type $β$-Ga$_2$O$_3$ epitaxial substrate, we simultaneously achieved an AlGaAs/GaAsP epitaxial n-p junction diode with an ideality factor of 1.29 and a rectification ratio of 2.57E3 at +/- 2 V, and a grafted GaAsP/$β$-Ga$_2$O$_3$ p-n junction diode exhibiting an ideality factor of 1.36 and a rectification ratio of 4.85E2 at +/- 2 V.
Submitted 14 August, 2023; v1 submitted 12 August, 2023;
originally announced August 2023.
-
Redesigning Large-Scale Multimodal Transit Networks with Shared Autonomous Mobility Services
Authors:
Max T. M. Ng,
Hani S. Mahmassani,
Ömer Verbas,
Taner Cokyasar,
Roman Engelhardt
Abstract:
This study addresses a large-scale multimodal transit network design problem, with Shared Autonomous Mobility Services (SAMS) as both transit feeders and an origin-to-destination mode. The framework captures spatial demand and modal characteristics, considers intermodal transfers and express services, determines transit infrastructure investment and path flows, and generates transit routes. A system-optimal multimodal transit network is designed with minimum total door-to-door generalized costs of users and operators, satisfying transit origin-destination demand within a pre-set infrastructure budget. Firstly, the geography, demand, and modes in each zone are characterized with continuous approximation. The decisions of network link investment and multimodal path flows in zonal connection optimization are formulated as a minimum-cost multi-commodity network flow (MCNF) problem and solved efficiently with a mixed-integer linear programming (MILP) solver. Subsequently, the route generation problem is solved by expanding the MCNF formulation to minimize intramodal transfers. The model is illustrated through a set of experiments with the Chicago network comprised of 50 zones and seven modes, under three scenarios. The computational results present savings in traveler journey time and operator cost demonstrating the potential benefits of collaboration between multimodal transit systems and SAMS.
Submitted 27 March, 2024; v1 submitted 29 July, 2023;
originally announced July 2023.
-
Modile as a conservative tail risk measurer: the solution of an optimisation problem with 0-1 loss function
Authors:
Keming Yu,
Rong Jiang,
Chi Tim Ng
Abstract:
Quantiles and expectiles, which are two important concepts and tools in tail risk measurement, can be regarded as extensions of the median and mean, respectively. Both of these tail risk measurers can be embedded in a common framework of $L_p$ optimization with the absolute loss function ($p=1$) and quadratic loss function ($p=2$), respectively. As the 0-1 loss function is frequently used in statistics, machine learning and decision theory, this paper introduces a 0-1 loss function based $L_0$ optimisation problem for tail risk measurement and names its solution the modile, which can be regarded as an extension of the mode. The mode, as another measure of central tendency, is more robust to outliers than expectiles and easier to compute than quantiles. However, a mode-based extension for tail risk measurement is new. This paper shows that the proposed modiles are not only more conservative than quantiles and expectiles for skewed and heavy-tailed distributions, but also provide or include the unique interpretation of these measures. Further, the modiles can be regarded as a type of generalized quantile and doubly truncated tail measure, both of which have recently attracted a lot of attention in the literature. The asymptotic properties of the corresponding sample-based estimators of modiles are provided, which, together with numerical analysis results, show that the proposed modiles are promising for tail measurement.
Submitted 21 June, 2023;
originally announced June 2023.
-
Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data
Authors:
Qingyu Tan,
Lu Xu,
Lidong Bing,
Hwee Tou Ng
Abstract:
Relation extraction (RE) aims to extract relations from sentences and documents. Existing relation extraction models typically rely on supervised machine learning. However, recent studies showed that many RE datasets are incompletely annotated. This is known as the false negative problem in which valid relations are falsely annotated as 'no_relation'. Models trained with such data inevitably make similar mistakes during the inference stage. Self-training has been proven effective in alleviating the false negative problem. However, traditional self-training is vulnerable to confirmation bias and exhibits poor performance in minority classes. To overcome this limitation, we proposed a novel class-adaptive re-sampling self-training framework. Specifically, we re-sampled the pseudo-labels for each class by precision and recall scores. Our re-sampling strategy favored the pseudo-labels of classes with high precision and low recall, which improved the overall recall without significantly compromising precision. We conducted experiments on document-level and biomedical relation extraction datasets, and the results showed that our proposed self-training framework consistently outperforms existing competitive methods on the Re-DocRED and ChemDisgene datasets when the training data are incompletely annotated. Our code is released at https://github.com/DAMO-NLP-SG/CAST.
Submitted 16 June, 2023;
originally announced June 2023.
-
Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models
Authors:
Qingyu Tan,
Hwee Tou Ng,
Lidong Bing
Abstract:
Reasoning about time is of fundamental importance. Many facts are time-dependent. For example, athletes change teams from time to time, and different government officials are elected periodically. Previous time-dependent question answering (QA) datasets tend to be biased in either their coverage of time spans or question types. In this paper, we introduce a comprehensive probing dataset TempReason to evaluate the temporal reasoning capability of large language models. Our dataset includes questions of three temporal reasoning levels. In addition, we also propose a novel learning framework to improve the temporal reasoning capability of large language models, based on temporal span extraction and time-sensitive reinforcement learning. We conducted experiments in closed book QA, open book QA, and reasoning QA settings and demonstrated the effectiveness of our approach. Our code and data are released on https://github.com/DAMO-NLP-SG/TempReason.
Submitted 27 June, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive Question Answering
Authors:
Hai Ye,
Qizhe Xie,
Hwee Tou Ng
Abstract:
In this work, we study multi-source test-time model adaptation from user feedback, where K distinct models are established for adaptation. To allow efficient adaptation, we cast the problem as a stochastic decision-making process, aiming to determine the best adapted model after adaptation. We discuss two frameworks: multi-armed bandit learning and multi-armed dueling bandits. Compared to multi-armed bandit learning, the dueling framework allows pairwise collaboration among K models, which is solved by a novel method named Co-UCB proposed in this work. Experiments on six datasets of extractive question answering (QA) show that the dueling framework using Co-UCB is more effective than other strong baselines for our studied problem.
Submitted 11 June, 2023;
originally announced June 2023.
-
Monocrystalline Si/$β$-Ga$_2$O$_3$ p-n heterojunction diodes fabricated via grafting
Authors:
Jiarui Gong,
Donghyeok Kim,
Hokyung Jang,
Fikadu Alema,
Qingxiao Wang,
Tien Khee Ng,
Shuoyang Qiu,
Jie Zhou,
Xin Su,
Qinchen Lin,
Ranveer Singh,
Haris Abbasi,
Kelson Chabak,
Gregg Jessen,
Clincy Cheung,
Vincent Gambin,
Shubhra S. Pasayat,
Andrei Osinsky,
Boon S. Ooi,
Chirag Gupta,
Zhenqiang Ma
Abstract:
The $β$-Ga$_2$O$_3$ has exceptional electronic properties with vast potential in power and RF electronics. Despite the excellent demonstrations of high-performance unipolar devices, the lack of p-type doping in $β$-Ga$_2$O$_3$ has hindered the development of Ga$_2$O$_3$-based bipolar devices. The approach of p-n diodes formed by polycrystalline p-type oxides with n-type $β$-Ga$_2$O$_3$ can face severe challenges in further advancing the $β$-Ga$_2$O$_3$ bipolar devices due to their unfavorable band alignment and the poor p-type oxide crystal quality. In this work, we applied the semiconductor grafting approach to fabricate monocrystalline Si/$β$-Ga$_2$O$_3$ p-n diodes for the first time. With enhanced concentration of oxygen atoms at the interface of Si/$β$-Ga$_2$O$_3$, double side surface passivation was achieved for both Si and $β$-Ga$_2$O$_3$ with an interface Dit value of 1-3 x 10$^{12}$ /cm$^2$ eV. A Si/$β$-Ga$_2$O$_3$ p-n diode array with high fabrication yield was demonstrated along with a diode rectification of 1.3 x 10$^7$ at +/- 2 V, a diode ideality factor of 1.13 and avalanche reverse breakdown characteristics. The diodes' C-V shows frequency dispersion-free characteristics from 10 kHz to 2 MHz. Our work has set the foundation toward future development of $β$-Ga$_2$O$_3$-based transistors.
Submitted 30 May, 2023;
originally announced May 2023.
-
Proximity effect and Anomalous metal state in a model of mixed metal-superconductor grains
Authors:
Tai Kai Ng
Abstract:
Motivated by the discovery of the anomalous metal state in thin film systems and suggestions that coexistence of superconducting and metallic components is crucial to the formation of the state, we study in this paper a model of mixed metallic and superconducting grains coupled by electron tunneling - the metallic grains are expected to become superconducting because of the proximity effect in a mean-field treatment of the model. When quantum fluctuations in relative phases between different grains are taken into account, we show that the proximity effect can be destroyed and the metallic and superconducting grains become "insulating" with respect to each other when the charging energy between grains is strong enough and tunneling between grains is weak enough, in analogy to the superconductor-insulator transition in pure superconducting grains or the metal-insulator transition in pure metallic grains. Based on this observation, a physical picture of how the anomalous metal state may form is proposed. An experimental setup to test our proposed physical picture is suggested.
Submitted 6 June, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Unlocking Temporal Question Answering for Large Language Models Using Code Execution
Authors:
Xingxuan Li,
Liying Cheng,
Qingyu Tan,
Hwee Tou Ng,
Shafiq Joty,
Lidong Bing
Abstract:
Large language models (LLMs) have made significant progress in natural language processing (NLP), and are utilized extensively in various applications. Recent works, such as chain-of-thought (CoT), have shown that intermediate reasoning steps can improve the performance of LLMs for complex reasoning tasks, such as math problems and symbolic question-answering tasks. However, we notice the challenge that LLMs face when it comes to temporal reasoning. Our preliminary experiments show that generating intermediate reasoning steps does not always boost the performance of complex temporal question-answering tasks. Therefore, we propose a novel framework that combines the extraction capability of LLMs and the logical reasoning capability of a Python solver to tackle this issue. Extensive experiments and analysis demonstrate the effectiveness of our framework in handling intricate time-bound reasoning tasks.
Submitted 24 May, 2023;
originally announced May 2023.
-
Constraining gravitational wave amplitude birefringence with GWTC-3
Authors:
Thomas C. K. Ng,
Maximiliano Isi,
Kaze W. K. Wong,
Will M. Farr
Abstract:
The propagation of gravitational waves can reveal fundamental features of the structure of spacetime. For instance, differences in the propagation of gravitational-wave polarizations would be a smoking gun for parity violations in the gravitational sector, as expected from birefringent theories like Chern-Simons gravity. Here we look for evidence of amplitude birefringence in the third catalog of detections by the Laser Interferometer Gravitational Wave Observatory and Virgo through the use of birefringent templates inspired by dynamical Chern-Simons gravity. From $71$ binary-black-hole signals, we obtain the most precise constraints on gravitational-wave amplitude birefringence yet, measuring a birefringent attenuation of $κ= -0.019^{+0.038}_{-0.029} \, \mathrm{Gpc}^{-1}$ at $100 \, \mathrm{Hz}$ with $90\%$ credibility, equivalent to a parity-violation energy scale of $M_{\rm PV} \gtrsim 6.8 \times 10^{-21}\, {\rm GeV}$.
Submitted 30 October, 2023; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Capping Layer Effects on $Sb_{2}S_{3}$-based Reconfigurable Photonic Devices
Authors:
Ting Yu Teo,
Nanxi Li,
Landobasa Y. M. Tobing,
Amy S. K. Tong,
Doris K. T. Ng,
Zhihao Ren,
Chengkuo Lee,
Lennon Y. T. Lee,
Robert Edward Simpson
Abstract:
Capping layers are essential for protecting phase change materials (PCMs) used in non-volatile photonics technologies. This work demonstrates how $(ZnS)_{0.8}-(SiO_2)_{0.2}$ caps radically influence the performance of $Sb_{2}S_{3}$ and Ag-doped $Sb_{2}S_{3}$ integrated photonic devices. We found that at least 30 nm of capping material is necessary to protect the material from sulfur loss. However, adding this cap affects the crystallization temperatures of the two PCMs in different ways. The crystallization temperature of $Sb_{2}S_{3}$ and Ag-doped $Sb_{2}S_{3}$ increased and decreased respectively, which is attributed to interfacial energy differences. Capped and uncapped Ag-doped $Sb_{2}S_{3}$ microring resonator (MRR) devices were fabricated and measured to understand how the cap affects the device performance. Surprisingly, the resonant frequency of the MRR exhibited a larger red-shift upon crystallization for the capped PCMs. This effect was due to the cap increasing the modal overlap with the PCM layer. Caps can, therefore, be used to provide a greater optical phase shift per unit length, thus reducing the overall footprint of these programmable devices. Overall, we conclude that caps on PCMs are not just useful for stabilizing the PCM layer, but can also be used to tune the PCM crystallization temperature and reduce device footprint. Moreover, the capping layer can be exploited to enhance light-matter interactions with the PCM element.
Submitted 5 May, 2023;
originally announced May 2023.
-
Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation
Authors:
Jacky Chen Long Chai,
Tiong-Sik Ng,
Cheng-Yaw Low,
Jaewoo Park,
Andrew Beng Jin Teoh
Abstract:
Very low-resolution face recognition (VLRFR) poses unique challenges, such as tiny regions of interest and poor resolution due to extreme standoff distance or wide viewing angle of the acquisition devices. In this paper, we study principled approaches to elevate the recognizability of a face in the embedding space instead of the visual quality. We first formulate a robust learning-based face recognizability measure, namely recognizability index (RI), based on two criteria: (i) proximity of each face embedding against the unrecognizable faces cluster center and (ii) closeness of each face embedding against its positive and negative class prototypes. We then devise an index diversion loss to push the hard-to-recognize face embedding with low RI away from unrecognizable faces cluster to boost the RI, which reflects better recognizability. Additionally, a perceptibility attention mechanism is introduced to attend to the most recognizable face regions, which offers better explanatory and discriminative traits for embedding learning. Our proposed model is trained end-to-end and simultaneously serves recognizability-aware embedding learning and face quality estimation. To address VLRFR, our extensive evaluations on three challenging low-resolution datasets and face quality assessment demonstrate the superiority of the proposed model over the state-of-the-art methods.
Submitted 19 April, 2023;
originally announced April 2023.
-
Optimal Investment in Defined Contribution Pension Schemes with Forward Utility Preferences
Authors:
Kenneth Tsz Hin Ng,
Wing Fung Chong
Abstract:
Optimal investment strategies of an individual worker during the accumulation phase in the defined contribution pension scheme have been well studied in the literature. Most of them adopted the classical backward model and approach, but any pre-specifications of retirement time, preferences, and market environment models do not often hold in such a prolonged horizon of the pension scheme. Pre-commitment to ensure the time-consistency of an optimal investment strategy derived from the backward model and approach leads the supposedly optimal strategy to be sub-optimal in the actual realizations. This paper revisits the optimal investment problem for the worker during the accumulation phase in the defined contribution pension scheme, via the forward preferences, in which an environment-adapting strategy is able to hold optimality and time-consistency together. Stochastic partial differential equation representation for the worker's forward preferences is illustrated. This paper constructs two of the forward utility preferences and solves the corresponding optimal investment strategies, in the cases of initial power and exponential utility functions.
Submitted 18 September, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
-
Robust Question Answering against Distribution Shifts with Test-Time Adaptation: An Empirical Study
Authors:
Hai Ye,
Yuyang Ding,
Juntao Li,
Hwee Tou Ng
Abstract:
A deployed question answering (QA) model can easily fail when the test data has a distribution shift compared to the training data. Robustness tuning (RT) methods have been widely studied to enhance model robustness against distribution shifts before model deployment. However, can we improve a model after deployment? To answer this question, we evaluate test-time adaptation (TTA) to improve a model after deployment. We first introduce COLDQA, a unified evaluation benchmark for robust QA against text corruption and changes in language and domain. We then evaluate previous TTA methods on COLDQA and compare them to RT methods. We also propose a novel TTA method called online imitation learning (OIL). Through extensive experiments, we find that TTA is comparable to RT methods, and applying TTA after RT can significantly boost the performance on COLDQA. Our proposed OIL improves TTA to be more robust to variation in hyper-parameters and test distributions over time.
Submitted 9 February, 2023;
originally announced February 2023.
-
Mixture Modeling with Normalizing Flows for Spherical Density Estimation
Authors:
Tin Lok James Ng,
Andrew Zammit-Mangion
Abstract:
Normalizing flows are objects used for modeling complicated probability density functions, and have attracted considerable interest in recent years. Many flexible families of normalizing flows have been developed. However, the focus to date has largely been on normalizing flows on Euclidean domains; while normalizing flows have been developed for spherical and other non-Euclidean domains, these are generally less flexible than their Euclidean counterparts. To address this shortcoming, in this work we introduce a mixture-of-normalizing-flows model to construct complicated probability density functions on the sphere. This model provides a flexible alternative to existing parametric, semiparametric, and nonparametric, finite mixture models. Model estimation is performed using the expectation maximization algorithm and a variant thereof. The model is applied to simulated data, where the benefit over the conventional (single component) normalizing flow is verified. The model is then applied to two real-world data sets of events occurring on the surface of Earth; the first relating to earthquakes, and the second to terrorist activity. In both cases, we see that the mixture-of-normalizing-flows model yields a good representation of the density of event occurrence.
Submitted 16 January, 2023;
originally announced January 2023.
-
Linear $q$-difference, difference and differential operators preserving some $\mathcal{A}$-entire functions
Authors:
Jiaxing Huang,
Tuen Wai Ng
Abstract:
We apply Rossi's half-plane version of Borel's Theorem to study the zero distribution of linear combinations of $\mathcal{A}$-entire functions (Theorem 1.2). This provides a unified way to study linear $q$-difference, difference and differential operators (with entire coefficients) preserving subsets of $\mathcal{A}$-entire functions, and hence obtain several analogous results for the Hermite-Poulain Theorem to linear finite ($q$-)difference operators with polynomial coefficients. The method also produces a result on the existence of infinitely many non-real zeros of some differential polynomials of functions in certain sub-classes of $\mathcal{A}$-entire functions.
Submitted 14 November, 2022;
originally announced November 2022.
-
Grammatical Error Correction: A Survey of the State of the Art
Authors:
Christopher Bryant,
Zheng Yuan,
Muhammad Reza Qorib,
Hannan Cao,
Hwee Tou Ng,
Ted Briscoe
Abstract:
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.
Submitted 29 April, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
A Treatise On FST Lattice Based MMI Training
Authors:
Adnan Haider,
Tim Ng,
Zhen Huang,
Xingyu Na,
Antti Veikko Rosti
Abstract:
Maximum mutual information (MMI) has become one of the two de facto methods for sequence-level training of speech recognition acoustic models. This paper aims to isolate, identify and bring forward the implicit modelling decisions induced by the design and implementation of the standard finite state transducer (FST) lattice based MMI training framework. The paper particularly investigates the necessity to maintain a preselected numerator alignment and raises the importance of determinizing FST denominator lattices on the fly. The efficacy of employing on-the-fly FST lattice determinization is mathematically shown to guarantee discrimination at the hypothesis level and is empirically shown through training deep CNN models on an 18K-hour Mandarin dataset and on a 2.8K-hour English dataset. On assistant and dictation tasks, the approach achieves a 2.3-4.6% relative WER reduction (WERR) over the standard FST lattice based approach.
Submitted 17 October, 2022;
originally announced October 2022.
-
Unpacking Cultural Perceptions of Future Elder Care through Design Fiction
Authors:
Tse Pei Ng,
Jung-Joo Lee,
Yiying Wu
Abstract:
We present a case using Design Fiction to unpack cultural perceptions of future elder care rooted in the Asian context of Singapore. We created two design fictions, addressing the tensions between filial piety and automated care and the controversy of integrating elder care facilities into residential communities. The design fictions took the visual forms of a shopping web page and a petition site, and the public were invited to make fictional decisions. Having received 109 responses in total, we identify the key tensions and value conflicts and illustrate them through visual narratives. Further, we propose the Asian perspective of positioning relationships as the protagonist in creating elder care design fiction.
Submitted 3 October, 2022;
originally announced October 2022.
-
Expressing Multivariate Time Series as Graphs with Time Series Attention Transformer
Authors:
William T. Ng,
K. Siu,
Albert C. Cheung,
Michael K. Ng
Abstract:
A reliable and efficient representation of multivariate time series is crucial in various downstream machine learning tasks. In multivariate time series forecasting, each variable depends on its historical values and there are inter-dependencies among variables as well. Models have to be designed to capture both intra- and inter-relationships among the time series. To move towards this goal, we propose the Time Series Attention Transformer (TSAT) for multivariate time series representation learning. Using TSAT, we represent both temporal information and inter-dependencies of multivariate time series in terms of edge-enhanced dynamic graphs. The intra-series correlations are represented by nodes in a dynamic graph; a self-attention mechanism is modified to capture the inter-series correlations by using the super-empirical mode decomposition (SMD) module. We applied the embedded dynamic graphs to time series forecasting problems, including two real-world datasets and two benchmark datasets. Extensive experiments show that TSAT clearly outperforms six state-of-the-art baseline methods in various forecasting horizons. We further visualize the embedded dynamic graphs to illustrate the graph representation power of TSAT. We share our code at https://github.com/RadiantResearch/TSAT.
Submitted 19 August, 2022;
originally announced August 2022.
-
Intrinsic Instabilities in Fermi Glasses
Authors:
Yat Fan Lau,
Tai Kai Ng
Abstract:
We study in this paper the effect of weak, short-ranged interaction on disordered metals. Through analysing the interaction matrix elements between different eigenstates of the non-interacting and corresponding Hartree-Fock single-particle Hamiltonian, we argue that as a result of localized single-particle eigenstates around the Fermi surface, the quasi-particle states on the Fermi surface are unstable towards formation of magnetic moments for arbitrarily weak (but finite) repulsive interaction in the thermodynamic limit. This is a mechanism very different from the case of strong interaction $U\sim W_B$ ($W_B=$ bandwidth) or the quantum Griffiths effect where local moments are formed at small localized regions where coupling to the surrounding is weak. Numerical simulations are performed to verify our analysis. We further propose within a Landau Fermi-liquid-type framework that our result is applicable for general electronic systems with weak, short-ranged interaction as long as the quasi-particle states exist and are localized. An analogous result is obtained for attractive interaction, suggesting that the Fermi glass state is intrinsically unstable in arbitrary dimension.
Submitted 23 June, 2024; v1 submitted 17 August, 2022;
originally announced August 2022.
-
Computational Modelling of Plasticity-Led Evolution
Authors:
Eden Tian Hwa Ng,
Akira R. Kinjo
Abstract:
Plasticity-led evolution is a form of evolution where a change in the environment induces novel traits via phenotypic plasticity, after which the novel traits are genetically accommodated over generations under the novel environment. This mode of evolution is expected to resolve the problem of gradualism (i.e., evolution by the slow accumulation of mutations that induce phenotypic variation) implied by the Modern Evolutionary Synthesis, in the face of a large environmental change. While experimental works are essential for validating that plasticity-led evolution indeed happened, we need computational models to gain insight into its underlying mechanisms and make qualitative predictions. Such computational models should include the developmental process and gene-environment interactions in addition to genetics and natural selection. We point out that gene regulatory network models can incorporate all the above notions. In this review, we highlight results from computational modelling of gene regulatory networks that consolidate the criteria of plasticity-led evolution. Since gene regulatory networks are mathematically equivalent to artificial recurrent neural networks, we also discuss their analogies and discrepancies, which may help further understand the mechanisms underlying plasticity-led evolution.
Submitted 18 December, 2022; v1 submitted 1 August, 2022;
originally announced August 2022.
-
A Uniqueness Theorem for Holomorphic Mappings in the Disk Sharing Totally Geodesic Hypersurfaces
Authors:
Jiaxing Huang,
Tuen Wai Ng
Abstract:
In this paper, we prove a Second Main Theorem for holomorphic mappings in a disk whose image intersects some families of nonlinear hypersurfaces (totally geodesic hypersurfaces with respect to a meromorphic connection) in the complex projective space $\mathbb{P}^k$. This is a generalization of Cartan's Second Main Theorem. As a consequence, we establish a uniqueness theorem for holomorphic mappings which intersect $O(k^3)$ many totally geodesic hypersurfaces.
Submitted 22 July, 2022;
originally announced July 2022.
-
SAMPLE-HD: Simultaneous Action and Motion Planning Learning Environment
Authors:
Michal Nazarczuk,
Tony Ng,
Krystian Mikolajczyk
Abstract:
Humans exhibit incredibly high levels of multi-modal understanding: combining visual cues with read or heard knowledge comes easily to us and allows for very accurate interaction with the surrounding environment. Various simulation environments focus on providing data for tasks related to scene understanding, question answering, space exploration, and visual navigation. In this work, we provide a solution that encompasses both visual and behavioural aspects of simulation in a new environment for learning interactive reasoning in a manipulation setup. The SAMPLE-HD environment makes it possible to generate various scenes composed of small household objects, to procedurally generate language instructions for manipulation, and to generate ground truth paths serving as training data.
Submitted 1 June, 2022;
originally announced June 2022.
-
Assessing Privacy Leakage in Synthetic 3-D PET Imaging using Transversal GAN
Authors:
Robert V. Bergen,
Jean-Francois Rajotte,
Fereshteh Yousefirizi,
Arman Rahmim,
Raymond T. Ng
Abstract:
Training computer-vision related algorithms on medical images for disease diagnosis or image segmentation is difficult in large part due to privacy concerns. For this reason, generative image models are highly sought after to facilitate data sharing. However, 3-D generative models are understudied, and investigation of their privacy leakage is needed. We introduce our 3-D generative model, Transversal GAN (TrGAN), using head & neck PET images which are conditioned on tumour masks as a case study. We define quantitative measures of image fidelity, utility and privacy for our model. These metrics are evaluated in the course of training to identify ideal fidelity, utility and privacy trade-offs and establish the relationships between these parameters. We show that the discriminator of the TrGAN is vulnerable to attack, and that an attacker can identify which samples were used in training with almost perfect accuracy (AUC = 0.99). We also show that an attacker with access to only the generator cannot reliably classify whether a sample had been used for training (AUC = 0.51). This suggests that TrGAN generators, but not discriminators, may be used for sharing synthetic 3-D PET data with minimal privacy risk while maintaining good utility and fidelity.
Submitted 31 October, 2023; v1 submitted 13 June, 2022;
originally announced June 2022.