-
Compressed Heterogeneous Graph for Abstractive Multi-Document Summarization
Authors:
Miao Li,
Jianzhong Qi,
Jey Han Lau
Abstract:
Multi-document summarization (MDS) aims to generate a summary for a number of related documents. We propose HGSUM, an MDS model that extends an encoder-decoder architecture, to incorporate a heterogeneous graph to represent different semantic units (e.g., words and sentences) of the documents. This contrasts with existing MDS models which do not consider different edge types of graphs and as such do not capture the diversity of relationships in the documents. To preserve only key information and relationships of the documents in the heterogeneous graph, HGSUM uses graph pooling to compress the input graph. And to guide HGSUM to learn compression, we introduce an additional objective that maximizes the similarity between the compressed graph and the graph constructed from the ground-truth summary during training. HGSUM is trained end-to-end with graph similarity and standard cross-entropy objectives. Experimental results over MULTI-NEWS, WCEP-100, and ARXIV show that HGSUM outperforms state-of-the-art MDS models. The code for our model and experiments is available at: https://github.com/oaimli/HGSum.
Submitted 11 March, 2023;
originally announced March 2023.
-
TBG as Topological Heavy Fermion: II. Analytical approximations of the model parameters
Authors:
Dumitru Călugăru,
Maksim Borovkov,
Liam L. H. Lau,
Piers Coleman,
Zhi-Da Song,
B. Andrei Bernevig
Abstract:
The recently-introduced Topological Heavy Fermion (THF) model [Phys. Rev. Lett. 129, 047601] of twisted bilayer graphene (TBG) aims to reconcile the quantum-dot-like electronic structure of the latter observed by scanning tunneling microscopy, with its electron delocalization seen in transport measurements. The THF model achieves this by coupling localized (heavy) fermions with anomalous conduction electrons. Originally, the parameters of the THF model were obtained numerically from the Bistritzer-MacDonald (BM) model of TBG [Phys. Rev. Lett. 129, 047601]. In this work, we derive analytical expressions for the THF model parameters as a function of the twist angle, the ratio between the tunneling amplitudes at the $AA$ and $AB$ regions ($w_0 / w_1$), and the screening length of the interaction potential. By numerically computing the THF model parameters across an extensive experimentally-relevant parameter space, we show that the resulting approximations are remarkably good, i.e. within 30% relative error for almost the entire parameter space. At the single-particle level, the THF model accurately captures the energy spectrum of the BM model over a large phase space of angles and tunneling amplitude ratios. When interactions are included, we also show that the THF description of TBG is good around the magic angle for realistic values of the tunneling amplitude ratios ($0.6 \leq w_0/w_1 \leq 1.0$), for which the hybridization between the localized and conduction fermions $γ$ is smaller than the onsite repulsion of the heavy fermions $U_1$ (i.e. $|γ| < U_1$).
Submitted 6 July, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
Topological Mixed Valence Model for Twisted Bilayer Graphene
Authors:
Liam L. H. Lau,
Piers Coleman
Abstract:
Song and Bernevig (SB) have recently proposed a faithful reformulation of the physics of magic angle twisted bilayer graphene (MATBG) as a topological heavy fermion problem, involving the hybridization of flat band f-electrons with a topological band of conduction electrons. Here we explore the consequences of this analogy, using it to reformulate the SB model as a mixed valence model for twisted bilayer graphene. We show how the interaction with the conduction sea behaves as a U(8) Kondo lattice at high energies and a U(4) Kondo lattice at low energies. One of the robust consequences of the model is the prediction that the underlying hybridization scale of the mixed valent model and the width of the upper and lower Hubbard bands will scale linearly with energy. We also find that the bare hybridization $γ_{0}$ predicted by the SB model is too large to account for the observed local moment behavior at large filling factor, leading us to suggest that the bare hybridization is renormalized by the soft polaronic response of the underlying graphene.
Submitted 5 March, 2023;
originally announced March 2023.
-
A High Performance Compiler for Very Large Scale Surface Code Computations
Authors:
George Watkins,
Hoang Minh Nguyen,
Keelan Watkins,
Steven Pearce,
Hoi-Kwan Lau,
Alexandru Paler
Abstract:
We present the first high performance compiler for very large scale quantum error correction: it translates an arbitrary quantum circuit to surface code operations based on lattice surgery. Our compiler offers an end to end error correction workflow implemented by a pluggable architecture centered around an intermediate representation of lattice surgery instructions. Moreover, the compiler supports customizable circuit layouts, can be used for quantum benchmarking and includes a quantum resource estimator. The compiler can process millions of gates using a streaming pipeline at a speed geared towards real-time operation of a physical device. We compiled within seconds 80 million logical surface code instructions, corresponding to a high precision Clifford+T implementation of the 128-qubit Quantum Fourier Transform (QFT). Our code is open-sourced at https://github.com/latticesurgery-com.
Submitted 16 May, 2024; v1 submitted 5 February, 2023;
originally announced February 2023.
-
$h^1$ boundedness of Localized Operators and Commutators with bmo and lmo
Authors:
Galia Dafni,
Chun Ho Lau
Abstract:
We first consider two types of localizations of singular integral operators of convolution type, and show, under mild decay and smoothness conditions on the auxiliary functions, that their boundedness on the local Hardy space $h^1(\mathbb{R}^n)$ is equivalent. We then study the boundedness on $h^1(\mathbb{R}^n)$ of the commutator $[b,T]$ of an inhomogeneous singular integral operator with $b$ in $bmo(\mathbb{R}^n)$, the nonhomogeneous space of functions of bounded mean oscillation. We define local analogues of the atomic space $H^1_b(\mathbb{R}^n)$ introduced by Pérez in the case of the homogeneous Hardy space and $BMO$, including a variation involving atoms with approximate cancellation conditions. For such an atom $a$, we prove integrability of the associated commutator maximal function and of $[b,T](a)$. For $b$ in $lmo(\mathbb{R}^n)$, this gives $h^1$ to $L^1$ boundedness of $[b,T]$. Finally, under additional approximate cancellation conditions on $T$, we show boundedness to $h^1$.
Submitted 1 February, 2023;
originally announced February 2023.
-
The Next Chapter: A Study of Large Language Models in Storytelling
Authors:
Zhuohan Xie,
Trevor Cohn,
Jey Han Lau
Abstract:
To enhance the quality of generated stories, recent story generation models have been investigating the utilization of higher-level attributes like plots or commonsense knowledge. The application of prompt-based learning with large language models (LLMs), exemplified by GPT-3, has exhibited remarkable performance in diverse natural language processing (NLP) tasks. This paper conducts a comprehensive investigation, utilizing both automatic and human evaluation, to compare the story generation capacity of LLMs with recent models across three datasets with variations in style, register, and length of stories. The results demonstrate that LLMs generate stories of significantly higher quality compared to other story generation models. Moreover, they exhibit a level of performance that competes with human authors, albeit with the preliminary observation that they tend to replicate real stories in situations involving world knowledge, resembling a form of plagiarism.
Submitted 24 July, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Structure-Informed Shadow Removal Networks
Authors:
Yuhao Liu,
Qing Guo,
Lan Fu,
Zhanghan Ke,
Ke Xu,
Wei Feng,
Ivor W. Tsang,
Rynson W. H. Lau
Abstract:
Existing deep learning-based shadow removal methods still produce images with shadow remnants. These shadow remnants typically exist in homogeneous regions with low-intensity values, making them untraceable in the existing image-to-image mapping paradigm. We observe that shadows mainly degrade images at the image-structure level (in which humans perceive object shapes and continuous colors). Hence, in this paper, we propose to remove shadows at the image structure level. Based on this idea, we propose a novel structure-informed shadow removal network (StructNet) to leverage the image-structure information to address the shadow remnant problem. Specifically, StructNet first reconstructs the structure information of the input image without shadows and then uses the restored shadow-free structure prior to guide the image-level shadow removal. StructNet contains two main novel modules: (1) a mask-guided shadow-free extraction (MSFE) module to extract image structural features in a non-shadow-to-shadow directional manner, and (2) a multi-scale feature & residual aggregation (MFRA) module to leverage the shadow-free structure information to regularize feature consistency. In addition, we also propose to extend StructNet to exploit multi-level structure information (MStructNet), to further boost the shadow removal performance with minimum computational overheads. Extensive experiments on three shadow removal benchmarks demonstrate that our method outperforms existing shadow removal methods, and our StructNet can be integrated with existing methods to improve them further.
Submitted 1 February, 2024; v1 submitted 9 January, 2023;
originally announced January 2023.
-
Characterization and optimized engineering of bosonic quantum interfaces under single-mode operational constraints
Authors:
Pak-Tik Fong,
Sheung Chi Poon,
Hoi-Kwan Lau
Abstract:
Controlling the quantum interface between two bosonic modes is essential in countless implementations of quantum information processing. However, full controllability is rarely achieved in most platforms due to specific physical limitations. In this work, we completely characterize the linear two-mode interfaces under the most pessimistic restriction that only single-mode operation is available. When arbitrary Gaussian single-mode operations can be applied to both modes, we find that every interface can be characterized by an invariant transmission strength. Moreover, in the practical situation that squeezing is restricted in one of the modes, we discover two additional quantities, irreducible squeezing and irreducible shearing, that are invariant under the allowable controls. By using this characterization, we develop systematic strategies to engineer an arbitrary linear interface through cascading multiple fixed component interfaces. Without squeezing restriction, our protocol is optimal and requires at most three component interfaces. Under the squeezing constraint, our protocol can be extended to engineer also the additional invariants by using no more than two more rounds of cascade. We also propose the remote squeezing scheme to tackle the squeezing restriction through interfacing with an active auxiliary mode.
Submitted 26 February, 2024; v1 submitted 9 December, 2022;
originally announced December 2022.
-
The BeMi Stardust: a Structured Ensemble of Binarized Neural Networks
Authors:
Ambrogio Maria Bernardelli,
Stefano Gualandi,
Hoong Chuin Lau,
Simone Milanesi
Abstract:
Binarized Neural Networks (BNNs) are receiving increasing attention due to their lightweight architecture and ability to run on low-power devices. The state-of-the-art for training classification BNNs restricted to few-shot learning is based on a Mixed Integer Programming (MIP) approach. This paper proposes the BeMi ensemble, a structured architecture of BNNs based on training a single BNN for each possible pair of classes and applying a majority voting scheme to predict the final output. The training of a single BNN discriminating between two classes is achieved by a MIP model that optimizes a lexicographic multi-objective function according to robustness and simplicity principles. This approach results in training networks whose output is not affected by small perturbations on the input and whose number of active weights is as small as possible, while good accuracy is preserved. We computationally validate our model using the MNIST and Fashion-MNIST datasets using up to 40 training images per class. Our structured ensemble outperforms both BNNs trained by stochastic gradient descent and state-of-the-art MIP-based approaches. While the previous approaches achieve an average accuracy of 51.1% on the MNIST dataset, the BeMi ensemble achieves an average accuracy of 61.7% when trained with 10 images per class and 76.4% when trained with 40 images per class.
Submitted 7 December, 2022;
originally announced December 2022.
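The voting scheme described in the abstract above (one binary classifier per pair of classes, followed by a majority vote) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the `pairwise_predict` callables are hypothetical stand-ins for trained BNNs, and breaking ties by the lowest class label is an assumption.

```python
from collections import Counter
from itertools import combinations


def majority_vote(x, classes, pairwise_predict):
    """Predict a class for x by one-vs-one voting.

    pairwise_predict maps each pair (a, b) with a < b to a callable that
    returns either a or b for the input x (a stand-in for a trained BNN).
    """
    votes = Counter()
    for a, b in combinations(sorted(classes), 2):
        votes[pairwise_predict[(a, b)](x)] += 1
    best = max(votes.values())
    # Assumption: ties are broken by choosing the smallest class label.
    return min(c for c, v in votes.items() if v == best)


def make_stub(a, b):
    """Hypothetical pairwise classifier: picks the label closer to the scalar x."""
    return lambda x: a if abs(x - a) <= abs(x - b) else b


classes = [0, 1, 2]
stubs = {(a, b): make_stub(a, b) for a, b in combinations(classes, 2)}
prediction = majority_vote(1.9, classes, stubs)
```

With k classes this design needs k(k-1)/2 pairwise classifiers, each of which only has to separate two classes.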
-
Efficient Mirror Detection via Multi-level Heterogeneous Learning
Authors:
Ruozhen He,
Jiaying Lin,
Rynson W. H. Lau
Abstract:
We present HetNet (Multi-level Heterogeneous Network), a highly efficient mirror detection network. Current mirror detection methods focus more on performance than efficiency, limiting the real-time applications (such as drones). Their lack of efficiency is caused by the common design of adopting homogeneous modules at different levels, which ignores the difference between different levels of features. In contrast, HetNet detects potential mirror regions initially through low-level understandings (e.g., intensity contrasts) and then combines with high-level understandings (contextual discontinuity for instance) to finalize the predictions. To perform accurate yet efficient mirror detection, HetNet follows an effective architecture that obtains specific information at different stages to detect mirrors. We further propose a multi-orientation intensity-based contrasted module (MIC) and a reflection semantic logical module (RSL), equipped on HetNet, to predict potential mirror regions by low-level understandings and analyze semantic logic in scenarios by high-level understandings, respectively. Compared to the state-of-the-art method, HetNet runs 664% faster and draws an average performance gain of 8.9% on MAE, 3.1% on IoU, and 2.0% on F-measure on two mirror detection benchmarks.
Submitted 28 November, 2022;
originally announced November 2022.
-
Rapid Formation of Massive Planetary Cores in a Pressure Bump
Authors:
Tommy Chi Ho Lau,
Joanna Drążkowska,
Sebastian M. Stammler,
Tilman Birnstiel,
Cornelis P. Dullemond
Abstract:
Models of planetary core growth by either planetesimal or pebble accretion are traditionally disconnected from the models of dust evolution and formation of the first gravitationally-bound planetesimals. The state-of-the-art models typically start with massive planetary cores already present. We aim to study the formation and growth of planetary cores in a pressure bump, motivated by the annular structures observed in protoplanetary disks, starting with sub-micron-sized dust grains. We connect the models of dust coagulation and drift, planetesimal formation in the streaming instability, gravitational interactions between planetesimals, pebble accretion, and planet migration, into one uniform framework. We find that planetesimals forming early at the massive end of the size distribution grow quickly dominantly by pebble accretion. These few massive bodies grow on the timescales of ~100 000 years and stir the planetesimals formed later preventing the emergence of further planetary cores. Additionally, a migration trap occurs allowing for retention of the growing cores. Pressure bumps are favourable locations for the emergence and rapid growth of planetary cores by pebble accretion as the dust density and grain size are increased and the pebble accretion onset mass is reduced compared to a smooth-disk model.
Submitted 21 December, 2022; v1 submitted 8 November, 2022;
originally announced November 2022.
-
Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning
Authors:
Robbert Reijnen,
Yingqian Zhang,
Hoong Chuin Lau,
Zaharah Bukhsh
Abstract:
The Adaptive Large Neighborhood Search (ALNS) algorithm has shown considerable success in solving combinatorial optimization problems (COPs). Nonetheless, the performance of ALNS relies on the proper configuration of its selection and acceptance parameters, which is known to be a complex and resource-intensive task. To address this, we introduce a Deep Reinforcement Learning (DRL) based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search. The proposed method aims to learn, based on the state of the search, to configure ALNS for the next iteration to yield more effective solutions for the given optimization problem. We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition. The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches that were the winning methods of the competition, achieving this with significantly fewer training observations. Furthermore, we demonstrate several good properties of the proposed DR-ALNS method: it is easily adapted to solve different routing problems, its learned policies perform consistently well across various instance sizes, and these policies can be directly applied to different problem variants.
Submitted 3 April, 2024; v1 submitted 1 November, 2022;
originally announced November 2022.
-
Necessary cancellation conditions for the boundedness of operators on local Hardy spaces
Authors:
Galia Dafni,
Chun Ho Lau,
Tiago Picon,
Claudio Vasconcelos
Abstract:
In this work we present necessary cancellation conditions for the continuity of linear operators in $h^p(\mathbb{R}^n)$, $0<p\leq 1$, that map atoms into pseudo-molecules. Our necessary condition, expressed in terms of the $T^{\ast}$ condition, is the same as the one recently proved sufficient in [3], thus providing a necessary and sufficient cancellation condition for the boundedness of inhomogeneous Calderón--Zygmund type operators.
Submitted 11 October, 2022;
originally announced October 2022.
-
Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation
Authors:
Thinh Hung Truong,
Yulia Otmakhova,
Timothy Baldwin,
Trevor Cohn,
Jey Han Lau,
Karin Verspoor
Abstract:
Negation is poorly captured by current language models, although the extent of this problem is not widely understood. We introduce a natural language inference (NLI) test suite to enable probing the capabilities of NLP methods, with the aim of understanding sub-clausal negation. The test suite contains premise--hypothesis pairs where the premise contains sub-clausal negation and the hypothesis is constructed by making minimal modifications to the premise in order to reflect different possible interpretations. Aside from adopting standard NLI labels, our test suite is systematically constructed under a rigorous linguistic framework. It includes annotation of negation types and constructions grounded in linguistic theory, as well as the operations used to construct hypotheses. This facilitates fine-grained analysis of model performance. We conduct experiments using pre-trained language models to demonstrate that our test suite is more challenging than existing benchmarks focused on negation, and show how our annotation supports a deeper understanding of the current NLI capabilities in terms of negation and quantification.
Submitted 13 October, 2022; v1 submitted 6 October, 2022;
originally announced October 2022.
-
Improving Visual-Semantic Embedding with Adaptive Pooling and Optimization Objective
Authors:
Zijian Zhang,
Chang Shu,
Ya Xiao,
Yuan Shen,
Di Zhu,
Jing Xiao,
Youxin Chen,
Jey Han Lau,
Qian Zhang,
Zheng Lu
Abstract:
Visual-Semantic Embedding (VSE) aims to learn an embedding space where related visual and semantic instances are close to each other. Recent VSE models tend to design complex structures to pool visual and semantic features into fixed-length vectors and use hard triplet loss for optimization. However, we find that: (1) combining simple pooling methods is no worse than these sophisticated methods; and (2) only considering the most difficult-to-distinguish negative sample leads to slow convergence and poor Recall@K improvement. To this end, we propose an adaptive pooling strategy that allows the model to learn how to aggregate features through a combination of simple pooling methods. We also introduce a strategy to dynamically select a group of negative samples to make the optimization converge faster and perform better. Experimental results on Flickr30K and MS-COCO demonstrate that a standard VSE using our pooling and optimization strategies outperforms current state-of-the-art systems (at least 1.0% on the metrics of recall) in image-to-text and text-to-image retrieval. Source code of our experiments is available at https://github.com/96-Zachary/vse_2ad.
Submitted 5 October, 2022;
originally announced October 2022.
-
CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training
Authors:
Tianyu Huang,
Bowen Dong,
Yunhan Yang,
Xiaoshui Huang,
Rynson W. H. Lau,
Wanli Ouyang,
Wangmeng Zuo
Abstract:
Pre-training across 3D vision and language remains under development because of limited training data. Recent works attempt to transfer vision-language pre-training models to 3D vision. PointCLIP converts point cloud data to multi-view depth maps, adopting CLIP for shape classification. However, its performance is restricted by the domain gap between rendered depth maps and images, as well as the diversity of depth distributions. To address this issue, we propose CLIP2Point, an image-depth pre-training method by contrastive learning to transfer CLIP to the 3D domain, and adapt it to point cloud classification. We introduce a new depth rendering setting that forms a better visual effect, and then render 52,460 pairs of images and depth maps from ShapeNet for pre-training. The pre-training scheme of CLIP2Point combines cross-modality learning to enforce the depth features for capturing expressive visual and textual features and intra-modality learning to enhance the invariance of depth aggregation. Additionally, we propose a novel Dual-Path Adapter (DPA) module, i.e., a dual-path structure with simplified adapters for few-shot learning. The dual-path structure allows the joint use of CLIP and CLIP2Point, and the simplified adapter can well fit few-shot tasks without post-search. Experimental results show that CLIP2Point is effective in transferring CLIP knowledge to 3D vision. Our CLIP2Point outperforms PointCLIP and other self-supervised 3D networks, achieving state-of-the-art results on zero-shot and few-shot classification.
Submitted 22 August, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Quantum chaos, scrambling and operator growth in $T\bar{T}$ deformed SYK models
Authors:
Song He,
Pak Hang Chris Lau,
Zhuo-Yu Xian,
Long Zhao
Abstract:
In this work, we investigate the quantum chaos in various $T\bar{T}$-deformed SYK models with finite $N$, including the SYK$_4$, the supersymmetric SYK$_4$, and the SYK$_2$ models. We numerically study the evolution of the spectral form factor (SFF), the out-of-time ordered correlator (OTOC), and the Krylov complexity. We find that the characteristic evolution of the SFF, OTOC and K-complexity of both the SYK$_4$ and SSYK$_4$ models remains unchanged under the deformation, which implies that the properties of quantum chaos are preserved. We also identify a many-body localization behavior in the deformed SYK$_2$ model.
Submitted 20 November, 2022; v1 submitted 29 September, 2022;
originally announced September 2022.
-
LED down the rabbit hole: exploring the potential of global attention for biomedical multi-document summarisation
Authors:
Yulia Otmakhova,
Hung Thinh Truong,
Timothy Baldwin,
Trevor Cohn,
Karin Verspoor,
Jey Han Lau
Abstract:
In this paper we report on our submission to the Multidocument Summarisation for Literature Review (MSLR) shared task. Specifically, we adapt PRIMERA (Xiao et al., 2022) to the biomedical domain by placing global attention on important biomedical entities in several ways. We analyse the outputs of the 23 resulting models, and report patterns in the results related to the presence of additional global attention, number of training steps, and the input configuration.
Submitted 18 September, 2022;
originally announced September 2022.
-
Unsupervised Lexical Substitution with Decontextualised Embeddings
Authors:
Takashi Wada,
Timothy Baldwin,
Yuji Matsumoto,
Jey Han Lau
Abstract:
We propose a new unsupervised method for lexical substitution using pre-trained language models. Compared to previous approaches that use the generative capability of language models to predict substitutes, our method retrieves substitutes based on the similarity of contextualised and decontextualised word embeddings, i.e. the average contextual representation of a word in multiple contexts. We conduct experiments in English and Italian, and show that our method substantially outperforms strong baselines and establishes a new state-of-the-art without any explicit supervision or fine-tuning. We further show that our method performs particularly well at predicting low-frequency substitutes, and also generates a diverse list of substitute candidates, reducing morphophonetic or morphosyntactic biases induced by article-noun agreement.
Submitted 16 September, 2022;
originally announced September 2022.
-
Large-Field Contextual Feature Learning for Glass Detection
Authors:
Haiyang Mei,
Xin Yang,
Letian Yu,
Qiang Zhang,
Xiaopeng Wei,
Rynson W. H. Lau
Abstract:
Glass is very common in our daily life. Existing computer vision systems neglect it, which may have severe consequences, e.g., a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects/scenes can appear behind the glass. In this paper, we pose the important problem of detecting glass surfaces from a single RGB image. To address this problem, we construct the first large-scale glass detection dataset (GDD) and propose a novel glass detection network, called GDNet-B, which explores abundant contextual cues in a large field-of-view via a novel large-field contextual feature integration (LCFI) module and integrates both high-level and low-level boundary features with a boundary feature enhancement (BFE) module. Extensive experiments demonstrate that our GDNet-B achieves satisfying glass detection results on the images within and beyond the GDD testing set. We further validate the effectiveness and generalization capability of our proposed GDNet-B by applying it to other vision tasks, including mirror segmentation and salient object detection. Finally, we show the potential applications of glass detection and discuss possible future research directions.
Submitted 10 September, 2022;
originally announced September 2022.
-
"Task-relevant autoencoding" enhances machine learning for human neuroscience
Authors:
Seyedmehdi Orouji,
Vincent Taschereau-Dumouchel,
Aurelio Cortese,
Brian Odegaard,
Cody Cushing,
Mouslim Cherkaoui,
Mitsuo Kawato,
Hakwan Lau,
Megan A. K. Peters
Abstract:
In human neuroscience, machine learning can help reveal lower-dimensional neural representations relevant to subjects' behavior. However, state-of-the-art models typically require large datasets to train, so are prone to overfitting on human neuroimaging data that often possess few samples but many input dimensions. Here, we capitalized on the fact that the features we seek in human neuroscience are precisely those relevant to subjects' behavior. We thus developed a Task-Relevant Autoencoder via Classifier Enhancement (TRACE), and tested its ability to extract behaviorally-relevant, separable representations compared to a standard autoencoder, a variational autoencoder, and principal component analysis for two severely truncated machine learning datasets. We then evaluated all models on fMRI data from 59 subjects who observed animals and objects. TRACE outperformed all models nearly unilaterally, showing up to 12% increased classification accuracy and up to 56% improvement in discovering "cleaner", task-relevant representations. These results showcase TRACE's potential for a wide variety of data related to human behavior.
Submitted 22 September, 2023; v1 submitted 17 August, 2022;
originally announced August 2022.
-
Rain Removal from Light Field Images with 4D Convolution and Multi-scale Gaussian Process
Authors:
Tao Yan,
Mingyue Li,
Bin Li,
Yang Yang,
Rynson W. H. Lau
Abstract:
Existing deraining methods focus mainly on a single input image. However, with just a single input image, it is extremely difficult to accurately detect and remove rain streaks in order to restore a rain-free image. In contrast, a light field image (LFI) embeds abundant 3D structure and texture information of the target scene by recording the direction and position of each incident ray via a plenoptic camera. LFIs are becoming popular in the computer vision and graphics communities. However, making full use of the abundant information available from LFIs, such as the 2D array of sub-views and the disparity map of each sub-view, for effective rain removal is still a challenging problem. In this paper, we propose a novel method, 4D-MGP-SRRNet, for rain streak removal from LFIs. Our method takes as input all sub-views of a rainy LFI. To make full use of the LFI, it adopts 4D convolutional layers to simultaneously process all sub-views of the LFI. In the pipeline, the rain detection network, MGPDNet, with a novel Multi-scale Self-guided Gaussian Process (MSGP) module is proposed to detect high-resolution rain streaks from all sub-views of the input LFI at multiple scales. Semi-supervised learning is introduced for MSGP to accurately detect rain streaks by training on both virtual-world rainy LFIs and real-world rainy LFIs at multiple scales via computing pseudo ground truths for real-world rain streaks. We then feed all sub-views, with the predicted rain streaks subtracted, into a 4D convolution-based Depth Estimation Residual Network (DERNet) to estimate the depth maps, which are later converted into fog maps. Finally, all sub-views concatenated with the corresponding rain streaks and fog maps are fed into a powerful rainy LFI restoring model based on the adversarial recurrent neural network to progressively eliminate rain streaks and recover the rain-free LFI.
Submitted 27 January, 2023; v1 submitted 16 August, 2022;
originally announced August 2022.
-
Efficient in-situ generation of photon-memory entanglement in a nonlinear cavity
Authors:
Hoi-Kwan Lau,
Hong Qiao,
Aashish A. Clerk,
Tian Zhong
Abstract:
Combining parametric driving and photon-atomic memory coupling within one optical cavity, we describe a scheme for in-situ generation of multimode photon-memory entanglement. We find that precise cavity impedance matching is neither required nor optimal to achieve high-rate entanglement for quantum networks. This protocol can be realized with existing technologies based on on-chip photonic cavities integrated with a rare-earth-ion doped quantum memory. The proposed scheme shows significant advantages in entanglement generation rates compared with state-of-the-art quantum memory protocols and experiments, with predicted Ebit generation rates of tens of MHz without ideal operating conditions. Such a photon-memory entanglement system offers a versatile resource for quantum interconnect applications.
Submitted 1 August, 2022;
originally announced August 2022.
-
Weakly-Supervised Camouflaged Object Detection with Scribble Annotations
Authors:
Ruozhen He,
Qihua Dong,
Jiaying Lin,
Rynson W. H. Lau
Abstract:
Existing camouflaged object detection (COD) methods rely heavily on large-scale datasets with pixel-wise annotations. However, due to the ambiguous boundary, annotating camouflaged objects pixel-wise is very time-consuming and labor-intensive, taking ~60 minutes to label one image. In this paper, we propose the first weakly-supervised COD method, using scribble annotations as supervision. To achieve this, we first relabel 4,040 images in existing camouflaged object datasets with scribbles, which takes ~10 seconds to label one image. As scribble annotations only describe the primary structure of objects without details, for the network to learn to localize the boundaries of camouflaged objects, we propose a novel consistency loss composed of two parts: a cross-view loss to attain reliable consistency over different images, and an inside-view loss to maintain consistency inside a single prediction map. Besides, we observe that humans use semantic information to segment regions near the boundaries of camouflaged objects. Hence, we further propose a feature-guided loss, which includes visual features directly extracted from images and semantically significant features captured by the model. Finally, we propose a novel network for COD via scribble learning on structural information and semantic relations. Our network has two novel modules: the local-context contrasted (LCC) module, which mimics visual inhibition to enhance image contrast/sharpness and expand the scribbles into potential camouflaged regions, and the logical semantic relation (LSR) module, which analyzes the semantic relation to determine the regions representing the camouflaged object. Experimental results show that our model outperforms relevant SOTA methods on three COD benchmarks with an average improvement of 11.0% on MAE, 3.2% on S-measure, 2.5% on E-measure, and 4.4% on weighted F-measure.
Submitted 28 November, 2022; v1 submitted 28 July, 2022;
originally announced July 2022.
-
A Novel ECG Denoising Scheme Using the Ensemble Kalman Filter
Authors:
Sadaf Sarafan,
Hoang Vuong,
Daniel Jilani,
Samir Malhotra,
Michael P. H. Lau,
Manoj Vishwanath,
Tadesse Ghirmai,
Hung Cao
Abstract:
Electrocardiogram (ECG) monitoring provides vital information and can reveal cardiovascular anomalies. Recent advances in the technology of wearable electronics have enabled compact devices to acquire personal physiological signals in the home setting; however, signals are usually contaminated with high-level noise. Thus, an efficient ECG filtering scheme is sorely needed. In this paper, a novel method using the Ensemble Kalman Filter (EnKF) is developed for denoising ECG signals. We also extensively explore various filtering algorithms for comparison, including the Savitzky-Golay (SG) filter, Ensemble Empirical Mode Decomposition (EEMD), Normalized Least-Mean-Square (NLMS), the Recursive Least Squares (RLS) filter, Total Variation Denoising (TVD), Wavelet and the Extended Kalman Filter (EKF). Data from the MIT-BIH Noise Stress Test database were used. The proposed methodology shows an average signal-to-noise ratio (SNR) of 10.96, a Percentage Root Difference of 150.45, and a correlation coefficient of 0.959 on the modified MIT-BIH database with added motion artifacts.
Submitted 24 July, 2022;
originally announced July 2022.
-
Single-Pixel Image Reconstruction Based on Block Compressive Sensing and Deep Learning
Authors:
Stephen L. H. Lau,
Edwin K. P. Chong
Abstract:
Single-pixel imaging (SPI) is a novel imaging technique whose working principle is based on compressive sensing (CS) theory. In SPI, data is obtained through a series of compressive measurements and the corresponding image is reconstructed. Typically, a reconstruction algorithm such as basis pursuit relies on the sparsity assumption in images. However, recent advances in deep learning have found uses in reconstructing CS images. Despite promising results in simulations, it is often unclear how such an algorithm can be implemented in an actual SPI setup. In this paper, we demonstrate the use of deep learning for the reconstruction of SPI images in conjunction with block compressive sensing (BCS). We also propose a novel reconstruction model based on convolutional neural networks that outperforms other competitive CS reconstruction algorithms. Moreover, by incorporating BCS in our deep learning model, we are able to reconstruct images of any size above a certain smallest image size. In addition, we show that our model is capable of reconstructing images obtained from an SPI setup despite having been trained only on natural images, which can be vastly different from the SPI images. This suggests that pretrained deep learning models are feasible for CS reconstruction of images from various domains.
Submitted 14 July, 2022;
originally announced July 2022.
-
Symmetry-Aware Transformer-based Mirror Detection
Authors:
Tianyu Huang,
Bowen Dong,
Jiaying Lin,
Xiaohui Liu,
Rynson W. H. Lau,
Wangmeng Zuo
Abstract:
Mirror detection aims to identify the mirror regions in the given input image. Existing works mainly focus on integrating the semantic features and structural features to mine specific relations between mirror and non-mirror regions, or introducing mirror properties like depth or chirality to help analyze the existence of mirrors. In this work, we observe that a real object typically forms a loose symmetry relationship with its corresponding reflection in the mirror, which is beneficial in distinguishing mirrors from real objects. Based on this observation, we propose a dual-path Symmetry-Aware Transformer-based mirror detection Network (SATNet), which includes two novel modules: Symmetry-Aware Attention Module (SAAM) and Contrast and Fusion Decoder Module (CFDM). Specifically, we first adopt a transformer backbone to model global information aggregation in images, extracting multi-scale features in two paths. We then feed the high-level dual-path features to SAAMs to capture the symmetry relations. Finally, we fuse the dual-path features and refine our prediction maps progressively with CFDMs to obtain the final mirror mask. Experimental results show that SATNet outperforms both RGB and RGB-D mirror detection methods on all available mirror detection datasets. Codes and trained models are available at: https://github.com/tyhuang0428/SATNet.
Submitted 4 September, 2022; v1 submitted 13 July, 2022;
originally announced July 2022.
-
Harmonizer: Learning to Perform White-Box Image and Video Harmonization
Authors:
Zhanghan Ke,
Chunyi Sun,
Lei Zhu,
Ke Xu,
Rynson W. H. Lau
Abstract:
Recent works on image harmonization solve the problem as a pixel-wise image translation task via large autoencoders. They have unsatisfactory performances and slow inference speeds when dealing with high-resolution images. In this work, we observe that adjusting the input arguments of basic image filters, e.g., brightness and contrast, is sufficient for humans to produce realistic images from the composite ones. Hence, we frame image harmonization as an image-level regression problem to learn the arguments of the filters that humans use for the task. We present a Harmonizer framework for image harmonization. Unlike prior methods that are based on black-box autoencoders, Harmonizer contains a neural network for filter argument prediction and several white-box filters (based on the predicted arguments) for image harmonization. We also introduce a cascade regressor and a dynamic loss strategy for Harmonizer to learn filter arguments more stably and precisely. Since our network only outputs image-level arguments and the filters we used are efficient, Harmonizer is much lighter and faster than existing methods. Comprehensive experiments demonstrate that Harmonizer surpasses existing methods notably, especially with high-resolution inputs. Finally, we apply Harmonizer to video harmonization, which achieves consistent results across frames and 56 fps at 1080P resolution. Code and models are available at: https://github.com/ZHKKKe/Harmonizer.
Submitted 20 July, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Depth-aware Glass Surface Detection with Cross-modal Context Mining
Authors:
Jiaying Lin,
Yuen Hei Yeung,
Rynson W. H. Lau
Abstract:
Glass surfaces are becoming increasingly ubiquitous as modern buildings tend to use a lot of glass panels. This however poses substantial challenges on the operations of autonomous systems such as robots, self-driving cars and drones, as the glass panels can become transparent obstacles to the navigation. Existing works attempt to exploit various cues, including glass boundary context or reflections, as a prior. However, they are all based on input RGB images. We observe that the transmission of 3D depth sensor light through glass surfaces often produces blank regions in the depth maps, which can offer additional insights to complement the RGB image features for glass surface detection. In this paper, we propose a novel framework for glass surface detection by incorporating RGB-D information, with two novel modules: (1) a cross-modal context mining (CCM) module to adaptively learn individual and mutual context features from RGB and depth information, and (2) a depth-missing aware attention (DAA) module to explicitly exploit spatial locations where missing depths occur to help detect the presence of glass surfaces. In addition, we propose a large-scale RGB-D glass surface detection dataset, called RGB-D GSD, for RGB-D glass surface detection. Our dataset comprises 3,009 real-world RGB-D glass surface images with precise annotations. Extensive experimental results show that our proposed model outperforms state-of-the-art methods.
Submitted 22 June, 2022;
originally announced June 2022.
-
Page curve and symmetries
Authors:
Pak Hang Chris Lau,
Toshifumi Noumi,
Yuhei Takii,
Kotaro Tamaoka
Abstract:
Motivated by the quantum process of black hole evaporation and its implications for symmetries, we consider a qubit system with random dynamics as a toy model of a black hole. We compute its symmetry-resolved entropies and discuss their implications. We first consider the case where charges are conserved and derive a symmetry-resolved analogue of the Page curve. We then consider the case where symmetry is explicitly broken and charges are no longer conserved. It serves as a toy model for global symmetry breaking in black hole evaporation. Despite the simple framework, the symmetry-resolved entropies capture various interesting features during the analogous process of black hole evaporation in our qubit model.
Submitted 20 June, 2022;
originally announced June 2022.
-
Periodic measures for a class of SPDEs with regime-switching
Authors:
Chun Ho Lau,
Wei Sun
Abstract:
We use the variational approach to investigate periodic measures for a class of SPDEs with regime-switching. The hybrid system is driven by degenerate Lévy noise. We use the Lyapunov function method to study the existence of periodic measures and show the uniqueness of periodic measures by establishing the strong Feller property and irreducibility of the associated time-inhomogeneous semigroup. The main results are applied to stochastic porous media equations with regime-switching.
Submitted 2 June, 2022;
originally announced June 2022.
-
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
Authors:
Genta Indra Winata,
Alham Fikri Aji,
Samuel Cahyawijaya,
Rahmad Mahendra,
Fajri Koto,
Ade Romadhony,
Kemal Kurniawan,
David Moeljadi,
Radityo Eko Prasojo,
Pascale Fung,
Timothy Baldwin,
Jey Han Lau,
Rico Sennrich,
Sebastian Ruder
Abstract:
Natural language processing (NLP) has a significant impact on society via technologies such as machine translation and search engines. Despite its success, NLP technology is only widely available for high-resource languages such as English and Chinese, while it remains inaccessible to many languages due to the unavailability of data resources and benchmarks. In this work, we focus on developing resources for languages in Indonesia. Despite being the second most linguistically diverse country, most languages in Indonesia are categorized as endangered and some are even extinct. We develop the first-ever parallel resource for 10 low-resource languages in Indonesia. Our resource includes datasets, a multi-task benchmark, and lexicons, as well as a parallel Indonesian-English dataset. We provide extensive analyses and describe the challenges when creating such resources. We hope that our work can spark NLP research on Indonesian and other underrepresented languages.
Submitted 12 April, 2023; v1 submitted 31 May, 2022;
originally announced May 2022.
-
Robust Task-Oriented Dialogue Generation with Contrastive Pre-training and Adversarial Filtering
Authors:
Shiquan Yang,
Xinting Huang,
Jey Han Lau,
Sarah Erfani
Abstract:
Data artifacts incentivize machine learning models to learn non-transferable generalizations by taking advantage of shortcuts in the data, and there is growing evidence that data artifacts play a role in the strong results that deep learning models achieve in recent natural language processing benchmarks. In this paper, we focus on task-oriented dialogue and investigate whether popular datasets such as MultiWOZ contain such data artifacts. We found that by only keeping frequent phrases in the training examples, state-of-the-art models perform similarly to the variant trained with full data, suggesting they exploit these spurious correlations to solve the task. Motivated by this, we propose a contrastive learning based framework to encourage the model to ignore these cues and focus on learning generalisable patterns. We also experiment with adversarial filtering to remove "easy" training instances so that the model would focus on learning from the "harder" instances. We conduct a number of generalization experiments -- e.g., cross-domain/dataset and adversarial tests -- to assess the robustness of our approach and found that it works exceptionally well.
Submitted 19 May, 2022;
originally announced May 2022.
-
Reliability of Robotic Ultrasound Scanning for Scoliosis Assessment in Comparison with Manual Scanning
Authors:
Maria Victorova,
Heidi Hin Ting Lau,
Timothy Tin-Yan Lee,
David Navarro-Alarcon,
Yongping Zheng
Abstract:
Background: Ultrasound (US) imaging for scoliosis assessment is challenging for a non-experienced operator. Robotic scanning was developed to follow the spinal curvature with deep learning and apply consistent forces to the patient's back. Methods: 23 scoliosis patients were scanned with US devices both robotically and manually. Two human raters measured each subject's spinous process angles (SPA) on robotic and manual coronal images. Results: The robotic method showed high intra- (ICC > 0.85) and inter-rater (ICC > 0.77) reliabilities. Compared with the manual method, the robotic approach showed no significant difference (p < 0.05) when measuring coronal deformity angles. The MAD for intra-rater analysis lies within an acceptable range from 0 deg to 5 deg for a minimum of 86% and a maximum of 97% of the total number of measured angles. Conclusions: This study demonstrated that scoliosis deformity angles measured on ultrasound images obtained with robotic scanning are comparable to those obtained by manual scanning.
Submitted 7 May, 2022;
originally announced May 2022.
-
Emergence of Time from Unitary Equivalence
Authors:
Pak Hang Chris Lau,
Chen-Te Ma
Abstract:
We discuss the concept of unitary equivalence $\hat{H}\sim\hat{U}^{\dagger}\hat{H}_{\mathrm{mod}}\hat{U}$ between the modular Hamiltonian $\hat{H}_{\mathrm{mod}}$ and the subsystem Hamiltonian $\hat{H}$ in the context of realizing the emergence of time through a unitary operator $\hat{U}$. This concept suggests a duality between the modular flow and time evolution. Additionally, requiring unitary equivalence implies a connection between the "Modular Chaos Bound" and the "Chaos Bound". Furthermore, we demonstrate this duality using quantum chaos diagnostic quantities in the thermofield double state of a fermionic system. Quantum chaos diagnostic quantities are mathematical measures that characterize chaotic behavior in quantum systems. By examining these quantities in the thermofield double state, we illustrate the duality between them and the modular Hamiltonian. We show a specific duality between correlators, the spectral form factor, and the Loschmidt echo with the modular Hamiltonian. The spectral form factor is a quantity that provides information about the energy spectrum of a quantum system, while the Loschmidt echo characterizes the sensitivity of a system's modular time evolution to perturbations. Finally, we demonstrate that a different entanglement spectrum does not impose the same constraint on the subsystem Hamiltonian. The entanglement spectrum is related to entanglement entropy and provides information about the eigenvalues of the reduced density matrix associated with a subsystem. We discuss complex concepts related to the interplay between quantum chaos, time emergence, and the relationship between modular and subsystem Hamiltonians. These ideas are part of ongoing research in quantum information theory and related fields.
Submitted 24 July, 2023; v1 submitted 13 April, 2022;
originally announced April 2022.
-
Rethinking Video Salient Object Ranking
Authors:
Jiaying Lin,
Huankang Guan,
Rynson W. H. Lau
Abstract:
Salient Object Ranking (SOR) involves ranking the degree of saliency of multiple salient objects in an input image. Most recently, a method is proposed for ranking salient objects in an input video based on a predicted fixation map. It relies solely on the density of the fixations within the salient objects to infer their saliency ranks, which is incompatible with human perception of saliency ranking. In this work, we propose to explicitly learn the spatial and temporal relations between different salient objects to produce the saliency ranks. To this end, we propose an end-to-end method for video salient object ranking (VSOR), with two novel modules: an intra-frame adaptive relation (IAR) module to learn the spatial relation among the salient objects in the same frame locally and globally, and an inter-frame dynamic relation (IDR) module to model the temporal relation of saliency across different frames. In addition, to address the limited video types (just sports and movies) and scene diversity in the existing VSOR dataset, we propose a new dataset that covers different video types and diverse scenes on a large scale. Experimental results demonstrate that our method outperforms state-of-the-art methods in relevant fields. We will make the source code and our proposed dataset available.
Submitted 31 March, 2022;
originally announced March 2022.
-
Optimal reinsurance design under solvency constraints
Authors:
Benjamin Avanzi,
Hayden Lau,
Mogens Steffensen
Abstract:
We consider the optimal risk transfer from an insurance company to a reinsurer. The problem formulation considered in this paper is closely connected to the optimal portfolio problem in finance, with some crucial distinctions. In particular, the insurance company's surplus is here (as is routinely the case) approximated by a Brownian motion, as opposed to the geometric Brownian motion used to model assets in finance. Furthermore, risk exposure is dialled "down" via reinsurance, rather than "up" via risky investments. This leads to interesting qualitative differences in the optimal designs.
In this paper, using the martingale method, we derive the optimal design as a function of proportional, non-cheap reinsurance design that maximises the quadratic utility of the terminal value of the insurance surplus. We also consider several realistic constraints on the terminal value: a strict lower boundary, the probability (Value at Risk) constraint, and the expected shortfall (conditional Value at Risk) constraints under the $\mathbb{P}$ and $\mathbb{Q}$ measures, respectively. In all cases, the optimal reinsurance designs boil down to a combination of proportional protection and option-like protection (stop-loss) of the residual proportion with various deductibles. Proportions and deductibles are set such that the initial capital is fully allocated. Comparison of the optimal designs with the optimal portfolios in finance is particularly interesting. Results are illustrated.
Submitted 22 June, 2023; v1 submitted 30 March, 2022;
originally announced March 2022.
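The terminal-value constraints listed in the abstract above can be written schematically as follows. This is only a sketch in assumed notation: $X_T$ for the terminal surplus, $L$ for a floor level, $\varepsilon$ for a tolerated shortfall probability, and $\nu$ for a shortfall budget are all symbols introduced here, and the paper's exact formulation may differ.

```latex
\begin{align*}
  &\text{Strict lower boundary:}
    && X_T \ge L \quad \text{almost surely}, \\
  &\text{Value at Risk:}
    && \mathbb{P}\left(X_T < L\right) \le \varepsilon, \\
  &\text{Expected shortfall under } \mathbb{P}:
    && \mathbb{E}^{\mathbb{P}}\left[(L - X_T)^{+}\right] \le \nu, \\
  &\text{Expected shortfall under } \mathbb{Q}:
    && \mathbb{E}^{\mathbb{Q}}\left[(L - X_T)^{+}\right] \le \nu.
\end{align*}
```

In this reading, the strict boundary is the limiting case of the Value at Risk constraint with $\varepsilon = 0$.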
-
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Authors:
Alham Fikri Aji,
Genta Indra Winata,
Fajri Koto,
Samuel Cahyawijaya,
Ade Romadhony,
Rahmad Mahendra,
Kemal Kurniawan,
David Moeljadi,
Radityo Eko Prasojo,
Timothy Baldwin,
Jey Han Lau,
Sebastian Ruder
Abstract:
NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects. Focusing on the languages spoken in Indonesia, the second most linguistically diverse and the fourth most populous nation of the world, we provide an overview of the current state of NLP research for Indonesia's 700+ languages. We highlight challenges in Indonesian NLP and how these affect the performance of current NLP systems. Finally, we provide general recommendations to help develop NLP technology not only for languages of Indonesia but also other underrepresented languages.
Submitted 24 March, 2022;
originally announced March 2022.
-
Bi-directional Object-context Prioritization Learning for Saliency Ranking
Authors:
Xin Tian,
Ke Xu,
Xin Yang,
Lin Du,
Baocai Yin,
Rynson W. H. Lau
Abstract:
The saliency ranking task is recently proposed to study the visual behavior that humans would typically shift their attention over different objects of a scene based on their degrees of saliency. Existing approaches focus on learning either object-object or object-scene relations. Such a strategy follows the idea of object-based attention in Psychology, but it tends to favor those objects with strong semantics (e.g., humans), resulting in unrealistic saliency ranking. We observe that spatial attention works concurrently with object-based attention in the human visual recognition system. During the recognition process, the human spatial attention mechanism would move, engage, and disengage from region to region (i.e., context to context). This inspires us to model the region-level interactions, in addition to the object-level reasoning, for saliency ranking. To this end, we propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking. Our model includes two novel modules: (1) a selective object saliency (SOS) module that models object-based attention via inferring the semantic representation of the salient object, and (2) an object-context-object relation (OCOR) module that allocates saliency ranks to objects by jointly modeling the object-context and context-object interactions of the salient objects. Extensive experiments show that our approach outperforms existing state-of-the-art methods. Our code and pretrained model are available at https://github.com/GrassBro/OCOR.
Submitted 22 March, 2022; v1 submitted 17 March, 2022;
originally announced March 2022.
-
An Interpretable Neuro-Symbolic Reasoning Framework for Task-Oriented Dialogue Generation
Authors:
Shiquan Yang,
Rui Zhang,
Sarah Erfani,
Jey Han Lau
Abstract:
We study the interpretability issue of task-oriented dialogue systems in this paper. Previously, most neural-based task-oriented dialogue systems employ an implicit reasoning strategy that makes the model predictions uninterpretable to humans. To obtain a transparent reasoning process, we introduce a neuro-symbolic approach to perform explicit reasoning that justifies model decisions by reasoning chains. Since deriving reasoning chains requires multi-hop reasoning for task-oriented dialogues, existing neuro-symbolic approaches would induce error propagation due to the one-phase design. To overcome this, we propose a two-phase approach that consists of a hypothesis generator and a reasoner. We first obtain multiple hypotheses, i.e., potential operations to perform the desired task, through the hypothesis generator. Each hypothesis is then verified by the reasoner, and the valid one is selected to conduct the final prediction. The whole system is trained by exploiting raw textual dialogues without using any reasoning chain annotations. Experimental studies on two public benchmark datasets demonstrate that the proposed approach not only achieves better results, but also introduces an interpretable decision process.
Submitted 11 March, 2022;
originally announced March 2022.
-
PeerSum: A Peer Review Dataset for Abstractive Multi-document Summarization
Authors:
Miao Li,
Jianzhong Qi,
Jey Han Lau
Abstract:
We present PeerSum, a new MDS dataset using peer reviews of scientific publications. Our dataset differs from the existing MDS datasets in that our summaries (i.e., the meta-reviews) are highly abstractive and they are real summaries of the source documents (i.e., the reviews) and it also features disagreements among source documents. We found that current state-of-the-art MDS models struggle to generate high-quality summaries for PeerSum, offering new research opportunities.
Submitted 28 September, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Asymmetry-Based Quantum Backaction Suppression in Quadratic Optomechanics
Authors:
Vincent Dumont,
Hoi-Kwan Lau,
Aashish A. Clerk,
Jack C. Sankey
Abstract:
As the field of optomechanics advances, quadratic dispersive coupling (QDC) promises an increasingly feasible class of qualitatively new functionality. However, the leading QDC geometries also generate linear dissipative coupling, and an associated quantum radiation force noise that is detrimental to QDC applications. Here, we propose a simple modification that dramatically reduces this noise without altering the QDC strength. We identify optimal regimes of operation, and discuss advantages within the examples of optical levitation and nondestructive phonon measurement.
Submitted 1 March, 2022;
originally announced March 2022.
-
ITTC @ TREC 2021 Clinical Trials Track
Authors:
Thinh Hung Truong,
Yulia Otmakhova,
Rahmad Mahendra,
Timothy Baldwin,
Jey Han Lau,
Trevor Cohn,
Lawrence Cavedon,
Damiano Spina,
Karin Verspoor
Abstract:
This paper describes the submissions of the Natural Language Processing (NLP) team from the Australian Research Council Industrial Transformation Training Centre (ITTC) for Cognitive Computing in Medical Technologies to the TREC 2021 Clinical Trials Track. The task focuses on the problem of matching eligible clinical trials to topics constituting a summary of a patient's admission notes. We explore different ways of representing trials and topics using NLP techniques, and then use a common retrieval model to generate the ranked list of relevant trials for each topic. The results from all our submitted runs are well above the median scores for all topics, but there is still plenty of scope for improvement.
Submitted 15 February, 2022;
originally announced February 2022.
-
Inhomogeneous cancellation conditions and Calderón-Zygmund type operators on $h^p$
Authors:
Galia Dafni,
Chun Ho Lau,
Tiago Picon,
Claudio Vasconcelos
Abstract:
In this work we present a new approach to molecules on Goldberg's local Hardy spaces $h^p(\mathbb{R}^n)$, $0<p\leq1$, assuming an appropriate cancellation condition. As applications, we prove a version of Hardy's inequality and improved continuity results for inhomogeneous Calderón-Zygmund operators on these spaces.
Submitted 23 December, 2021;
originally announced December 2021.
-
Findings on Conversation Disentanglement
Authors:
Rongxin Zhu,
Jey Han Lau,
Jianzhong Qi
Abstract:
Conversation disentanglement, the task of identifying separate threads in conversations, is an important pre-processing step in multi-party conversational NLP applications such as conversational question answering and conversation summarization. Framing it as an utterance-to-utterance classification problem -- i.e. given an utterance of interest (UOI), find which past utterance it replies to -- we explore a number of transformer-based models and find that BERT in combination with handcrafted features remains a strong baseline. We then build a multi-task learning model that jointly learns utterance-to-utterance and utterance-to-thread classification. Observing that the ground truth label (past utterance) is in the top candidates when our model makes an error, we experiment with using bipartite graphs as a post-processing step to learn how to best match a set of UOIs to past utterances. Experiments on the Ubuntu IRC dataset show that this approach has the potential to outperform the conventional greedy approach of simply selecting the highest probability candidate for each UOI independently, indicating a promising future research direction.
Submitted 10 December, 2021;
originally announced December 2021.
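The contrast drawn in the abstract above, between greedily picking the top candidate for each UOI and matching a set of UOIs jointly, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the score matrix is invented, the function names are hypothetical, and the joint step is a brute-force one-to-one assignment rather than the learned bipartite matching the paper describes. A one-to-one constraint is also a simplification, since several utterances may reply to the same parent in real conversations.

```python
from itertools import permutations


def greedy_match(scores):
    """For each UOI (row), independently pick the highest-scoring past utterance."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def best_one_to_one_match(scores):
    """Exhaustively search for the one-to-one assignment with the highest total score.

    Assumes there are at least as many past utterances (columns) as UOIs (rows).
    Brute force is only practical for very small inputs.
    """
    n_uoi, n_past = len(scores), len(scores[0])
    best_total, best_assignment = float("-inf"), None
    for candidate in permutations(range(n_past), n_uoi):
        total = sum(scores[i][j] for i, j in enumerate(candidate))
        if total > best_total:
            best_total, best_assignment = total, list(candidate)
    return best_assignment


# Hypothetical reply probabilities: rows are UOIs, columns are past utterances.
toy_scores = [
    [0.50, 0.45, 0.05],
    [0.90, 0.05, 0.05],
]

print("greedy:", greedy_match(toy_scores))
print("joint: ", best_one_to_one_match(toy_scores))
```

In this toy matrix both UOIs greedily choose past utterance 0, whereas the joint assignment gives the first UOI its close second choice so that the second UOI keeps its strong match.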
-
ICARUS-Q: Integrated Control and Readout Unit for Scalable Quantum Processors
Authors:
Kun Hee Park,
Yung Szen Yap,
Yuanzheng Paul Tan,
Christoph Hufnagel,
Long Hoang Nguyen,
Karn Hwa Lau,
Patrick Bore,
Stavros Efthymiou,
Stefano Carrazza,
Rangga P. Budoyo,
Rainer Dumke
Abstract:
We present a control and measurement setup for superconducting qubits based on a Xilinx 16-channel radio-frequency system-on-chip (RFSoC) device. The proposed setup consists of four parts: multiple RFSoC boards, a setup to synchronise every digital-to-analog converter (DAC) and analog-to-digital converter (ADC) channel across multiple boards, a low-noise direct current (DC) supply for tuning the qubit frequency, and cloud access for remotely performing experiments. We also design the setup to be free of physical mixers. The RFSoC boards directly generate microwave pulses using sixteen DAC channels up to the third Nyquist zone, which are directly sampled by its eight ADC channels between the fifth and the ninth zones.
Submitted 1 September, 2022; v1 submitted 6 December, 2021;
originally announced December 2021.
-
Geometry-aware Two-scale PIFu Representation for Human Reconstruction
Authors:
Zheng Dong,
Ke Xu,
Ziheng Duan,
Hujun Bao,
Weiwei Xu,
Rynson W. H. Lau
Abstract:
Although PIFu-based 3D human reconstruction methods are popular, the quality of recovered details is still unsatisfactory. In a sparse (e.g., 3 RGBD sensors) capture setting, the depth noise is typically amplified in the PIFu representation, resulting in flat facial surfaces and geometry-fallible bodies. In this paper, we propose a novel geometry-aware two-scale PIFu for 3D human reconstruction from sparse, noisy inputs. Our key idea is to exploit the complementary properties of depth denoising and 3D reconstruction, for learning a two-scale PIFu representation to reconstruct high-frequency facial details and consistent bodies separately. To this end, we first formulate depth denoising and 3D reconstruction as a multi-task learning problem. The depth denoising process enriches the local geometry information of the reconstruction features, while the reconstruction process enhances depth denoising with global topology information. We then propose to learn the two-scale PIFu representation using two MLPs based on the denoised depth and geometry-aware features. Extensive experiments demonstrate the effectiveness of our approach in reconstructing facial details and bodies of different poses and its superiority over state-of-the-art methods.
Submitted 27 September, 2022; v1 submitted 3 December, 2021;
originally announced December 2021.
-
Dissipative superradiant spin amplifier for enhanced quantum sensing
Authors:
Martin Koppenhöfer,
Peter Groszkowski,
Hoi-Kwan Lau,
A. A. Clerk
Abstract:
Quantum metrology protocols exploiting ensembles of $N$ two-level systems and Ramsey-style measurements are ubiquitous. However, in many cases excess readout noise severely degrades the measurement sensitivity; in particular in sensors based on ensembles of solid-state defect spins. We present a dissipative "spin amplification" protocol that allows one to dramatically improve the sensitivity of such schemes, even in the presence of realistic intrinsic dissipation and noise. Our method is based on exploiting collective (i.e., superradiant) spin decay, an effect that is usually seen as a nuisance because it limits spin-squeezing protocols. We show that our approach can allow a system with a highly imperfect spin readout to approach SQL-like scaling in $N$ within a factor of two, without needing to change the actual readout mechanism. Our ideas are compatible with several state-of-the-art experimental platforms where an ensemble of solid-state spins (NV centers, SiV centers) is coupled to a common microwave or mechanical mode.
Submitted 3 December, 2022; v1 submitted 30 November, 2021;
originally announced November 2021.
-
Learning to Detect Instance-level Salient Objects Using Complementary Image Labels
Authors:
Xin Tian,
Ke Xu,
Xin Yang,
Baocai Yin,
Rynson W. H. Lau
Abstract:
Existing salient instance detection (SID) methods typically learn from pixel-level annotated datasets. In this paper, we present the first weakly-supervised approach to the SID problem. Although weak supervision has been considered in general saliency detection, it is mainly based on using class labels for object localization. However, it is non-trivial to use only class labels to learn instance-aware saliency information, as salient instances with high semantic affinities may not be easily separated by the labels. As the subitizing information provides an instant judgement on the number of salient items, it is naturally related to detecting salient instances and may help separate instances of the same class while grouping different parts of the same instance. Inspired by this observation, we propose to use class and subitizing labels as weak supervision for the SID problem. We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids. This complementary information is then fused to produce a salient instance map. To facilitate the learning process, we further propose a progressive training scheme to reduce label noise and the corresponding noise learned by the model, via reciprocating the model with progressive salient instance prediction and model refreshing. Our extensive evaluations show that the proposed method performs favorably against carefully designed baseline methods adapted from related tasks.
Submitted 19 November, 2021;
originally announced November 2021.
-
Exploring Story Generation with Multi-task Objectives in Variational Autoencoders
Authors:
Zhuohan Xie,
Trevor Cohn,
Jey Han Lau
Abstract:
GPT-2 has been frequently adapted in story generation models as it provides powerful generative capability. However, it still fails to generate consistent stories and lacks diversity. Current story generation models leverage additional information such as plots or commonsense into GPT-2 to guide the generation process. These approaches focus on improving generation quality of stories while our work looks at both quality and diversity. We explore combining BERT and GPT-2 to build a variational autoencoder (VAE), and extend it by adding additional objectives to learn global features such as story topic and discourse relations. Our evaluations show our enhanced VAE can provide a better quality and diversity trade-off, generate less repetitive story content and learn a more informative latent variable.
Submitted 15 November, 2021;
originally announced November 2021.
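As background for the abstract above, a VAE is usually trained by maximizing the evidence lower bound, and a multi-task variant can be sketched by adding weighted auxiliary losses. The first line below is the standard bound; the second line is only an assumed general form, where the auxiliary losses $\mathcal{L}_k$ (for example topic or discourse-relation prediction from the latent variable) and their weights $\lambda_k$ are notation introduced here and not taken from the paper.

```latex
\begin{align*}
  \mathcal{L}_{\mathrm{ELBO}}(x)
    &= \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]
       - \mathrm{KL}\left(q_\phi(z \mid x) \,\|\, p(z)\right), \\
  \mathcal{L}_{\mathrm{total}}(x)
    &= -\mathcal{L}_{\mathrm{ELBO}}(x) + \sum_{k} \lambda_k \, \mathcal{L}_k(x, z).
\end{align*}
```

Under the setup described in the abstract, the encoder $q_\phi$ would correspond to BERT and the decoder $p_\theta$ to GPT-2.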