Search | arXiv e-print repository

Rejection-Sampled Universal Quantization for Smaller Quantization Errors

Abstract: We construct a randomized vector quantizer which has a smaller maximum error compared to all known lattice quantizers with the same entropy for dimensions 5, 6, ..., 48, and also has a smaller mean squared error compared to known lattice quantizers with the same entropy for dimensions 35, ..., 48, in the high resolution limit. Moreover, our randomized quantizer has a desirable property that the quantization error is always uniform over the ball and independent of the input. Our construction is based on applying rejection sampling on universal quantization, which allows us to shape the error distribution to be any continuous distribution, not only uniform distributions over basic cells of a lattice as in conventional dithered quantization. We also characterize the high SNR limit of one-shot channel simulation for any additive noise channel under a mild assumption (e.g., the AWGN channel), up to an additive constant of 1.45 bits.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 15 pages, 2 figures

arXiv:2309.06982 [pdf, other]

Communication-Efficient Laplace Mechanism for Differential Privacy via Random Quantization

Authors: Ali Moradi Shahmiri, Chih Wei Ling, Cheuk Ting Li

Abstract: We propose the first method that realizes the Laplace mechanism exactly (i.e., a Laplace noise is added to the data) that requires only a finite amount of communication (whereas the original Laplace mechanism requires the transmission of a real number) while guaranteeing privacy against the server and database. Our mechanism can serve as a drop-in replacement for local or centralized differential privacy applications where the Laplace mechanism is used. Our mechanism is constructed using a random quantization technique. Unlike the simple and prevalent Laplace-mechanism-then-quantize approach, the quantization in our mechanism does not result in any distortion or degradation of utility. Unlike existing dithered quantization and channel simulation schemes for simulating additive Laplacian noise, our mechanism guarantees privacy not only against the database and downstream, but also against the honest but curious server which attempts to decode the data using the dither signals.

Submitted 13 September, 2023; originally announced September 2023.

Comments: 11 pages, 3 figures, short version to be submitted at 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2308.02247 [pdf]

doi 10.1016/j.cemconres.2023.107299

Mechanisms and kinetics of C-S-H nucleation approaching the spinodal line: Insights into the role of organics additives

Authors: Christophe Labbez, Lina Bouzouaid, Alexander E. S. Van Driessche, Wai Li Ling, Juan Carlos Martinez, Barbara Lothenbach, Alejandro Fernandez-Martinez

Abstract: Wet chemistry C-S-H precipitation experiments were performed under controlled conditions of solution supersaturation in the presence and absence of gluconate and three hexitol molecules. Characterization of the precipitates with SAXS and cryo-TEM experiments confirmed the presence of a multi-step nucleation pathway. Induction times for the formation of the amorphous C-S-H spheroids were determined from light transmittance. Analysis of those data with the classical nucleation theory revealed a significant increase of the kinetic prefactor in the same order as the complexation constants of calcium and silicate with each of the organics. Finally, two distinct precipitation regimes of the C-S-H amorphous precursor were identified: i) a nucleation regime at low saturation indexes (SI) and ii) a spinodal nucleation regime at high SI where the free energy barrier to the phase transition is found to be of the order of the kinetic energy or less.

Submitted 4 August, 2023; originally announced August 2023.

Comments: Accepted in Cement and Concrete Research. 30 pages plus supplementary materials. arXiv admin note: substantial text overlap with arXiv:2111.02743

Journal ref: Cement and Concrete Research, 173, 2023, 107299

arXiv:2305.07283 [pdf, other]

doi 10.1109/TCSVT.2022.3223150

Quaternion-valued Correlation Learning for Few-Shot Semantic Segmentation

Authors: Zewen Zheng, Guoheng Huang, Xiaochen Yuan, Chi-Man Pun, Hongrui Liu, Wing-Kuen Ling

Abstract: Few-shot segmentation (FSS) aims to segment unseen classes given only a few annotated samples. Encouraging progress has been made for FSS by leveraging semantic features learned from base classes with sufficient training samples to represent novel classes. The correlation-based methods lack the ability to consider interaction of the two subspace matching scores due to the inherent nature of the real-valued 2D convolutions. In this paper, we introduce a quaternion perspective on correlation learning and propose a novel Quaternion-valued Correlation Learning Network (QCLNet), with the aim to alleviate the computational burden of high-dimensional correlation tensor and explore internal latent interaction between query and support images by leveraging operations defined by the established quaternion algebra. Specifically, our QCLNet is formulated as a hyper-complex valued network and represents correlation tensors in the quaternion domain, which uses quaternion-valued convolution to explore the external relations of query subspace when considering the hidden relationship of the support sub-dimension in the quaternion space. Extensive experiments on the PASCAL-5i and COCO-20i datasets demonstrate that our method outperforms the existing state-of-the-art methods effectively. Our code is available at https://github.com/zwzheng98/QCLNet and our article "Quaternion-valued Correlation Learning for Few-Shot Semantic Segmentation" was published in IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 5, pp. 2102-2115, May 2023, doi: 10.1109/TCSVT.2022.3223150.

Submitted 30 August, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: for associated paper file, see https://ieeexplore.ieee.org/document/9954424?source=authoralert

arXiv:2305.06788 [pdf, other]

Vector Quantization with Error Uniformly Distributed over an Arbitrary Set

Authors: Chih Wei Ling, Cheuk Ting Li

Abstract: For uniform scalar quantization, the error distribution is approximately a uniform distribution over an interval (which is also a 1-dimensional ball). Nevertheless, for lattice vector quantization, the error distribution is uniform not over a ball, but over the basic cell of the quantization lattice. In this paper, we construct vector quantizers with periodic properties, where the error is uniformly distributed over the n-ball, or any other prescribed set. We then prove upper and lower bounds on the entropy of the quantized signals. We also discuss how our construction can be applied to give a randomized quantization scheme with a nonuniform error distribution.

Submitted 24 January, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: 22 pages, 3 figures. Short version presented at 2023 IEEE International Symposium on Information Theory

arXiv:2303.15885 [pdf]

Multiple-level Green Noise Mask Design for Practical Fourier Phase Retrieval

Authors: Qiuliang Ye, Bingo Wing-Kuen Ling, Li-Wen Wang, Daniel Pak-Kong Lun

Abstract: Phase retrieval, a long-established challenge for recovering a complex-valued signal from its Fourier intensity measurements, has attracted significant interest because of its far-flung applications in optical imaging. To enhance accuracy, researchers introduce extra constraints to the measuring procedure by including a random aperture mask in the optical path that randomly modulates the light projected on the target object and gives the coded diffraction patterns (CDP). It is known that random masks are non-bandlimited and can lead to considerable high-frequency components in the Fourier intensity measurements. These high-frequency components can be beyond the Nyquist frequency of the optical system and are thus ignored by the phase retrieval optimization algorithms, resulting in degraded reconstruction performances. Recently, our team developed a binary green noise masking scheme that can significantly reduce the high-frequency components in the measurement. However, the scheme cannot be extended to generate multiple-level aperture masks. This paper proposes a two-stage optimization algorithm to generate multi-level random masks named $\textit{OptMask}$ that can also significantly reduce high-frequency components in the measurements but achieve higher accuracy than the binary masking scheme. Extensive experiments on a practical optical platform were conducted. The results demonstrate the superiority and practicality of the proposed $\textit{OptMask}$ over the existing masking schemes for CDP phase retrieval.

Submitted 14 May, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2207.08583 [pdf, other]

MAD for Robust Reinforcement Learning in Machine Translation

Authors: Domenic Donato, Lei Yu, Wang Ling, Chris Dyer

Abstract: We introduce a new distributed policy gradient algorithm and show that it outperforms existing reward-aware training procedures such as REINFORCE, minimum risk training (MRT) and proximal policy optimization (PPO) in terms of training stability and generalization performance when optimizing machine translation models. Our algorithm, which we call MAD (on account of using the mean absolute deviation in the importance weighting calculation), has distributed data generators sampling multiple candidates per source sentence on worker nodes, while a central learner updates the policy. MAD depends crucially on two variance reduction strategies: (1) a conditional reward normalization method that ensures each source sentence has both positive and negative reward translation examples and (2) a new robust importance weighting scheme that acts as a conditional entropy regularizer. Experiments on a variety of translation tasks show that policies learned using the MAD algorithm perform very well when using both greedy decoding and beam search, and that the learned policies are sensitive to the specific reward used during training.

Submitted 18 July, 2022; originally announced July 2022.

arXiv:2207.01152 [pdf, other]

doi 10.1109/JLT.2022.3204101

Geometrically-Shaped Multi-Dimensional Modulation Formats in Coherent Optical Transmission Systems

Authors: Bin Chen, Yi Lei, Gabriele Liga, Zhiwei Liang, Wei Ling, Xuwei Xue, Alex Alvarado

Abstract: Shaping modulation formats in multi-dimensional (MD) space is an effective approach to harvest spectral efficiency gains in both the additive white Gaussian noise (AWGN) channel and the optical fiber channel. In the first part of this paper, existing MD geometrically-shaped modulations for fiber optical communications are reviewed. It is shown that large gains can be obtained by exploiting correlation in the dimensions or/and by increasing the cardinality of the modulation format. Practical limitations and challenges are also discussed together with efficient solutions. In the second part, we extend the recently proposed four-dimensional (4D) modulation format family based on the constraint of orthant-symmetry to high spectrum efficiencies up to 10 bit/4D-sym by maximizing generalized mutual information for AWGN channel. Reach increases of up to 25% for a multi-span optical fiber transmission system are reported. Lastly, with the help of a recently introduced nonlinear interference (NLI) model, an optimization for designing nonlinear-tolerant 4D modulation formats is introduced for a single-span optical fiber system. Simulation results show that the proposed NLI model-based 4D modulation format could increase the effective SNRs by 0.25 dB with respect to the AWGN channel-optimal 4D modulation format.

Submitted 31 August, 2022; v1 submitted 3 July, 2022; originally announced July 2022.

Comments: 14 pages, 10 figures, accepted by JLT

arXiv:2202.11444 [pdf, other]

Enabling arbitrary translation objectives with Adaptive Tree Search

Authors: Wang Ling, Wojciech Stokowiec, Domenic Donato, Laurent Sartran, Lei Yu, Austin Matthews, Chris Dyer

Abstract: We introduce an adaptive tree search algorithm, that can find high-scoring outputs under translation models that make no assumptions about the form or structure of the search objective. This algorithm -- a deterministic variant of Monte Carlo tree search -- enables the exploration of new kinds of models that are unencumbered by constraints imposed to make decoding tractable, such as autoregressivity or conditional independence assumptions. When applied to autoregressive models, our algorithm has different biases than beam search has, which enables a new analysis of the role of decoding bias in autoregressive models. Empirically, we show that our adaptive tree search algorithm finds outputs with substantially better model scores compared to beam search in autoregressive models, and compared to reranking techniques in models whose scores do not decompose additively with respect to the words in the output. We also characterise the correlation of several translation model objectives with respect to BLEU. We find that while some standard models are poorly calibrated and benefit from the beam search bias, other often more robust models (autoregressive models tuned to maximize expected automatic metric scores, the noisy channel model and a newly proposed objective) benefit from increasing amounts of search using our proposed decoder, whereas the beam search bias limits the improvements obtained from such objectives. Thus, we argue that as models improve, the improvements may be masked by over-reliance on beam search or reranking based methods.

Submitted 23 February, 2022; originally announced February 2022.

Comments: 17 pages, 3 figures

arXiv:2201.10171 [pdf, other]

Weighted Parity-Check Codes for Channels with State and Asymmetric Channels

Authors: Chih Wei Ling, Yanxiao Liu, Cheuk Ting Li

Abstract: In this paper, we introduce a new class of codes, called weighted parity-check codes, where each parity-check bit has a weight that indicates its likelihood to be one (instead of fixing each parity-check bit to be zero). It is applicable to a wide range of settings, e.g. asymmetric channels, channels with state and/or cost constraints, and the Wyner-Ziv problem, and can provably achieve the capacity. For the channels with state (Gelfand-Pinsker) setting, the proposed coding scheme has two advantages compared to the nested linear code. First, it achieves the capacity of any channel with state (e.g. asymmetric channels). Second, simulation results show that the proposed code achieves a smaller error rate compared to the nested linear code. We also discuss a sparse construction where the belief propagation algorithm can be applied to improve the coding efficiency.

Submitted 30 May, 2023; v1 submitted 25 January, 2022; originally announced January 2022.

Comments: 17 pages, 4 figures. This is the full version of a paper presented at 2022 IEEE International Symposium on Information Theory (ISIT)

arXiv:2112.12377 [pdf, other]

Shaped Four-Dimensional Modulation Formats for Optical Fiber Communication Systems

Authors: Bin Chen, Gabriele Liga, Yi Lei, Wei Ling, Zhengyan Huan, Xuwei Xue, Alex Alvarado

Abstract: We review the design of multidimensional modulations by maximizing generalized mutual information and compare the maximum transmission reach of recently introduced 4D formats. A model-based optimization for nonlinear-tolerant 4D modulations is also discussed.

Submitted 23 December, 2021; originally announced December 2021.

Comments: OFC2022 invited paper

arXiv:2111.02743 [pdf]

Impact of gluconate and hexitol additives on the precipitation mechanism and kinetics of C-S-H

Authors: Lina Bouzouaid, Alexander E. S. Van Driessche, Wai Li Ling, Juan Carlos Martinez, Marc Malfois, Barbara Lothenbach, Christophe Labbez, Alejandro Fernandez-Martinez

Abstract: The present paper investigates the influence of gluconate and hexitol additives on the precipitation mechanism and kinetics of C-S-H. To this end, wet chemistry C-S-H precipitation experiments were performed under controlled conditions of solution supersaturation, under varying silicate concentration, while the transmittance of the solution was followed. This allowed determining induction times for the formation of C-S-H precursors in the presence and absence of gluconate and three hexitol molecules. Characterization of the precipitates was performed via small angle X-ray scattering and cryo-transmission electron microscopy experiments, which allowed the identification of a multi-step nucleation pathway also in the presence of the organics. Analysis of the induction time data in the framework of the classical nucleation theory revealed a significant increase of the kinetic pre-factor, which is associated with physical rates of cluster collision and aggregation during the nucleation process. Values of the kinetic pre-factor increase in the same order as the complexation constants of calcium and silicate with each of the organics. This points to a retarding mechanism of crystallization related to steric hindrance of the aggregation of the early formed clusters via adsorption of the organics at their surfaces.

Submitted 9 November, 2021; v1 submitted 4 November, 2021; originally announced November 2021.

arXiv:2110.15560 [pdf, other]

doi 10.1109/LPT.2021.3125017

Low-Complexity Geometrical Shaping for 4D Modulation Formats via Amplitude Coding

Authors: Bin Chen, Wei Ling, Yunus Can Gültekin, Yi Lei, Chigo Okonkwo, Alex Alvarado

Abstract: Signal shaping is vital to approach Shannon's capacity, yet it is challenging to implement at very high speeds. For example, probabilistic shaping often requires arithmetic coding to realize the target distribution. Geometric shaping requires look-up tables to store the constellation points. In this paper, we propose a four-dimensional amplitude coding (4D-AC) geometrical shaper architecture. The proposed architecture can generate in real time geometrically shaped 4D formats via simple logic circuit operations and two conventional quadrature amplitude modulation (QAM) modulators. This paper describes the 4D-AC used in generating approximated versions of two recently proposed 4D orthant symmetric modulation formats with spectral efficiencies of 6 bit/4D-sym and 7 bit/4D-sym, respectively. Numerical results show losses below 0.05 dB when compared against the baseline formats.

Submitted 29 October, 2021; originally announced October 2021.

Comments: 4 pages, 5 figures, accepted by IEEE Photonics Technology Letters

arXiv:2109.05732 [pdf, other]

Proximal Linearized Method for Sparse Equity Portfolio Optimization with Minimum Transaction Cost

Authors: Hong Seng Sim, Wendy Shin Yie Ling, Wah June Leong, Chuei Yee Chen

Abstract: In this paper, we propose a sparse equity portfolio optimization (SEPO) based on the mean-variance portfolio selection model. Aimed at minimizing transaction cost by avoiding small investments, this new model includes $\ell_0$-norm regularization of the asset weights to promote sparsity, hence the acronym SEPO-$\ell_0$. The selection model is also subjected to a minimum expected return. The complexity of the model calls for a proximal method, which allows us to handle the objective terms separately via the corresponding proximal operators. We develop an efficient ADMM-like algorithm to find the optimal portfolio and prove its global convergence. The efficiency of the algorithm is demonstrated using real stock data and the model is promising in portfolio selection in terms of generating higher expected return while maintaining a good level of sparsity, and thus minimizing transaction cost.

Submitted 13 September, 2021; originally announced September 2021.

MSC Class: 90C90; 90C26; 91G10

arXiv:2011.10163 [pdf, other]

A Unified Model of Feature Extraction and Clustering for Spike Sorting

Authors: Libo Huang, Lu Gan, Bingo Wing-Kuen Ling

Abstract: Spike sorting plays an irreplaceable role in understanding brain codes. Traditional spike sorting technologies perform feature extraction and clustering separately after spikes are well detected. However, it may often cause many additional processes and further lead to low-accurate and/or unstable results especially when there are noises and/or overlapping spikes in datasets. To address these issues, in this paper, we proposed a unified optimisation model integrating feature extraction and clustering for spike sorting. Interestingly, instead of the widely used combination strategies, i.e., performing the principal component analysis (PCA) for spike feature extraction and K-means (KM) for clustering in sequence, we unified PCA and KM into one optimisation model, which reduces additional processes with fewer iteration times. Subsequently, by embedding the K-means++ strategy for initialising and a comparison updating rule in the solving process, the proposed model can well handle the noises and/or overlapping interference. Finally, taking the best of the clustering validity indices into the proposed model, we derive an automatic spike sorting method. Plenty of experimental results on both synthetic and real-world datasets confirm that our proposed method outperforms the related state-of-the-art approaches.

Submitted 19 November, 2020; originally announced November 2020.

Comments: 10 pages, 5 figures

arXiv:1910.08350 [pdf, other]

A Mutual Information Maximization Perspective of Language Representation Learning

Authors: Lingpeng Kong, Cyprien de Masson d'Autume, Wang Ling, Lei Yu, Zihang Dai, Dani Yogatama

Abstract: We show state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet). In addition to enhancing our theoretical understanding of these methods, our derivation leads to a principled framework that can be used to construct new self-supervised tasks. We provide an example by drawing inspirations from related methods based on mutual information maximization that have been successful in computer vision, and introduce a simple self-supervised objective that maximizes the mutual information between a global sentence representation and n-grams in the sentence. Our analysis offers a holistic view of representation learning methods to transfer knowledge and translate progress across multiple domains (e.g., natural language processing, computer vision, audio processing).

Submitted 26 November, 2019; v1 submitted 18 October, 2019; originally announced October 2019.

Comments: 12 pages, 3 figures

arXiv:1910.00553 [pdf, other]

Better Document-Level Machine Translation with Bayes' Rule

Authors: Lei Yu, Laurent Sartran, Wojciech Stokowiec, Wang Ling, Lingpeng Kong, Phil Blunsom, Chris Dyer

Abstract: We show that Bayes' rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents---a compelling benefit as parallel documents are not always available. In our formulation, the posterior probability of a candidate translation is the product of the unconditional (prior) probability of the candidate output document and the "reverse translation probability" of translating the candidate output back into the source language. Our proposed model uses a powerful autoregressive language model as the prior on target language documents, but it assumes that each sentence is translated independently from the target to the source language. Crucially, at test time, when a source document is observed, the document language model prior induces dependencies between the translations of the source sentences in the posterior. The model's independence assumption not only enables efficient use of available data, but it additionally admits a practical left-to-right beam-search algorithm for carrying out inference. Experiments show that our model benefits from using cross-sentence context in the language model, and it outperforms existing document translation approaches.

Submitted 2 July, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: Accepted by TACL

arXiv:1908.00673 [pdf, other]

Hybrid Low-order and Higher-order Graph Convolutional Networks

Authors: FangYuan Lei, Xun Liu, QingYun Dai, Bingo Wing-Kuen Ling, Huimin Zhao, Yan Liu

Abstract: With higher-order neighborhood information of graph network, the accuracy of graph representation learning classification can be significantly improved. However, the current higher order graph convolutional network has a large number of parameters and high computational complexity. Therefore, we propose a Hybrid Lower order and Higher order Graph convolutional networks (HLHG) learning model, which uses weight sharing mechanism to reduce the number of network parameters. To reduce computational complexity, we propose a novel fusion pooling layer to combine the neighborhood information of high order and low order. Theoretically, we compare the model complexity of the proposed model with the other state-of-the-art model. Experimentally, we verify the proposed model on the large-scale text network datasets by supervised learning, and on the citation network datasets by semi-supervised learning. The experimental results show that the proposed model achieves highest classification accuracy with a small set of trainable weight parameters.

Submitted 1 August, 2019; originally announced August 2019.

arXiv:1901.11373 [pdf, other]

Learning and Evaluating General Linguistic Intelligence

Authors: Dani Yogatama, Cyprien de Masson d'Autume, Jerome Connor, Tomas Kocisky, Mike Chrzanowski, Lingpeng Kong, Angeliki Lazaridou, Wang Ling, Lei Yu, Chris Dyer, Phil Blunsom

Abstract: We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of experiments that assess the task-independence of the knowledge being acquired by the learning process. In addition to task performance, we propose a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task. Our results show that while the field has made impressive progress in terms of model architectures that generalize to many tasks, these models still require a lot of in-domain training examples (e.g., for fine tuning, training task-specific modules), and are prone to catastrophic forgetting. Moreover, we find that far from solving general tasks (e.g., document question answering), our models are overfitting to the quirks of particular datasets (e.g., SQuAD). We discuss missing components and conjecture on how to make progress toward general linguistic intelligence.

Submitted 31 January, 2019; originally announced January 2019.

arXiv:1901.09296 [pdf, other]

Variational Smoothing in Recurrent Neural Network Language Models

Authors: Lingpeng Kong, Gabor Melis, Wang Ling, Lei Yu, Dani Yogatama

Abstract: We present a new theoretical perspective of data noising in recurrent neural network language models (Xie et al., 2017). We show that each variant of data noising is an instance of Bayesian recurrent neural networks with a particular variational distribution (i.e., a mixture of Gaussians whose weights depend on statistics derived from the corpus such as the unigram distribution). We use this insight to propose a more principled method to apply at prediction time and propose natural extensions to data noising under the variational framework. In particular, we propose variational smoothing with tied input and output embedding matrices and an element-wise variational smoothing method. We empirically verify our analysis on two benchmark language modeling datasets and demonstrate performance improvements over existing data noising methods.

Submitted 26 January, 2019; originally announced January 2019.

Comments: Accepted as a conference paper at ICLR 2019

arXiv:1811.10475 [pdf, other]

Sentence Encoding with Tree-constrained Relation Networks

Authors: Lei Yu, Cyprien de Masson d'Autume, Chris Dyer, Phil Blunsom, Lingpeng Kong, Wang Ling

Abstract: The meaning of a sentence is a function of the relations that hold between its words. We instantiate this relational view of semantics in a series of neural models based on variants of relation networks (RNs) which represent a set of objects (for us, words forming a sentence) in terms of representations of pairs of objects. We propose two extensions to the basic RN model for natural language. First, building on the intuition that not all word pairs are equally informative about the meaning of a sentence, we use constraints based on both supervised and unsupervised dependency syntax to control which relations influence the representation. Second, since higher-order relations are poorly captured by a sum of pairwise relations, we use a recurrent extension of RNs to propagate information so as to form representations of higher order relations. Experiments on sentence classification, sentence pair classification, and machine translation reveal that, while basic RNs are only modestly effective for sentence representation, recurrent RNs with latent syntax are a reliably powerful representational device.

Submitted 26 November, 2018; originally announced November 2018.

Comments: 12 pages

arXiv:1807.08586 [pdf, ps, other]

A Queuing Model for CPU Functional Unit and Issue Queue Configuration

Authors: Shane Carroll, Wei-Ming Ling

Abstract: In a superscalar processor, instructions of various types flow through an execution pipeline, traversing hardware resources which are mostly shared among many different instruction types. A notable exception to shared pipeline resources is the collection of functional units, the hardware that performs specific computations. In a trade-off of cost versus performance, a pipeline designer must decide how many of each type of functional unit to place in a processor's pipeline. In this paper, we model a superscalar processor's issue queue and functional units as a novel queuing network. We treat the issue queue as a finite-sized waiting area and the functional units as servers. In addition to common queuing problems, customers of the network share the queue but wait for specific servers to become ready (e.g., addition instructions wait for adders). Furthermore, the customers in this queue are not necessarily ready for service, since instructions may be waiting for operands. In this paper we model a novel queuing network that provides a solution to the expected queue length of each type of instruction. This network and its solution can also be generalized to other problems, notably other resource-allocation issues that arise in superscalar pipelines.

Submitted 19 July, 2018; originally announced July 2018.

arXiv:1712.04252 [pdf]

doi 10.1103/PhysRevB.98.014110

Topological interface modes in local resonant acoustic systems

Authors: Degang Zhao, Meng Xiao, C. W. Ling, C. T. Chan, Kin Hung Fung

Abstract: Topological phononic crystals (PCs) are periodic artificial structures which can support nontrivial acoustic topological bands, and their topological properties are linked to the existence of topological edge modes. Most previous studies focused on the topological edge modes in Bragg gaps which are induced by lattice scatterings. While local resonant gaps would be of great use in subwavelength control of acoustic waves, whether it is possible to achieve topological interface states in local resonant gaps is a question. In this article, we study the topological bands near local resonant gaps in a time-reversal symmetric acoustic system and elaborate the evolution of band structure using a spring-mass model. Our acoustic structure can produce three band gaps in the subwavelength region: one originates from local resonance of the unit cell and the other two stem from band folding. It is found that the topological interface states can only exist in the band folding induced band gaps but never appear in the local resonant band gap. The numerical simulation perfectly agrees with theoretical results. Our study provides an approach to localizing the subwavelength acoustic wave.

Submitted 12 December, 2017; originally announced December 2017.

Journal ref: Phys. Rev. B 98, 014110 (2018)

arXiv:1710.02939 [pdf]

Does Normalization Methods Play a Role for Hyperspectral Image Classification?

Authors: Faxian Cao, Zhijing Yang, Jinchang Ren, Mengying Jiang, Wing-Kuen Ling

Abstract: For hyperspectral image (HSI) datasets, each class has its own salient features, and classifiers classify HSI datasets according to these salient features; however, the salient features differ when different normalization methods are used. In this letter, we report the effect of different normalization methods on classifiers and recommend the best normalization methods for classifiers after analyzing the impact of different normalization methods on classifiers. The Pavia University, Indian Pines and Kennedy Space Center datasets are applied to several typical classifiers in order to evaluate and analyze the impact of different normalization methods on typical classifiers.

Submitted 9 October, 2017; originally announced October 2017.

Comments: 6 pages, 1 figure, 4 tables

arXiv:1709.03792 [pdf]

doi 10.1109/TGRS.2018.2828601

Sparse Representation Based Augmented Multinomial Logistic Extreme Learning Machine with Weighted Composite Features for Spectral Spatial Hyperspectral Image Classification

Authors: Faxian Cao, Zhijing Yang, Jinchang Ren, Wing-Kuen Ling

Abstract: Although extreme learning machine (ELM) has been successfully applied to a number of pattern recognition problems, it fails to provide sufficiently good results in hyperspectral image (HSI) classification due to two main drawbacks. The first is due to the random weights and bias of ELM, which may lead to ill-posed problems. The second is the lack of spatial information for classification. To tackle these two problems, in this paper, we propose a new framework for ELM based spectral-spatial classification of HSI, where probabilistic modelling with sparse representation and weighted composite features (WCF) are employed respectively to derive the optimized output weights and extract spatial features. First, the ELM is represented as a concave logarithmic likelihood function under statistical modelling using the maximum a posteriori (MAP). Second, the sparse representation is applied to the Laplacian prior to efficiently determine a logarithmic posterior with a unique maximum in order to solve the ill-posed problem of ELM. The variable splitting and the augmented Lagrangian are subsequently used to further reduce the computation complexity of the proposed algorithm and it has been proven a more efficient method for speed improvement. Third, the spatial information is extracted using the weighted composite features (WCFs) to construct the spectral-spatial classification framework. In addition, the lower bound of the proposed method is derived by a rigorous mathematical proof.
Experimental results on two publicly available HSI data sets demonstrate that the proposed methodology outperforms ELM and a number of state-of-the-art approaches.

Submitted 14 October, 2017; v1 submitted 12 September, 2017; originally announced September 2017.

Comments: 16 pages, 6 figures and 4 tables

arXiv:1709.02517 [pdf]

doi 10.3390/rs9121255

Extreme Sparse Multinomial Logistic Regression: A Fast and Robust Framework for Hyperspectral Image Classification

Authors: Faxian Cao, Zhijing Yang, Jinchang Ren, Wing-Kuen Ling

Abstract: Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which have shown the fast and robust performance of the proposed ESMLR framework.

Submitted 27 September, 2017; v1 submitted 7 September, 2017; originally announced September 2017.

Comments: 14 pages, 7 figures, 4 tables

Journal ref: Remote sensing,9,2017,1255

arXiv:1709.02505 [pdf, ps, other]

A Simple Two-stage Equalizer With Simplified Orthogonal Time Frequency Space Modulation Over Rapidly Time-varying Channels

Authors: Li Li, Hua Wei, Yao Huang, Yao Yao, Weiwei Ling, Gong Chen, Peng Li, Yunlong Cai

Abstract: In this work, we derive an equivalent delay-Doppler channel matrix of the Orthogonal Time Frequency Space (OTFS) modulation that has not been studied in previous literature. It has a similar structure to the banded channel matrix of OFDM systems over rapidly time-varying channels. However, the band in the equivalent channel matrix will no longer spread with the increase of the Doppler spread once the length of maximum channel delay spread and the OTFS frame duration are determined. Furthermore, the equivalent channel matrix can simplify the OTFS modulation in the transmitter side. Incorporating the equivalent channel matrix, we propose a simple two-stage equalizer in 1 dimensional operations for OTFS modulation. First, the receive signal is equalized using the conventional OFDM single-tap equalizer in the frequency domain. The multipath effects can be removed. In the second stage, another low complexity delay-Doppler domain equalizer is employed to eliminate the effects of the residual interference caused by the Doppler spread with the equivalent channel matrix. The simulation results demonstrate that the proposed method is superior to the conventional single-tap equalizer and full minimum mean squared error (MMSE) equalizer of OFDM systems in terms of BER in high Doppler spread scenarios.

Submitted 7 September, 2017; originally announced September 2017.

Comments: 4 pages

arXiv:1709.02253 [pdf]

doi 10.3390/s17112603

Linear vs Nonlinear Extreme Learning Machine for Spectral-Spatial Classification of Hyperspectral Image

Authors: Faxian Cao, Zhijing Yang, Jinchang Ren, Mengying Jiang, Wing-Kuen Ling

Abstract: As a new machine learning approach, extreme learning machine (ELM) has received wide attention due to its good performance. However, when directly applied to hyperspectral image (HSI) classification, the recognition rate is too low. This is because ELM does not use the spatial information which is very important for HSI classification. In view of this, this paper proposes a new framework for spectral-spatial classification of HSI by combining ELM with loopy belief propagation (LBP). The original ELM is linear, and the nonlinear ELMs (or Kernel ELMs) are the improvement of linear ELM (LELM). However, based on lots of experiments and analysis, we found out that the LELM is a better choice than nonlinear ELM for spectral-spatial classification of HSI. Furthermore, we exploit the marginal probability distribution that uses the whole information in the HSI and learn such distribution using the LBP. The proposed method not only maintains the fast speed of ELM, but also greatly improves the accuracy of classification. The experimental results on the well-known HSI data sets, Indian Pines and Pavia University, demonstrate the good performance of the proposed method.

Submitted 11 October, 2017; v1 submitted 5 September, 2017; originally announced September 2017.

Comments: 13 pages, 8 figures, 3 tables, article

Journal ref: Sensors,17,2017,2603

arXiv:1705.04146 [pdf, other]

Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems

Authors: Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom

Abstract: Solving algebraic word problems requires executing a series of arithmetic operations---a program---to obtain a final answer. However, since programs can be arbitrarily complicated, inducing them directly from question-answer pairs is a formidable challenge. To make this task more feasible, we solve these problems by generating answer rationales, sequences of natural language and human-readable mathematical expressions that derive the final answer through a series of small steps. Although rationales do not explicitly specify programs, they provide a scaffolding for their structure via intermediate milestones. To evaluate our approach, we have created a new 100,000-sample dataset of questions, answers and rationales. Experimental results show that indirect supervision of program learning via answer rationales is a promising strategy for inducing arithmetic programs.

Submitted 23 October, 2017; v1 submitted 11 May, 2017; originally announced May 2017.

arXiv:1703.09539 [pdf, other]

Demythization of Structural XML Query Processing: Comparison of Holistic and Binary Approaches, Technical Report

Authors: Petr Lukáš, Radim Bača, Michal Krátký, Tok Wang Ling

Abstract: XML query can be modeled by twig pattern query (TPQ) specifying predicates on XML nodes and XPath relationships satisfied between them. A lot of TPQ types have been proposed; this paper takes into account a TPQ model extended by a specification of output and non-output query nodes since it complies with the XQuery semantics and, in many cases, it leads to a more efficient query processing. In general, there are two approaches to process the TPQ: holistic joins and binary joins. Whereas the binary join approach builds a query plan as a tree of interconnected binary operators, the holistic join approach evaluates a whole query using one operator (i.e., using one complex algorithm). Surprisingly, a thorough analytical and experimental comparison is still missing despite an enormous research effort in this area. In this paper, we try to fill this gap; we analytically and experimentally show that the binary joins used in a fully-pipelined plan (i.e., the plan where each join operation does not wait for the complete result of the previous operation and no explicit sorting is used) can often outperform the holistic joins, especially for TPQs with a higher ratio of non-output query nodes.
The main contributions of this paper can be summarized as follows: (i) we introduce several improvements of existing binary join approaches allowing to build a fully-pipelined plan for a TPQ considering non-output query nodes, (ii) we prove that for a certain class of TPQs such a plan has the linear time complexity with respect to the size of the input and output as well as the linear space complexity with respect to the XML document depth (i.e., the same complexity as the holistic join approaches), (iii) we show that our improved binary join approach outperforms the holistic join approaches in many situations, and (iv) we propose a simple combined approach that uses advantages of both types of approaches.

Submitted 26 July, 2019; v1 submitted 28 March, 2017; originally announced March 2017.

arXiv:1703.01898 [pdf, other]

Generative and Discriminative Text Classification with Recurrent Neural Networks

Authors: Dani Yogatama, Chris Dyer, Wang Ling, Phil Blunsom

Abstract: We empirically characterize the performance of discriminative and generative LSTM models for text classification. We find that although RNN-based generative models are more powerful than their bag-of-words ancestors (e.g., they account for conditional dependencies across words in a document), they have higher asymptotic error rates than discriminatively trained RNN models. However we also find that generative models approach their asymptotic error rate more rapidly than their discriminative counterparts---the same pattern that Ng & Jordan (2001) proved holds for linear classification models that make more naive conditional independence assumptions. Building on this finding, we hypothesize that RNN-based generative classification models will be more robust to shifts in the data distribution. This hypothesis is confirmed in a series of experiments in zero-shot and continual learning settings that show that generative models substantially outperform discriminative models.

Submitted 25 May, 2017; v1 submitted 6 March, 2017; originally announced March 2017.

arXiv:1701.00145 [pdf, other]

Expanding Subjective Lexicons for Social Media Mining with Embedding Subspaces

Authors: Silvio Amir, Rámon Astudillo, Wang Ling, Paula C. Carvalho, Mário J. Silva

Abstract: Recent approaches for sentiment lexicon induction have capitalized on pre-trained word embeddings that capture latent semantic properties. However, embeddings obtained by optimizing performance of a given task (e.g. predicting contextual words) are sub-optimal for other applications. In this paper, we address this problem by exploiting task-specific representations, induced via embedding sub-space projection. This allows us to expand lexicons describing multiple semantic properties. For each property, our model jointly learns suitable representations and the concomitant predictor. Experiments conducted over multiple subjective lexicons, show that our model outperforms previous work and other baselines; even in low training data regimes. Furthermore, lexicon-based sentiment classifiers built on top of our lexicons outperform similar resources and yield performances comparable to those of supervised models.

Submitted 6 January, 2017; v1 submitted 31 December, 2016; originally announced January 2017.

arXiv:1611.09100 [pdf, other]

Learning to Compose Words into Sentences with Reinforcement Learning

Authors: Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling

Abstract: We use reinforcement learning to learn tree-structured neural networks for computing representations of natural language sentences. In contrast with prior work on tree-structured models in which the trees are either provided as input or predicted using supervision from explicit treebank annotations, the tree structures in this work are optimized to improve performance on a downstream task. Experiments demonstrate the benefit of learning task-specific composition orders, outperforming both sequential encoders and recursive encoders based on treebank annotations. We analyze the induced trees and show that while they discover some linguistically intuitive structures (e.g., noun phrases, simple verb phrases), they are different than conventional English syntactic structures.

Submitted 28 November, 2016; originally announced November 2016.

arXiv:1611.01628 [pdf, other]

Reference-Aware Language Models

Authors: Zichao Yang, Phil Blunsom, Chris Dyer, Wang Ling

Abstract: We propose a general class of language models that treat reference as an explicit stochastic latent variable. This architecture allows models to create mentions of entities and their attributes by accessing external databases (required by, e.g., dialogue generation and recipe generation) and internal state (required by, e.g. language models which are aware of coreference). This facilitates the incorporation of information that can be accessed in predictable locations in databases or discourse context, even when the targets of the reference may be rare words. Experiments on three tasks shows our model variants based on deterministic attention.

Submitted 8 August, 2017; v1 submitted 5 November, 2016; originally announced November 2016.

Comments: emnlp camera ready

arXiv:1609.09315 [pdf, other]

Semantic Parsing with Semi-Supervised Sequential Autoencoders

Authors: Tomáš Kočiský, Gábor Melis, Edward Grefenstette, Chris Dyer, Wang Ling, Phil Blunsom, Karl Moritz Hermann

Abstract: We present a novel semi-supervised approach for sequence transduction and apply it to semantic parsing. The unsupervised component is based on a generative model in which latent sentences generate the unpaired logical forms. We apply this method to a number of semantic parsing tasks focusing on domains with limited access to labelled training data and extend those datasets with synthetically generated logical forms.

Submitted 29 September, 2016; originally announced September 2016.

arXiv:1606.05851 [pdf, other]

doi 10.1038/srep38049

Anomalous Light Scattering by Topological ${\mathcal{PT}}$-symmetric Particle Arrays

Authors: C. W. Ling, Ka Hei Choi, T. C. Mok, Z. Q. Zhang, Kin Hung Fung

Abstract: Robust topological edge modes may evolve into complex-frequency modes when a physical system becomes non-Hermitian. We show that, while having negligible forward optical extinction cross section, a conjugate pair of such complex topological edge modes in a non-Hermitian $\mathcal{PT}$-symmetric system can give rise to an anomalous sideway scattering when they are simultaneously excited by a plane wave. We propose a realization of such scattering state in a linear array of subwavelength resonators coated with gain media. The prediction is based on an analytical two-band model and verified by rigorous numerical simulation using multiple-multipole scattering theory. The result suggests an extreme situation where leakage of classical information is unnoticeable to the transmitter and the receiver when such a $\mathcal{PT}$-symmetric unit is inserted into the communication channel.

Submitted 4 August, 2016; v1 submitted 19 June, 2016; originally announced June 2016.

Comments: 16 pages, 8 figures

Journal ref: Sci. Rep. 6 38049 (2016)

arXiv:1606.02785 [pdf, other]

Neural Network-Based Abstract Generation for Opinions and Arguments

Authors: Lu Wang, Wang Ling

Abstract: We study the problem of generating abstractive summaries for opinionated text. We propose an attention-based neural network model that is able to absorb information from multiple text units to construct informative, concise, and fluent summaries. An importance-based sampling method is designed to allow the encoder to integrate information from an important subset of input. Automatic evaluation indicates that our system outperforms state-of-the-art abstractive and extractive summarization systems on two newly collected datasets of movie reviews and arguments. Our system summaries are also rated as more informative and grammatical in human evaluation.

Submitted 8 June, 2016; originally announced June 2016.

Comments: NAACL 2016

arXiv:1605.03852 [pdf, other]

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning

Authors: Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Brian MacWhinney, Chris Dyer

Abstract: We use Bayesian optimization to learn curricula for word representation learning, optimizing performance on downstream tasks that depend on the learned representations as features. The curricula are modeled by a linear ranking function which is the scalar product of a learned weight vector and an engineered feature vector that characterizes the different aspects of the complexity of each instance in the training corpus. We show that learning the curriculum improves performance on a variety of downstream tasks over random orders and in comparison to the natural corpus order.

Submitted 21 June, 2016; v1 submitted 12 May, 2016; originally announced May 2016.

Comments: In proceedings of ACL 2016, 10 pages

arXiv:1603.06744 [pdf, other]

Latent Predictor Networks for Code Generation

Authors: Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Andrew Senior, Fumin Wang, Phil Blunsom

Abstract: Many language generation tasks require the production of text conditioned on both structured and unstructured inputs. We present a novel neural network architecture which generates an output sequence conditioned on an arbitrary number of input functions. Crucially, our approach allows both the choice of conditioning context and the granularity of generation, for example characters or tokens, to be marginalised, thus permitting scalable and effective training. Using this framework, we address the problem of generating programming code from a mixed natural language and structured specification. We create two new data sets for this paradigm derived from the collectible trading card games Magic the Gathering and Hearthstone. On these, and a third preexisting corpus, we demonstrate that marginalising multiple predictors allows our model to outperform strong benchmarks.

Submitted 8 June, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

arXiv:1602.03630 [pdf]

doi 10.1364/OL.41.001644

Simultaneous Multi-frequency Topological Edge Modes between One-dimensional Photonic Crystals

Authors: Ka Hei Choi, C. W. Ling, K. F. Lee, Y. H. Tsang, Kin Hung Fung

Abstract: We show theoretically that, in the limit of weak dispersion, one-dimensional (1D) binary centrosymmetric photonic crystals can support topological edge modes in all photonic band gaps. By analyzing their bulk band topology, these "harmonic" topological edge modes can be designed in a way that they exist at all photonic band gaps opened at the center of the Brillouin Zone, or at all gaps opened at the zone boundaries, or both. The results may suggest a new approach to achieve robust multi-frequency coupled modes for applications in nonlinear photonics, such as frequency up-conversion.

Submitted 9 March, 2016; v1 submitted 11 February, 2016; originally announced February 2016.

Journal ref: Opt. Lett. 41, 1644 (2016)

arXiv:1511.04586 [pdf, other]

Character-based Neural Machine Translation

Authors: Wang Ling, Isabel Trancoso, Chris Dyer, Alan W Black

Abstract: We introduce a neural machine translation model that views the input and output sentences as sequences of characters rather than words. Since word-level information provides a crucial source of bias, our input model composes representations of character sequences into representations of words (as determined by whitespace boundaries), and then these are translated using a joint attention/translation model. In the target language, the translation is modeled as a sequence of word vectors, but each word is generated one character at a time, conditional on the previous character generations in each word. As the representation and generation of words is performed at the character level, our model is capable of interpreting and generating unseen word forms. A secondary benefit of this approach is that it alleviates much of the challenges associated with preprocessing/tokenization of the source and target languages. We show that our model can achieve translation results that are on par with conventional word-based models.

Submitted 14 November, 2015; originally announced November 2015.

arXiv:1508.02096 [pdf, other]

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

Authors: Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso

Abstract: We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form-function relationship in language, our "composed" word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish).

Submitted 23 May, 2016; v1 submitted 9 August, 2015; originally announced August 2015.

arXiv:1508.01420 [pdf, other]

Privacy-Preserving Multi-Document Summarization

Authors: Luís Marujo, José Portêlo, Wang Ling, David Martins de Matos, João P. Neto, Anatole Gershman, Jaime Carbonell, Isabel Trancoso, Bhiksha Raj

Abstract: State-of-the-art extractive multi-document summarization systems are usually designed without any concern about privacy issues, meaning that all documents are open to third parties. In this paper we propose a privacy-preserving approach to multi-document summarization. Our approach enables other parties to obtain summaries without learning anything else about the original documents' content. We use a hashing scheme known as Secure Binary Embeddings to convert documents representation containing key phrases and bag-of-words into bit strings, allowing the computation of approximate distances, instead of exact ones. Our experiments indicate that our system yields similar results to its non-private counterpart on standard multi-document evaluation datasets.

Submitted 6 August, 2015; originally announced August 2015.

Comments: 4 pages, In Proceedings of 2nd ACM SIGIR Workshop on Privacy-Preserving Information Retrieval, August 2015. arXiv admin note: text overlap with arXiv:1407.5416

ACM Class: H.3; I.2.7; K.4.1

arXiv:1508.01368 [pdf, other]

doi 10.1103/PhysRevB.92.165430

Formation of Non-reciprocal Bands in Magnetized Diatomic Plasmonic Chains

Authors: C. W. Ling, ** Wang, Kin Hung Fung

Abstract: We show that non-reciprocal bands can be formed in a magnetized periodic chain of spherical plasmonic particles with two particles per unit cell. Simplified form of symmetry operators in dipole approximations are used to demonstrate explicitly the relation between spectral non-reciprocity and broken spatial-temporal symmetries. Due to hybridization among plasmon modes and free photon modes, strong spectral non-reciprocity appears in region slightly below the lightline, where highly directed guiding of energy can be supported. The results may provide a clear guidance on the design of one-way waveguides.

Submitted 6 August, 2015; originally announced August 2015.

Journal ref: Phys. Rev. B 92, 165430 (2015)

arXiv:1505.08075 [pdf, other]

Transition-Based Dependency Parsing with Stack Long Short-Term Memory

Authors: Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith

Abstract: We propose a technique for learning representations of parser states in transition-based dependency parsers. Our primary innovation is a new control structure for sequence-to-sequence neural networks---the stack LSTM. Like the conventional stack data structures used in transition-based parsing, elements can be pushed to or popped from the top of the stack in constant time, but, in addition, an LSTM maintains a continuous space embedding of the stack contents. This lets us formulate an efficient parsing model that captures three facets of a parser's state: (i) unbounded look-ahead into the buffer of incoming words, (ii) the complete history of actions taken by the parser, and (iii) the complete contents of the stack of partially built tree fragments, including their internal structures. Standard backpropagation techniques are used for training and yield state-of-the-art parsing performance.

Submitted 29 May, 2015; originally announced May 2015.

Comments: Proceedings of ACL 2015

arXiv:1412.5725 [pdf, ps, other]

doi 10.1103/PhysRevB.91.235410

Non-reciprocal mu-near-zero mode in PT-symmetric magnetic domains

Authors: ** Wang, Hui Yuan Dong, Chi Wai Ling, C. T. Chan, Kin Hung Fung

Abstract: We find that a new type of non-reciprocal modes exist at an interface between two \emph{parity-time} ($\mathcal{PT}$) symmetric magnetic domains (MDs) near the frequency of zero effective permeability. The new mode is non-propagating and purely magnetic when the two MDs are semi-infinite while it becomes propagating in the finite case. In particular, two pronounced nonreciprocal responses could be observed via the excitation of this mode: one-way optical tunneling for oblique incidence and unidirectional beam shift at normal incidence. When the two MDs system becomes finite in size, it is found that perfect-transmission mode could be achieved if $\mathcal{PT}$-symmetry is maintained. The unique properties of such an unusual mode are investigated by analytical modal calculation as well as numerical simulations. The results suggest a new approach to the design of compact optical isolator.

Submitted 29 April, 2015; v1 submitted 18 December, 2014; originally announced December 2014.

Comments: 8 pages, 6 figures

Journal ref: Physical Review B 91, 235410 (2015)

arXiv:1401.7520 [pdf, ps, other]

doi 10.1364/OE.23.002021

Topological Edge Plasmon Modes between Diatomic Chains of Nanoparticles

Authors: C. W. Ling, Meng Xiao, C. T. Chan, S. F. Yu, Kin Hung Fung

Abstract: We study the topological edge plasmon modes between two "diatomic" chains of identical plasmonic nanoparticles. Zak phase for longitudinal plasmon modes in each chain is calculated analytically by solutions of macroscopic Maxwell's equations for particles in quasi-static dipole approximation. This approximation provides a direct analogy with the Su-Schrieffer-Heeger model such that the eigenvalue is mapped to the frequency dependent inverse-polarizability of the nanoparticles. The edge state frequency is found to be the same as the single-particle resonance frequency, which is insensitive to the separation distances within a unit cell. Finally, full electrodynamic simulations with realistic parameters suggest that the edge plasmon mode can be realized through near-field optical spectroscopy.

Submitted 7 May, 2015; v1 submitted 29 January, 2014; originally announced January 2014.

Comments: 7 pages, 6 figures

Journal ref: Opt Express 23, 2021-2031 (2015)

arXiv:1306.4908 [pdf]

Recognition of Named-Event Passages in News Articles

Authors: Luis Marujo, Wang Ling, Anatole Gershman, Jaime Carbonell, João P. Neto, David Matos

Abstract: We extend the concept of Named Entities to Named Events - commonly occurring events such as battles and earthquakes. We propose a method for finding specific passages in news articles that contain information about such events and report our preliminary evaluation results. Collecting "Gold Standard" data presents many problems, both practical and conceptual. We present a method for obtaining such data using the Amazon Mechanical Turk service.

Submitted 20 June, 2013; originally announced June 2013.

Comments: In 25th International Conference on Computational Linguistics (COLING 2012)

arXiv:1208.2448

Breaking Out The XML MisMatch Trap

Authors: Yong Zeng, Zhifeng Bao, Guoliang Li, Tok Wang Ling, Jiaheng Lu

Abstract: In keyword search, when user cannot get what she wants, query refinement is needed and reason can be various. We first give a thorough categorization of the reason, then focus on solving one category of query refinement problem in the context of XML keyword search, where what user searches for does not exist in the data. We refer to it as the MisMatch problem in this paper. Then we propose a practical way to detect the MisMatch problem and generate helpful suggestions to users. Our approach can be viewed as a post-processing job of query evaluation, and has three main features: (1) it adopts both the suggested queries and their sample results as the output to user, helping user judge whether the MisMatch problem is solved without consuming all query results; (2) it is portable in the sense that it can work with any LCA-based matching semantics and orthogonal to the choice of result retrieval method adopted; (3) it is lightweight in the way that it occupies a very small proportion of the whole query evaluation time. Extensive experiments on three real datasets verify the effectiveness, efficiency and scalability of our approach. An online XML keyword search engine called XClear that embeds the MisMatch problem detector and suggester has been built.

Submitted 7 November, 2012; v1 submitted 12 August, 2012; originally announced August 2012.

Comments: The article is already withdrawn

Showing 1–49 of 49 results for author: Ling, W