-
Multi-probe analysis of the galaxy cluster CL J1226.9+3332: hydrostatic mass and hydrostatic-to-lensing bias
Authors:
M. Muñoz-Echeverría,
R. Adam,
P. Ade,
H. Ajeddig,
P. André,
M. Arnaud,
E. Artis,
H. Aussel,
I. Bartalucci,
A. Beelen,
A. Benoît,
S. Berta,
L. Bing,
O. Bourrion,
M. Calvo,
A. Catalano,
M. De Petris,
F. -X. Désert,
S. Doyle,
E. F. C. Driessen,
A. Ferragamo,
A. Gomez,
J. Goupy,
F. Kéruzoré,
C. Kramer
, et al. (26 additional authors not shown)
Abstract:
We present a multi-probe analysis of the well-known galaxy cluster CL J1226.9+3332 as a proof of concept for multi-wavelength studies within the framework of the NIKA2 Sunyaev-Zeldovich Large Program (LPSZ). CL J1226.9+3332 is a massive and high redshift (z = 0.888) cluster that has already been observed at several wavelengths. A joint analysis of the thermal SZ (tSZ) effect at millimeter wavelength with the NIKA2 camera and in X-ray with the XMM-Newton satellite permits the reconstruction of the cluster thermodynamical properties and mass assuming hydrostatic equilibrium. We test the robustness of our mass estimates against different definitions of the data analysis transfer function. Using convergence maps reconstructed from the data of the CLASH program we obtain estimates of the lensing mass, which we compare to the estimated hydrostatic mass. This allows us to measure the hydrostatic-to-lensing mass bias and the associated systematic effects related to the NIKA2 measurement. We obtain $M_{500}^{\rm HSE} = (7.65 \pm 1.03) \times 10^{14}\,M_\odot$ and $M_{500}^{\rm lens} = (7.35 \pm 0.65) \times 10^{14}\,M_\odot$, which implies an HSE-to-lensing bias consistent with 0 within 20 percent.
Submitted 2 November, 2021;
originally announced November 2021.
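As a rough check of the closing statement of this abstract, the sketch below computes a hydrostatic-to-lensing bias from the two quoted masses. It assumes the common convention b = 1 - M_HSE / M_lens and treats the two mass estimates as independent with Gaussian errors; neither assumption is stated in the abstract, and the function name `hse_to_lensing_bias` is hypothetical. It is not the authors' analysis, which also accounts for systematic effects.

```python
import math


def hse_to_lensing_bias(m_hse, sigma_hse, m_lens, sigma_lens):
    """Return (b, sigma_b) for b = 1 - M_HSE / M_lens.

    Assumption (not from the abstract): the two masses are independent,
    so relative errors add in quadrature (first-order propagation).
    """
    ratio = m_hse / m_lens
    rel_err = math.sqrt((sigma_hse / m_hse) ** 2 + (sigma_lens / m_lens) ** 2)
    return 1.0 - ratio, ratio * rel_err


# Masses quoted in the abstract, in units of 1e14 Msun.
bias, sigma_bias = hse_to_lensing_bias(7.65, 1.03, 7.35, 0.65)
print(f"b = {bias:+.3f} +/- {sigma_bias:.3f}")
```

Under these assumptions the central value is about -0.04, well inside the 20 percent range quoted in the abstract.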
-
Galactic Star Formation with NIKA2 (GASTON): Evidence of mass accretion onto dense clumps
Authors:
A. J. Rigby,
R. Adam,
P. Ade,
H. Ajeddig,
M. Anderson,
P. André,
E. Artis,
H. Aussel,
A. Bacmann,
A. Beelen,
A. Benoît,
S. Berta,
L. Bing,
O. Bourrion,
A. Bracco,
M. Calvo,
A. Catalano,
M. De Petris,
F. -X. Désert,
S. Doyle,
E. F. C. Driessen,
P. García,
A. Gomez,
J. Goupy,
F. Kéruzoré
, et al. (27 additional authors not shown)
Abstract:
High-mass stars ($m_* \gtrsim 8 \, M_\odot$) play a crucial role in the evolution of galaxies, and so it is imperative that we understand how they are formed. We have used the New IRAM KIDs Array 2 (NIKA2) camera on the Institut de Radio Astronomie Millimétrique (IRAM) 30-m telescope to conduct high-sensitivity continuum mapping of $\sim2$ deg$^2$ of the Galactic plane (GP) as part of the Galactic Star Formation with NIKA2 (GASTON) large program. We have identified a total of 1467 clumps within our deep 1.15 mm continuum maps and, by using overlapping continuum, molecular line, and maser parallax data, we have determined their distances and physical properties. By placing them upon an approximate evolutionary sequence based upon 8 $μ$m $\textit{Spitzer}$ imaging, we find evidence that the most massive dense clumps accrete material from their surrounding environment during their early evolution, before dispersing as star formation advances, supporting clump-fed models of high-mass star formation.
Submitted 10 December, 2021; v1 submitted 2 November, 2021;
originally announced November 2021.
-
Mapping the intracluster medium temperature in the era of NIKA2 and MUSTANG-2
Authors:
F. Ruppin,
R. Adam,
P. Ade,
H. Ajeddig,
P. André,
E. Artis,
H. Aussel,
A. Beelen,
A. Benoît,
S. Berta,
L. Bing,
O. Bourrion,
M. Brodwin,
M. Calvo,
A. Catalano,
B. Decker,
M. De Petris,
F. -X. Désert,
S. Doyle,
E. F. C. Driessen,
P. R. M. Eisenhardt,
A. Gomez,
A. H. Gonzalez,
J. Goupy,
F. Kéruzoré
, et al. (28 additional authors not shown)
Abstract:
We present preliminary results from an on-going program that aims at mapping the intracluster medium (ICM) temperature of high redshift galaxy clusters from the MaDCoWS sample using a joint analysis of shallow X-ray data obtained by $Chandra$ and high angular resolution Sunyaev-Zel'dovich (SZ) observations realized with the NIKA2 and MUSTANG-2 cameras. We also present preliminary results from an on-going Open Time program within the NIKA2 collaboration that aims at mapping the ICM temperature of a galaxy cluster at $z=0.45$ from the resolved detection of the relativistic corrections to the SZ spectrum. These studies demonstrate how high angular resolution SZ observations will play a major role in the coming decade to push the investigation of ICM dynamics and non-gravitational processes to high redshift before the next generation X-ray observatories come into play.
Submitted 2 November, 2021;
originally announced November 2021.
-
Searching for high-z DSFGs with NIKA2 and NOEMA
Authors:
L. Bing,
R. Adam,
P. Ade,
H. Ajeddig,
P. André,
E. Artis,
H. Aussel,
A. Beelen,
A. Benoît,
S. Berta,
M. Béthermin,
O. Bourrion,
M. Calvo,
A. Catalano,
M. De Petris,
F. -X. Désert,
S. Doyle,
E. F. C. Driessen,
A. Gomez,
J. Goupy,
F. Kéruzoré,
C. Kramer,
B. Ladjelate,
G. Lagache,
S. Leclercq
, et al. (23 additional authors not shown)
Abstract:
As the possible progenitors of passive galaxies at z=2-3, dusty star-forming galaxies (DSFGs) at z>4 provide a unique perspective to study the formation, assembly, and early quenching of massive galaxies in the early Universe. The extreme obscuration in optical-IR makes (sub)mm spectral scans the most universal and unbiased way to confirm/exclude the high-z nature of candidate dusty star-forming galaxies. We present here the status of the NIKA2 Cosmological Legacy Survey (N2CLS), which is the deepest wide-area single-dish survey in the millimeter searching for high-z DSFGs. We also introduce a joint-analysis method to efficiently search for the spectroscopic redshift of high-z DSFGs with noisy spectra and photometric data and present its success in identifying the redshift of DSFGs found in NIKA2 science verification data.
Submitted 29 October, 2021;
originally announced November 2021.
-
GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems
Authors:
Bosheng Ding,
Junjie Hu,
Lidong Bing,
Sharifah Mahani Aljunied,
Shafiq Joty,
Luo Si,
Chunyan Miao
Abstract:
Much recent progress in task-oriented dialogue (ToD) systems has been driven by available annotation data across multiple domains for training. Over the last few years, there has been a move towards data curation for multilingual ToD systems that are applicable to serve people speaking different languages. However, existing multilingual ToD datasets either have a limited coverage of languages due to the high cost of data curation, or ignore the fact that dialogue entities barely exist in countries speaking these languages. To tackle these limitations, we introduce a novel data curation method that generates GlobalWoZ -- a large-scale multilingual ToD dataset globalized from an English ToD dataset for three unexplored use cases. Our method is based on translating dialogue templates and filling them with local entities in the target-language countries. We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.
Submitted 1 April, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
MReD: A Meta-Review Dataset for Structure-Controllable Text Generation
Authors:
Chenhui Shen,
Liying Cheng,
Ran Zhou,
Lidong Bing,
Yang You,
Luo Si
Abstract:
When directly using existing text generation datasets for controllable generation, we are facing the problem of not having the domain knowledge and thus the aspects that could be controlled are limited. A typical example is when using CNN/Daily Mail dataset for controllable text summarization, there is no guided information on the emphasis of summary sentences. A more useful text generator should leverage both the input text and the control signal to guide the generation, which can only be built with a deep understanding of the domain knowledge. Motivated by this vision, our paper introduces a new text generation dataset, named MReD. Our new dataset consists of 7,089 meta-reviews and all its 45k meta-review sentences are manually annotated with one of the 9 carefully defined categories, including abstract, strength, decision, etc. We present experimental results on state-of-the-art summarization models, and propose methods for structure-controlled generation with both extractive and abstractive models using our annotated data. By exploring various settings and analyzing the model behavior with respect to the control signal, we demonstrate the challenges of our proposed task and the values of our dataset MReD. Meanwhile, MReD also allows us to have a better understanding of the meta-review domain.
Submitted 5 July, 2022; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Aspect Sentiment Quad Prediction as Paraphrase Generation
Authors:
Wenxuan Zhang,
Yang Deng,
Xin Li,
Yifei Yuan,
Lidong Bing,
Wai Lam
Abstract:
Aspect-based sentiment analysis (ABSA) has been extensively studied in recent years, which typically involves four fundamental sentiment elements, including the aspect category, aspect term, opinion term, and sentiment polarity. Existing studies usually consider the detection of partial sentiment elements, instead of predicting the four elements in one shot. In this work, we introduce the Aspect Sentiment Quad Prediction (ASQP) task, aiming to jointly detect all sentiment elements in quads for a given opinionated sentence, which can reveal a more comprehensive and complete aspect-level sentiment structure. We further propose a novel \textsc{Paraphrase} modeling paradigm to cast the ASQP task to a paraphrase generation process. On one hand, the generation formulation allows solving ASQP in an end-to-end manner, alleviating the potential error propagation in the pipeline solution. On the other hand, the semantics of the sentiment elements can be fully exploited by learning to generate them in the natural language form. Extensive experiments on benchmark datasets show the superiority of our proposed method and the capacity of cross-task transfer with the proposed unified \textsc{Paraphrase} modeling framework.
Submitted 2 October, 2021;
originally announced October 2021.
-
Multilingual AMR Parsing with Noisy Knowledge Distillation
Authors:
Deng Cai,
Xin Li,
Jackie Chun-Sing Ho,
Lidong Bing,
Wai Lam
Abstract:
We study multilingual AMR parsing from the perspective of knowledge distillation, where the aim is to learn and improve a multilingual AMR parser by using an existing English parser as its teacher. We constrain our exploration in a strict multilingual setting: there is but one model to parse all different languages including English. We identify that noisy input and precise output are the key to successful distillation. Together with extensive pre-training, we obtain an AMR parser whose performances surpass all previously published results on four different foreign languages, including German, Spanish, Italian, and Chinese, by large margins (up to 18.8 \textsc{Smatch} points on Chinese and on average 11.3 \textsc{Smatch} points). Our parser also achieves comparable performance on English to the latest state-of-the-art English-only parser.
Submitted 13 October, 2021; v1 submitted 30 September, 2021;
originally announced September 2021.
-
MELM: Data Augmentation with Masked Entity Language Modeling for Low-Resource NER
Authors:
Ran Zhou,
Xin Li,
Ruidan He,
Lidong Bing,
Erik Cambria,
Luo Si,
Chunyan Miao
Abstract:
Data augmentation is an effective solution to data scarcity in low-resource scenarios. However, when applied to token-level tasks such as NER, data augmentation methods often suffer from token-label misalignment, which leads to unsatisfactory performance. In this work, we propose Masked Entity Language Modeling (MELM) as a novel data augmentation framework for low-resource NER. To alleviate the token-label misalignment issue, we explicitly inject NER labels into sentence context, and thus the fine-tuned MELM is able to predict masked entity tokens by explicitly conditioning on their labels. Thereby, MELM generates high-quality augmented data with novel entities, which provides rich entity regularity knowledge and boosts NER performance. When training data from multiple languages are available, we also integrate MELM with code-mixing for further improvement. We demonstrate the effectiveness of MELM on monolingual, cross-lingual and multilingual NER across various low-resource levels. Experimental results show that our MELM presents substantial improvement over the baseline methods.
Submitted 18 March, 2022; v1 submitted 31 August, 2021;
originally announced August 2021.
-
Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction
Authors:
Lu Xu,
Yew Ken Chia,
Lidong Bing
Abstract:
Aspect Sentiment Triplet Extraction (ASTE) is the most recent subtask of ABSA which outputs triplets of an aspect target, its associated sentiment, and the corresponding opinion term. Recent models perform the triplet extraction in an end-to-end manner but heavily rely on the interactions between each target word and opinion word. Thereby, they cannot perform well on targets and opinions which contain multiple words. Our proposed span-level approach explicitly considers the interaction between the whole spans of targets and opinions when predicting their sentiment relation. Thus, it can make predictions with the semantics of whole spans, ensuring better sentiment consistency. To ease the high computational cost caused by span enumeration, we propose a dual-channel span pruning strategy by incorporating supervision from the Aspect Term Extraction (ATE) and Opinion Term Extraction (OTE) tasks. This strategy not only improves computational efficiency but also distinguishes the opinion and target spans more properly. Our framework simultaneously achieves strong performance for the ASTE as well as ATE and OTE tasks. In particular, our analysis shows that our span-level approach achieves more significant improvements over the baselines on triplets with multi-word targets or opinions.
Submitted 26 July, 2021;
originally announced July 2021.
-
On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation
Authors:
Ruidan He,
Linlin Liu,
Hai Ye,
Qingyu Tan,
Bosheng Ding,
Liying Cheng,
Jia-Wei Low,
Lidong Bing,
Luo Si
Abstract:
Adapter-based tuning has recently arisen as an alternative to fine-tuning. It works by adding light-weight adapter modules to a pretrained language model (PrLM) and only updating the parameters of adapter modules when learning on a downstream task. As such, it adds only a few trainable parameters per new task, allowing a high degree of parameter sharing. Prior studies have shown that adapter-based tuning often achieves comparable results to fine-tuning. However, existing work only focuses on the parameter-efficient aspect of adapter-based tuning while lacking further investigation on its effectiveness. In this paper, we study the latter. We first show that adapter-based tuning better mitigates forgetting issues than fine-tuning since it yields representations with less deviation from those generated by the initial PrLM. We then empirically compare the two tuning methods on several downstream NLP tasks and settings. We demonstrate that 1) adapter-based tuning outperforms fine-tuning on low-resource and cross-lingual tasks; 2) it is more robust to overfitting and less sensitive to changes in learning rates.
Submitted 6 June, 2021;
originally announced June 2021.
-
Probing possible effects of circumgalactic media on the metal content of galaxies through the mass-metallicity relationship
Authors:
Sai Zhai,
Yong Shi,
Jianhang Chen,
Longji Bing,
Yanmei Chen,
Xiaoling Yu,
Songlin Li
Abstract:
The circumgalactic medium (CGM) connects the gas between the interstellar medium (ISM) and the intergalactic medium, which plays an important role in galaxy evolution. We use the stellar mass-metallicity relationship to investigate whether sharing the CGM will affect the distribution of metals in galaxy pairs. The optical emission lines from the Sloan Digital Sky Survey Data Release (SDSS DR7) are used to measure the gas-phase metallicity. We find that there is no significant difference in the distribution of the metallicity difference between two members in star forming-star forming pairs ($\rm Δlog(O/H)_{diff}$), metallicity offset from the best-fitted stellar mass-metallicity relationship of galaxies in pairs ($\rm Δlog(O/H)_{MS}$), as compared to "fake" pairs. By looking at $\rm Δlog(O/H)_{diff}$ and $\rm Δlog(O/H)_{MS}$ as a function of the star formation rate (SFR), specific star formation rate (sSFR), and stellar mass ratio, no difference is seen between galaxies in pairs and control galaxies. From our results, the share of the CGM may not play an important role in shaping the evolution of metal contents of galaxies.
Submitted 20 April, 2021;
originally announced April 2021.
-
Better Feature Integration for Named Entity Recognition
Authors:
Lu Xu,
Zhanming Jie,
Wei Lu,
Lidong Bing
Abstract:
It has been shown that named entity recognition (NER) could benefit from incorporating the long-distance structured information captured by dependency trees. We believe this is because both types of features - the contextual information captured by the linear sequences and the structured information captured by the dependency trees may complement each other. However, existing approaches largely focused on stacking the LSTM and graph neural networks such as graph convolutional networks (GCNs) for building improved NER models, where the exact interaction mechanism between the two types of features is not very clear, and the performance gain does not appear to be significant. In this work, we propose a simple and robust solution to incorporate both types of features with our Synergized-LSTM (Syn-LSTM), which clearly captures how the two types of features interact. We conduct extensive experiments on several standard datasets across four languages. The results demonstrate that the proposed model achieves better performance than previous approaches while requiring fewer parameters. Our further analysis demonstrates that our model can capture longer dependencies compared with strong baselines.
Submitted 12 April, 2021;
originally announced April 2021.
-
Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings
Authors:
Linlin Liu,
Thien Hai Nguyen,
Shafiq Joty,
Lidong Bing,
Luo Si
Abstract:
Cross-lingual word embeddings (CLWE) have been proven useful in many cross-lingual tasks. However, most existing approaches to learn CLWE including the ones with contextual embeddings are sense agnostic. In this work, we propose a novel framework to align contextual embeddings at the sense level by leveraging cross-lingual signal from bilingual dictionaries only. We operationalize our framework by first proposing a novel sense-aware cross entropy loss to model word senses explicitly. The monolingual ELMo and BERT models pretrained with our sense-aware cross entropy loss demonstrate significant performance improvement for word sense disambiguation tasks. We then propose a sense alignment objective on top of the sense-aware cross entropy loss for cross-lingual model pretraining, and pretrain cross-lingual models for several language pairs (English to German/Spanish/Japanese/Chinese). Compared with the best baseline results, our cross-lingual models achieve 0.52%, 2.09% and 1.29% average performance improvements on zero-shot cross-lingual NER, sentiment classification and XNLI tasks, respectively.
Submitted 15 September, 2022; v1 submitted 10 March, 2021;
originally announced March 2021.
-
Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model
Authors:
Juntao Li,
Ruidan He,
Hai Ye,
Hwee Tou Ng,
Lidong Bing,
Rui Yan
Abstract:
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements over various cross-lingual and low-resource tasks. Through training on one hundred languages and terabytes of texts, cross-lingual language models have proven to be effective in leveraging high-resource languages to enhance low-resource language processing and outperform monolingual models. In this paper, we further investigate the cross-lingual and cross-domain (CLCD) setting when a pretrained cross-lingual language model needs to adapt to new domains. Specifically, we propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features and domain-invariant features from the entangled pretrained cross-lingual representations, given unlabeled raw texts in the source language. Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts. Experimental results show that our proposed method achieves significant performance improvements over the state-of-the-art pretrained cross-lingual language model in the CLCD setting. The source code of this paper is publicly available at https://github.com/lijuntaopku/UFD.
Submitted 23 November, 2020;
originally announced November 2020.
-
DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks
Authors:
Bosheng Ding,
Linlin Liu,
Lidong Bing,
Canasai Kruengkrai,
Thien Hai Nguyen,
Shafiq Joty,
Luo Si,
Chunyan Miao
Abstract:
Data augmentation techniques have been widely used to improve machine learning performance as they enhance the generalization capability of models. In this work, to generate high quality synthetic data for low-resource tagging tasks, we propose a novel augmentation method with language models trained on the linearized labeled sentences. Our method is applicable to both supervised and semi-supervised settings. For the supervised settings, we conduct extensive experiments on named entity recognition (NER), part of speech (POS) tagging and end-to-end target based sentiment analysis (E2E-TBSA) tasks. For the semi-supervised settings, we evaluate our method on the NER task under the conditions of given unlabeled data only and unlabeled data plus a knowledge base. The results show that our method can consistently outperform the baselines, particularly when the given gold training data are less.
Submitted 3 November, 2020;
originally announced November 2020.
-
Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond
Authors:
Xin Li,
Lidong Bing,
Wenxuan Zhang,
Zheng Li,
Wai Lam
Abstract:
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of works: zero-shot approach and translation-based approach, which have been studied extensively on the sequence-level tasks. We further verify the efficacy of these cross-lingual adaptation approaches by evaluating their performances on more fine-grained sequence tagging tasks. After re-examining their strengths and drawbacks, we propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance. Instead of simply augmenting the source data with the machine-translated data, we tailor-make a warm-up mechanism to quickly update the mPTLMs with the gradients estimated on a few translated data. Then, the adaptation approach is applied to the refined parameters and the cross-lingual transfer is performed in a warm-start way. The experimental results on nine target languages demonstrate that our method is beneficial to the cross-lingual adaptation of various sequence tagging tasks.
Submitted 22 June, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Lightweight, Dynamic Graph Convolutional Networks for AMR-to-Text Generation
Authors:
Yan Zhang,
Zhijiang Guo,
Zhiyang Teng,
Wei Lu,
Shay B. Cohen,
Zuozhu Liu,
Lidong Bing
Abstract:
AMR-to-text generation is used to transduce Abstract Meaning Representation structures (AMR) into text. A key challenge in this task is to efficiently learn effective graph representations. Previously, Graph Convolution Networks (GCNs) were used to encode input AMRs, however, vanilla GCNs are not able to capture non-local information and additionally, they follow a local (first-order) information aggregation scheme. To account for these issues, larger and deeper GCN models are required to capture more complex interactions. In this paper, we introduce a dynamic fusion mechanism, proposing Lightweight Dynamic Graph Convolutional Networks (LDGCNs) that capture richer non-local interactions by synthesizing higher order information from the input graphs. We further develop two novel parameter saving strategies based on the group graph convolutions and weight tied convolutions to reduce memory usage and model complexity. With the help of these strategies, we are able to train a model with fewer parameters while maintaining the model capacity. Experiments demonstrate that LDGCNs outperform state-of-the-art models on two benchmark datasets for AMR-to-text generation with significantly fewer parameters.
Submitted 9 October, 2020;
originally announced October 2020.
-
Aspect Based Sentiment Analysis with Aspect-Specific Opinion Spans
Authors:
Lu Xu,
Lidong Bing,
Wei Lu,
Fei Huang
Abstract:
Aspect based sentiment analysis, predicting sentiment polarity of given aspects, has drawn extensive attention. Previous attention-based models emphasize using aspect semantics to help extract opinion features for classification. However, these works are either not able to capture opinion spans as a whole, or not able to capture variable-length opinion spans. In this paper, we present a neat and effective structured attention model by aggregating multiple linear-chain CRFs. Such a design allows the model to extract aspect-specific opinion spans and then evaluate sentiment polarity by exploiting the extracted opinion features. The experimental results on four datasets demonstrate the effectiveness of the proposed model, and our analysis demonstrates that our model can capture aspect-specific opinion spans.
Submitted 12 April, 2021; v1 submitted 6 October, 2020;
originally announced October 2020.
-
Position-Aware Tagging for Aspect Sentiment Triplet Extraction
Authors:
Lu Xu,
Hao Li,
Wei Lu,
Lidong Bing
Abstract:
Aspect Sentiment Triplet Extraction (ASTE) is the task of extracting the triplets of target entities, their associated sentiment, and opinion spans explaining the reason for the sentiment. Existing research efforts mostly solve this problem using pipeline approaches, which break the triplet extraction process into several stages. Our observation is that the three elements within a triplet are highly related to each other, and this motivates us to build a joint model to extract such triplets using a sequence tagging approach. However, how to effectively design a tagging approach to extract the triplets that can capture the rich interactions among the elements is a challenging research question. In this work, we propose the first end-to-end model with a novel position-aware tagging scheme that is capable of jointly extracting the triplets. Our experimental results on several existing datasets show that jointly capturing elements in the triplet using our approach leads to improved performance over the existing approaches. We also conducted extensive experiments to investigate the model effectiveness and robustness.
Submitted 9 March, 2021; v1 submitted 6 October, 2020;
originally announced October 2020.
-
Partially-Aligned Data-to-Text Generation with Distant Supervision
Authors:
Zihao Fu,
Bei Shi,
Wai Lam,
Lidong Bing,
Zhiyuan Liu
Abstract:
The Data-to-Text task aims to generate human-readable text for describing some given structured data enabling more interpretability. However, the typical generation task is confined to a few particular domains since it requires well-aligned data which is difficult and expensive to obtain. Using partially-aligned data is an alternative way of solving the dataset scarcity problem. This kind of data is much easier to obtain since it can be produced automatically. However, using this kind of data induces the over-generation problem posing difficulties for existing models, which tend to add unrelated excerpts during the generation procedure. In order to effectively utilize automatically annotated partially-aligned datasets, we extend the traditional generation task to a refined task called Partially-Aligned Data-to-Text Generation (PADTG) which is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains. To tackle this new task, we propose a novel distant supervision generation framework. It firstly estimates the input data's supportiveness for each target word with an estimator and then applies a supportiveness adaptor and a rebalanced beam search to harness the over-generation problem in the training and generation phases respectively. We also contribute a partially-aligned dataset (the data and source code of this paper can be obtained from https://github.com/fuzihaofzh/distant_supervision_nlg) by sampling sentences from Wikipedia and automatically extracting corresponding KB triples for each sentence from Wikidata. The experimental results show that our framework outperforms all baseline models and verify the feasibility of utilizing partially-aligned data.
Submitted 2 October, 2020;
originally announced October 2020.
-
An Unsupervised Sentence Embedding Method by Mutual Information Maximization
Authors:
Yan Zhang,
Ruidan He,
Zuozhu Liu,
Kwan Hui Lim,
Lidong Bing
Abstract:
BERT is inefficient for sentence-pair tasks such as clustering or semantic search as it needs to evaluate combinatorially many sentence pairs which is very time-consuming. Sentence BERT (SBERT) attempted to solve this challenge by learning semantically meaningful representations of single sentences, such that similarity comparison can be easily accessed. However, SBERT is trained on corpus with high-quality labeled sentence pairs, which limits its application to tasks where labeled data is extremely scarce. In this paper, we propose a lightweight extension on top of BERT and a novel self-supervised learning objective based on mutual information maximization strategies to derive meaningful sentence embeddings in an unsupervised manner. Unlike SBERT, our method is not restricted by the availability of labeled data, such that it can be applied on different domain-specific corpus. Experimental results show that the proposed method significantly outperforms other unsupervised sentence embedding baselines on common semantic textual similarity (STS) tasks and downstream supervised tasks. It also outperforms SBERT in a setting where in-domain labeled data is not available, and achieves performance competitive with supervised methods on various tasks.
Submitted 4 February, 2021; v1 submitted 25 September, 2020;
originally announced September 2020.
-
Feature Adaptation of Pre-Trained Language Models across Languages and Domains with Robust Self-Training
Authors:
Hai Ye,
Qingyu Tan,
Ruidan He,
Juntao Li,
Hwee Tou Ng,
Lidong Bing
Abstract:
Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains has gained much attention recently. Instead of fine-tuning PrLMs as done in most previous work, we investigate how to adapt the features of PrLMs to new domains without fine-tuning. We explore unsupervised domain adaptation (UDA) in this paper. With the features from PrLMs, we adapt the models trained with labeled data from the source domain to the unlabeled target domain. Self-training is widely used for UDA which predicts pseudo labels on the target domain data for training. However, the predicted pseudo labels inevitably include noise, which will negatively affect training a robust model. To improve the robustness of self-training, in this paper we present class-aware feature self-distillation (CFd) to learn discriminative features from PrLMs, in which PrLM features are self-distilled into a feature adaptation module and the features from the same class are more tightly clustered. We further extend CFd to a cross-language setting, in which language discrepancy is studied. Experiments on two monolingual and multilingual Amazon review datasets show that CFd can consistently improve the performance of self-training in cross-domain and cross-language settings.
Submitted 30 November, 2020; v1 submitted 24 September, 2020;
originally announced September 2020.
-
Host Galaxy Properties of Changing-look AGN Revealed in the MaNGA Survey
Authors:
Xiaoling Yu,
Yong Shi,
Yanmei Chen,
Jianhang Chen,
Songlin Li,
Longji Bing,
Junqiang Ge,
Rogemar A. Riffel,
Rogério Riffel
Abstract:
Changing-look Active Galactic Nuclei (CL-AGNs) are a subset of AGNs in which the broad Balmer emission lines appear or disappear within a few years. We use the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey to identify five CL-AGNs. The 2-D photometric and kinematic maps reveal common features as well as some unusual properties of CL-AGN hosts as compared to the AGN hosts in general. All MaNGA CL-AGNs reside in the star-forming main sequence, similar to MaNGA non-changing-look AGNs (NCL-AGNs). $80\% \pm 16\%$ of our CL-AGNs possess pseudo-bulge features, and they follow the overall NCL-AGN $M_{BH}-σ_{*}$ relationship. The kinematic measurements indicate that they have similar distributions in the plane of angular momentum versus galaxy ellipticity. MaNGA CL-AGNs however show a higher, but not statistically significant, fraction of counter-rotating features ($20\% \pm 16\%$) compared to that ($1.84\% \pm 0.61\%$) in the general star-forming population. In addition, MaNGA CL-AGNs tend to be more face-on (axis ratio $>$ 0.7) than Type I NCL-AGNs. These results suggest that host galaxies could play a role in the CL-AGN phenomenon.
Submitted 25 August, 2020;
originally announced August 2020.
-
Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce
Authors:
Juntao Li,
Chang Liu,
Jian Wang,
Lidong Bing,
Hongsong Li,
Xiaozhong Liu,
Dongyan Zhao,
Rui Yan
Abstract:
With the prosperity of cross-border e-commerce, there is an urgent demand for designing intelligent approaches for assisting e-commerce sellers to offer local products for consumers from all over the world. In this paper, we explore a new task of cross-lingual information retrieval, i.e., cross-lingual set-to-description retrieval in cross-border e-commerce, which involves matching product attribute sets in the source language with persuasive product descriptions in the target language. We manually collect a new and high-quality paired dataset, where each pair contains an unordered product attribute set in the source language and an informative product description in the target language. As the dataset construction process is both time-consuming and costly, the new dataset only comprises 13.5k pairs, which is a low-resource setting and can be viewed as a challenging testbed for model development and evaluation in cross-border e-commerce. To tackle this cross-lingual set-to-description retrieval task, we propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping upon the pre-trained monolingual BERT representations. Experimental results indicate that our proposed CLMN yields impressive results on the challenging task and the context-dependent cross-lingual mapping on BERT yields noticeable improvement over the pre-trained multi-lingual BERT model.
Submitted 17 May, 2020;
originally announced May 2020.
-
ENT-DESC: Entity Description Generation by Exploring Knowledge Graph
Authors:
Liying Cheng,
Dekun Wu,
Lidong Bing,
Yan Zhang,
Zhanming Jie,
Wei Lu,
Luo Si
Abstract:
Previous works on knowledge-to-text generation take as input a few RDF triples or key-value pairs conveying the knowledge of some entities to generate a natural language description. Existing datasets, such as WIKIBIO, WebNLG, and E2E, basically have a good alignment between an input triple/pair set and its output text. However, in practice, the input knowledge could be more than enough, since the output description may only cover the most significant knowledge. In this paper, we introduce a large-scale and challenging dataset to facilitate the study of such a practical scenario in KG-to-text. Our dataset involves retrieving abundant knowledge of various types of main entities from a large knowledge graph (KG), which makes the current graph-to-sequence models severely suffer from the problems of information loss and parameter explosion while generating the descriptions. We address these challenges by proposing a multi-graph structure that is able to represent the original graph information more comprehensively. Furthermore, we also incorporate aggregation methods that learn to extract the rich graph information. Extensive experiments demonstrate the effectiveness of our model architecture.
Submitted 26 October, 2020; v1 submitted 30 April, 2020;
originally announced April 2020.
-
Salience Estimation with Multi-Attention Learning for Abstractive Text Summarization
Authors:
Piji Li,
Lidong Bing,
Zhongyu Wei,
Wai Lam
Abstract:
Attention mechanism plays a dominant role in the sequence generation models and has been used to improve the performance of machine translation and abstractive text summarization. Different from neural machine translation, in the task of text summarization, salience estimation for words, phrases or sentences is a critical component, since the output summary is a distillation of the input text. Although the typical attention mechanism can conduct text fragment selection from the input text conditioned on the decoder states, there is still a gap to conduct direct and effective salience detection. To bring back direct salience estimation for summarization with neural networks, we propose a Multi-Attention Learning framework which contains two new attention learning components for salience estimation: supervised attention learning and unsupervised attention learning. We regard the attention weights as the salience information, which means that the semantic units with large attention value will be more important. The context information obtained based on the estimated salience is incorporated with the typical attention mechanism in the decoder to conduct summary generation. Extensive experiments on some benchmark datasets in different languages demonstrate the effectiveness of the proposed framework for the task of abstractive summarization.
Submitted 6 April, 2020;
originally announced April 2020.
-
GRET: Global Representation Enhanced Transformer
Authors:
Rongxiang Weng,
Haoran Wei,
Shujian Huang,
Heng Yu,
Lidong Bing,
Weihua Luo,
Jiajun Chen
Abstract:
Transformer, based on the encoder-decoder framework, has achieved state-of-the-art performance on several natural language generation tasks. The encoder maps the words in the input sentence into a sequence of hidden states, which are then fed into the decoder to generate the output sentence. These hidden states usually correspond to the input words and focus on capturing local information. However, the global (sentence level) information is seldom explored, leaving room for the improvement of generation quality. In this paper, we propose a novel global representation enhanced Transformer (GRET) to explicitly model global representation in the Transformer network. Specifically, in the proposed model, an external state is generated for the global representation from the encoder. The global representation is then fused into the decoder during the decoding process to improve generation quality. We conduct experiments in two text generation tasks: machine translation and text summarization. Experimental results on four WMT machine translation tasks and LCSTS text summarization task demonstrate the effectiveness of the proposed approach on natural language generation.
Submitted 24 February, 2020;
originally announced February 2020.
-
The Impact of Merging on The Origin of Kinematically Misaligned and Counter-rotating Galaxies in MaNGA
Authors:
Song-lin Li,
Yong Shi,
Dmitry Bizyaev,
Christopher Duckworth,
Ren-bin Yan,
Yan-mei Chen,
Long-ji Bing,
Jian-hang Chen,
Xiao-ling Yu,
Rogemar A. Riffel
Abstract:
Galaxy mergers and interactions are expected to play a significant role leading to offsets between gas and stellar motions in galaxies. Herein we crossmatch galaxies in MaNGA MPL-8 with the Dark Energy Spectroscopic Instrument (DESI) Legacy Surveys and identify 311 merging galaxies that have reliable measurements of the $Δ$PA, the difference between the stellar and gas kinematic position angles, to investigate the impacts of merging on gas-stellar rotation misalignments. We find that the merging fractions of misaligned galaxies (30$^\circ$ $\leqslant$ $Δ$PA $<$ 150$^\circ$) are higher than that of co-rotators ($Δ$PA $<$ 30$^\circ$) in both quiescent and star-forming galaxies. This result suggests that merging is one process to produce kinematic misalignments. The merging fraction of counter-rotators ($Δ$PA $\geqslant$ 150$^\circ$) is lower than that of misaligned galaxies in both quiescent and star-forming galaxies, while in the latter it is likely even lower than that of co-rotators. The orbital angular momentum transfer to the spins of stars and gas during merging and the tidal feature disappearance can lead to small merging fractions in counter-rotators. Numerous new stars that inherit angular momentum from gas after merging can further lower the merging fraction of star-forming counter-rotators.
Submitted 17 November, 2020; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Knowing What, How and Why: A Near Complete Solution for Aspect-based Sentiment Analysis
Authors:
Haiyun Peng,
Lu Xu,
Lidong Bing,
Fei Huang,
Wei Lu,
Luo Si
Abstract:
Target-based sentiment analysis or aspect-based sentiment analysis (ABSA) refers to addressing various sentiment analysis tasks at a fine-grained level, which includes but is not limited to aspect extraction, aspect sentiment classification, and opinion extraction. There exist many solvers of the above individual subtasks or a combination of two subtasks, and they can work together to tell a complete story, i.e. the discussed aspect, the sentiment on it, and the cause of the sentiment. However, no previous ABSA research tried to provide a complete solution in one shot. In this paper, we introduce a new subtask under ABSA, named aspect sentiment triplet extraction (ASTE). Particularly, a solver of this task needs to extract triplets (What, How, Why) from the inputs, which show WHAT the targeted aspects are, HOW their sentiment polarities are and WHY they have such polarities (i.e. opinion reasons). For instance, one triplet from "Waiters are very friendly and the pasta is simply average" could be ('Waiters', positive, 'friendly'). We propose a two-stage framework to address this task. The first stage predicts what, how and why in a unified model, and then the second stage pairs up the predicted what (how) and why from the first stage to output triplets. In the experiments, our framework has set a benchmark performance in this novel triplet extraction task. Meanwhile, it outperforms a few strong baselines adapted from state-of-the-art related methods.
Submitted 21 November, 2019; v1 submitted 4 November, 2019;
originally announced November 2019.
-
Review-based Question Generation with Adaptive Instance Transfer and Augmentation
Authors:
Qian Yu,
Lidong Bing,
Qiong Zhang,
Wai Lam,
Luo Si
Abstract:
Online reviews provide rich information about products and service, while it remains inefficient for potential consumers to exploit the reviews for fulfilling their specific information need. We propose to explore question generation as a new way of exploiting review information. One major challenge of this task is the lack of review-question pairs for training a neural generation model. We propose an iterative learning framework for handling this challenge via adaptive transfer and augmentation of the training instances with the help of the available user-posed question-answer data. To capture the aspect characteristics in reviews, the augmentation and generation procedures incorporate related features extracted via unsupervised learning. Experiments on data from 10 categories of a popular E-commerce site demonstrate the effectiveness of the framework, as well as the usefulness of the new task.
Submitted 3 May, 2020; v1 submitted 4 November, 2019;
originally announced November 2019.
-
Transferable End-to-End Aspect-based Sentiment Analysis with Selective Adversarial Learning
Authors:
Zheng Li,
Xin Li,
Ying Wei,
Lidong Bing,
Yu Zhang,
Qiang Yang
Abstract:
Joint extraction of aspects and sentiments can be effectively formulated as a sequence labeling problem. However, such formulation hinders the effectiveness of supervised methods due to the lack of annotated sequence data in many domains. To address this issue, we firstly explore an unsupervised domain adaptation setting for this task. Prior work can only use common syntactic relations between aspect and opinion words to bridge the domain gaps, which highly relies on external linguistic resources. To resolve it, we propose a novel Selective Adversarial Learning (SAL) method to align the inferred correlation vectors that automatically capture their latent relations. The SAL method can dynamically learn an alignment weight for each word such that more important words can possess higher alignment weights to achieve fine-grained (word-level) adaptation. Empirically, extensive experiments demonstrate the effectiveness of the proposed SAL method.
Submitted 30 October, 2019;
originally announced October 2019.
-
Improving Question Generation With to the Point Context
Authors:
Jingjing Li,
Yifan Gao,
Lidong Bing,
Irwin King,
Michael R. Lyu
Abstract:
Question generation (QG) is the task of generating a question from a reference sentence and a specified answer within the sentence. A major challenge in QG is to identify answer-relevant context words to finish the declarative-to-interrogative sentence transformation. Existing sequence-to-sequence neural models achieve this goal by proximity-based answer position encoding under the intuition that neighboring words of answers are of high possibility to be answer-relevant. However, such intuition may not apply to all cases especially for sentences with complex answer-relevant relations. Consequently, the performance of these models drops sharply when the relative distance between the answer fragment and other non-stop sentence words that also appear in the ground truth question increases. To address this issue, we propose a method to jointly model the unstructured sentence and the structured answer-relevant relation (extracted from the sentence in advance) for question generation. Specifically, the structured answer-relevant relation acts as the to the point context and it thus naturally helps keep the generated question to the point, while the unstructured sentence provides the full information. Extensive experiments show that to the point context helps our question generation model achieve significant improvements on several automatic evaluation metrics. Furthermore, our model is capable of generating diverse questions for a sentence which conveys multiple relations of its answer fragment.
Submitted 24 October, 2019; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Exploiting BERT for End-to-End Aspect-based Sentiment Analysis
Authors:
Xin Li,
Lidong Bing,
Wenxuan Zhang,
Wai Lam
Abstract:
In this paper, we investigate the modeling power of contextualized embeddings from pre-trained language models, e.g. BERT, on the E2E-ABSA task. Specifically, we build a series of simple yet insightful neural baselines to deal with E2E-ABSA. The experimental results show that even with a simple linear classification layer, our BERT-based architecture can outperform state-of-the-art works. Besides, we also standardize the comparative study by consistently utilizing a hold-out validation dataset for model selection, which is largely ignored by previous works. Therefore, our work can serve as a BERT-based benchmark for E2E-ABSA.
Submitted 4 October, 2019; v1 submitted 2 October, 2019;
originally announced October 2019.
-
Semi-supervised Text Style Transfer: Cross Projection in Latent Space
Authors:
Mingyue Shang,
Piji Li,
Zhenxin Fu,
Lidong Bing,
Dongyan Zhao,
Shuming Shi,
Rui Yan
Abstract:
The text style transfer task requires the model to transfer a sentence of one style to another style while retaining its original content meaning, which is a challenging problem that has long suffered from the shortage of parallel data. In this paper, we first propose a semi-supervised text style transfer model that combines the small-scale parallel data with the large-scale nonparallel data. With these two types of training data, we introduce a projection function between the latent space of different styles and design two constraints to train it. We also introduce two other simple but effective semi-supervised methods to compare with. To evaluate the performance of the proposed methods, we build and release a novel style transfer dataset that alters sentences between the style of ancient Chinese poems and modern Chinese.
Submitted 25 September, 2019;
originally announced September 2019.
-
Tackling Long-Tailed Relations and Uncommon Entities in Knowledge Graph Completion
Authors:
Zihao Wang,
Kwun Ping Lai,
Piji Li,
Lidong Bing,
Wai Lam
Abstract:
For large-scale knowledge graphs (KGs), recent research has been focusing on the large proportion of infrequent relations which have been ignored by previous studies. For example, the few-shot learning paradigm for relations has been investigated. In this work, we further advocate that handling uncommon entities is inevitable when dealing with infrequent relations. Therefore, we propose a meta-learning framework that aims at handling infrequent relations with few-shot learning and uncommon entities by using textual descriptions. We design a novel model to better extract key information from textual descriptions. Besides, we also develop a novel generative model in our framework to enhance the performance by generating extra triplets during the training stage. Experiments are conducted on two datasets from real-world KGs, and the results show that our framework outperforms previous methods when dealing with infrequent relations and their accompanying uncommon entities.
△ Less
Submitted 25 September, 2019;
originally announced September 2019.
-
Hierarchical Pointer Net Parsing
Authors:
Linlin Liu,
Xiang Lin,
Shafiq Joty,
Simeng Han,
Lidong Bing
Abstract:
Transition-based top-down parsing with pointer networks has achieved state-of-the-art results in multiple parsing tasks, while having a linear time complexity. However, the decoder of these parsers has a sequential structure, which does not yield the most appropriate inductive bias for deriving tree structures. In this paper, we propose hierarchical pointer network parsers, and apply them to dependency and sentence-level discourse parsing tasks. Our results on standard benchmark datasets demonstrate the effectiveness of our approach, outperforming existing methods and setting a new state-of-the-art.
Submitted 30 August, 2019;
originally announced August 2019.
-
The spatial extension of extended narrow line regions in MaNGA AGN
Authors:
Jianhang Chen,
Yong Shi,
Ross Dempsey,
David R. Law,
Yanmei Chen,
Renbin Yan,
Longji Bing,
Sandro B. Rembold,
Songlin Li,
Xiaoling Yu,
Rogemar A. Riffel,
Joe R. Brownstein,
Rogério Riffel
Abstract:
In this work, we revisit the size-luminosity relation of the extended narrow line regions (ENLRs) using a large sample of nearby active galactic nuclei (AGN) from the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey. The ENLRs ionized by the AGN are identified through the spatially resolved BPT diagram, which results in a sample of 152 AGN. By combining our AGN with high-luminosity quasars from the literature, we found a tight log-linear relation between the size of the ENLR and the AGN [O III]λ5007Å luminosity over four orders of magnitude of the [O III] luminosity. The slope of this relation is 0.42 $\pm$ 0.02, which can be explained in terms of a distribution of clouds photoionized by the AGN. This relation also indicates that, for the low-luminosity Seyferts, the AGN have the potential to ionize and heat the gas clouds at a large distance from the nuclei without the aid of outflows and jets.
Submitted 22 September, 2019; v1 submitted 7 August, 2019;
originally announced August 2019.
-
What drives the velocity dispersion of ionized gas in star-forming galaxies?
Authors:
Xiaoling Yu,
Yong Shi,
Yanmei Chen,
David R. Law,
Dmitry Bizyaev,
Longji Bing,
Songlin Li,
Luwenjia Zhou,
Jianhang Chen,
Rogemar A. Riffel,
Rogério Riffel,
Kai Zhang,
Yongyun Chen,
Kaike Pan
Abstract:
We analyze the intrinsic velocity dispersion properties of 648 star-forming galaxies observed by the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey, to explore the relation of intrinsic gas velocity dispersions with star formation rates (SFRs), SFR surface densities ($\rm{Σ_{SFR}}$), stellar masses and stellar mass surface densities ($\rm{Σ_{*}}$). By combining with high-z galaxies, we found that there is a good correlation between the velocity dispersion and the SFR as well as $\rm{Σ_{SFR}}$, but the correlation between the velocity dispersion and the stellar mass as well as $\rm{Σ_{*}}$ is moderate. By comparing our results with predictions of theoretical models, we found that neither the energy feedback from star formation processes alone nor the gravitational instability alone can fully explain simultaneously the observed velocity-dispersion/SFR and velocity-dispersion/$\rm{Σ_{SFR}}$ relationships.
Submitted 25 April, 2019;
originally announced April 2019.
-
An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction
Authors:
Wang Chen,
Hou Pong Chan,
Piji Li,
Lidong Bing,
Irwin King
Abstract:
In this paper, we present a novel integrated approach for keyphrase generation (KG). Unlike previous works which are purely extractive or generative, we first propose a new multi-task learning framework that jointly learns an extractive model and a generative model. Besides extracting keyphrases, the output of the extractive model is also employed to rectify the copy probability distribution of the generative model, such that the generative model can better identify important contents from the given document. Moreover, we retrieve similar documents with the given document from training data and use their associated keyphrases as external knowledge for the generative model to produce more accurate keyphrases. For further exploiting the power of extraction and retrieval, we propose a neural-based merging module to combine and re-rank the predicted keyphrases from the enhanced generative model, the extractive model, and the retrieved keyphrases. Experiments on the five KG benchmarks demonstrate that our integrated approach outperforms the state-of-the-art methods.
Submitted 6 April, 2019;
originally announced April 2019.
-
Persona-Aware Tips Generation
Authors:
Piji Li,
Zihao Wang,
Lidong Bing,
Wai Lam
Abstract:
Tips, as a compact and concise form of reviews, have received less attention from researchers. In this paper, we investigate the task of tips generation by considering the `persona' information which captures the intrinsic language style of the users or the different characteristics of the product items. In order to exploit the persona information, we propose a framework based on adversarial variational auto-encoders (aVAE) for persona modeling from the historical tips and reviews of users and items. The latent variables from aVAE are regarded as persona embeddings. Besides representing persona using the latent embeddings, we design a persona memory for storing the persona related words for users and items. Pointer Network is used to retrieve persona wordings from the memory when generating tips. Moreover, the persona embeddings are used as latent factors by a rating prediction component to predict the sentiment of a user over an item. Finally, the persona embeddings and the sentiment information are incorporated into a recurrent neural network based tips generation component. Extensive experimental results are reported and discussed to elaborate the peculiarities of our framework.
Submitted 13 March, 2019; v1 submitted 5 March, 2019;
originally announced March 2019.
-
Abstractive Text Summarization by Incorporating Reader Comments
Authors:
Shen Gao,
Xiuying Chen,
Piji Li,
Zhaochun Ren,
Lidong Bing,
Dongyan Zhao,
Rui Yan
Abstract:
In the neural abstractive summarization field, conventional sequence-to-sequence based models often suffer from summarizing the wrong aspect of the document with respect to the main aspect. To tackle this problem, we propose the task of reader-aware abstractive summary generation, which utilizes the reader comments to help the model produce a better summary about the main aspect. Unlike the traditional abstractive summarization task, reader-aware summarization confronts two main challenges: (1) comments are informal and noisy; (2) jointly modeling the news document and the reader comments is challenging. To tackle the above challenges, we design an adversarial learning model named reader-aware summary generator (RASG), which consists of four components: (1) a sequence-to-sequence based summary generator; (2) a reader attention module capturing the reader focused aspects; (3) a supervisor modeling the semantic gap between the generated summary and reader focused aspects; (4) a goal tracker producing the goal for each generation step. The supervisor and the goal tracker are used to guide the training of our framework in an adversarial manner. Extensive experiments are conducted on our large-scale real-world text summarization dataset, and the results show that RASG achieves the state-of-the-art performance in terms of both automatic metrics and human evaluations. The experimental results also demonstrate the effectiveness of each module in our framework. We release our large-scale dataset for further research.
Submitted 13 December, 2018;
originally announced December 2018.
-
A Unified Model for Opinion Target Extraction and Target Sentiment Prediction
Authors:
Xin Li,
Lidong Bing,
Piji Li,
Wai Lam
Abstract:
Target-based sentiment analysis involves opinion target extraction and target sentiment classification. However, most of the existing works usually studied one of these two sub-tasks alone, which hinders their practical use. This paper aims to solve the complete task of target-based sentiment analysis in an end-to-end fashion, and presents a novel unified model which applies a unified tagging scheme. Our framework involves two stacked recurrent neural networks: The upper one predicts the unified tags to produce the final output results of the primary target-based sentiment analysis; The lower one performs an auxiliary target boundary prediction aiming at guiding the upper network to improve the performance of the primary task. To explore the inter-task dependency, we propose to explicitly model the constrained transitions from target boundaries to target sentiment polarities. We also propose to maintain the sentiment consistency within an opinion target via a gate mechanism which models the relation between the features for the current word and the previous word. We conduct extensive experiments on three benchmark datasets and our framework achieves consistently superior results.
Submitted 11 February, 2019; v1 submitted 12 November, 2018;
originally announced November 2018.
-
Mildly Suppressed Star Formation in Central Regions of MaNGA Seyfert Galaxies
Authors:
Longji Bing,
Yong Shi,
Yanmei Chen,
Sebastián F. Sánchez,
Roberto Maiolino,
Rogério Riffel,
Rogemar A. Riffel,
Dominika Wylezalek,
Dmitry Bizyaev,
Kaike Pan,
Niv Drory
Abstract:
Negative feedback from accretion onto super-massive black holes (SMBHs), that is to remove gas and suppress star formation in galaxies, has been widely suggested. However, for Seyfert galaxies which harbor less active, moderately accreting SMBHs in the local universe, the feedback capability of their black hole activity is elusive. We present spatially-resolved H$α$ measurements to trace ongoing star formation in Seyfert galaxies and compare their specific star formation rate with a sample of star-forming galaxies whose global galaxy properties are controlled to be the same as the Seyferts. From the comparison we find that the star formation rates within central kpc of Seyfert galaxies are mildly suppressed as compared to the matched normal star forming galaxies. This suggests that the feedback of moderate SMBH accretion could, to some extent, regulate the ongoing star formation in these intermediate to late type galaxies under secular evolution.
Submitted 30 September, 2018;
originally announced October 2018.
-
Generating Distractors for Reading Comprehension Questions from Real Examinations
Authors:
Yifan Gao,
Lidong Bing,
Piji Li,
Irwin King,
Michael R. Lyu
Abstract:
We investigate the task of distractor generation for multiple choice reading comprehension questions from examinations. In contrast to all previous works, we do not aim at preparing words or short phrases distractors, instead, we endeavor to generate longer and semantic-rich distractors which are closer to distractors in real reading comprehension from examinations. Taking a reading comprehension article, a pair of question and its correct option as input, our goal is to generate several distractors which are somehow related to the answer, consistent with the semantic context of the question and have some trace in the article. We propose a hierarchical encoder-decoder framework with static and dynamic attention mechanisms to tackle this task. Specifically, the dynamic attention can combine sentence-level and word-level attention varying at each recurrent time step to generate a more readable sequence. The static attention is to modulate the dynamic attention not to focus on question irrelevant sentences or sentences which contribute to the correct option. Our proposed framework outperforms several strong baselines on the first prepared distractor generation dataset of real reading comprehension questions. For human evaluation, compared with those distractors generated by baselines, our generated distractors are more functional to confuse the annotators.
Submitted 18 December, 2018; v1 submitted 8 September, 2018;
originally announced September 2018.
-
An early-type galaxy with an inner star-forming disk
Authors:
Song-lin Li,
Yong Shi,
Yan-mei Chen,
Martha Tabor,
Dmitry Bizyaev,
Jian-hang Chen,
Xiao-ling Yu,
Long-ji Bing
Abstract:
Early-type galaxies (ETGs) are composed of two distinct populations: high-mass and low-mass, which are likely to be built via gas-poor merging and gas-rich merging/accretion, respectively. However, it is difficult to directly associate low-mass ETGs with gas-rich processes, because currently they are gas poor with no signs of ongoing star formation. We report a discovery of an ETG (SDSS J142055.01+400715.7) with Mstellar=10^10 Msun that offers direct evidence for gas-rich merging as the origin of low-mass ETGs. The integrated properties of the galaxy are consistent with a typical low-mass ETG, but the outer and inner regions show distinct dispersion- and rotation-dominated kinematics, respectively. There are some tidal features surrounding the galaxy. These two facts suggest very recent galaxy merging. Furthermore, the inner disk harbors on-going star formation, indicating the merging to be gas rich. This type of galaxy is rare but it may be a demonstration of the role the transient phase of gas-rich merging plays in making a low-mass ETG.
Submitted 7 August, 2018; v1 submitted 6 August, 2018;
originally announced August 2018.
-
Difficulty Controllable Generation of Reading Comprehension Questions
Authors:
Yifan Gao,
Lidong Bing,
Wang Chen,
Michael R. Lyu,
Irwin King
Abstract:
We investigate the difficulty levels of questions in reading comprehension datasets such as SQuAD, and propose a new question generation setting, named Difficulty-controllable Question Generation (DQG). Taking as input a sentence in the reading comprehension paragraph and some of its text fragments (i.e., answers) that we want to ask questions about, a DQG method needs to generate questions each of which has a given text fragment as its answer, and meanwhile the generation is under the control of specified difficulty labels---the output questions should satisfy the specified difficulty as much as possible. To solve this task, we propose an end-to-end framework to generate questions of designated difficulty levels by exploring a few important intuitions. For evaluation, we prepared the first dataset of reading comprehension questions with difficulty labels. The results show that the questions generated by our framework not only have better quality under metrics like BLEU, but also comply with the specified difficulty labels.
Submitted 30 May, 2019; v1 submitted 10 July, 2018;
originally announced July 2018.
-
Learning Domain-Sensitive and Sentiment-Aware Word Embeddings
Authors:
Bei Shi,
Zihao Fu,
Lidong Bing,
Wai Lam
Abstract:
Word embeddings have been widely used in sentiment classification because of their efficacy for semantic representations of words. Given reviews from different domains, some existing methods for word embeddings exploit sentiment information, but they cannot produce domain-sensitive embeddings. On the other hand, some other existing methods can generate domain-sensitive word embeddings, but they cannot distinguish words with similar contexts but opposite sentiment polarity. We propose a new method for learning domain-sensitive and sentiment-aware embeddings that simultaneously capture the information of sentiment semantics and domain sensitivity of individual words. Our method can automatically determine and produce domain-common embeddings and domain-specific embeddings. The differentiation of domain-common and domain-specific words enables the advantage of data augmentation of common semantics from multiple domains and capture the varied semantics of specific words from different domains at the same time. Experimental results show that our model provides an effective way to learn domain-sensitive and sentiment-aware word embeddings which benefit sentiment classification at both sentence level and lexicon term level.
Submitted 9 May, 2018;
originally announced May 2018.
-
Transformation Networks for Target-Oriented Sentiment Classification
Authors:
Xin Li,
Lidong Bing,
Wai Lam,
Bei Shi
Abstract:
Target-oriented sentiment classification aims at classifying sentiment polarities over individual opinion targets in a sentence. RNN with attention seems a good fit for the characteristics of this task, and indeed it achieves the state-of-the-art performance. After re-examining the drawbacks of the attention mechanism and the obstacles that prevent CNNs from performing well in this classification task, we propose a new model to overcome these issues. Instead of attention, our model employs a CNN layer to extract salient features from the transformed word representations originating from a bi-directional RNN layer. Between the two layers, we propose a component to generate target-specific representations of words in the sentence, and meanwhile incorporate a mechanism for preserving the original contextual information from the RNN layer. Experiments show that our model achieves a new state-of-the-art performance on a few benchmarks.
Submitted 2 May, 2018;
originally announced May 2018.
-
Aspect Term Extraction with History Attention and Selective Transformation
Authors:
Xin Li,
Lidong Bing,
Piji Li,
Wai Lam,
Zhimou Yang
Abstract:
Aspect Term Extraction (ATE), a key sub-task in Aspect-Based Sentiment Analysis, aims to extract explicit aspect expressions from online user reviews. We present a new framework for tackling ATE. It can exploit two useful clues, namely opinion summary and aspect detection history. Opinion summary is distilled from the whole input sentence, conditioned on each current token for aspect prediction, and thus the tailor-made summary can help aspect prediction on this token. Another clue is the information of aspect detection history, and it is distilled from the previous aspect predictions so as to leverage the coordinate structure and tagging schema constraints to upgrade the aspect prediction. Experimental results over four benchmark datasets clearly demonstrate that our framework can outperform all state-of-the-art methods.
Submitted 2 May, 2018;
originally announced May 2018.