-
Deep Investigation of Cross-Language Plagiarism Detection Methods
Abstract: This paper is a deep investigation of cross-language plagiarism detection methods on a recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw r…
Submitted 24 May, 2017; originally announced May 2017.
Comments: Accepted to BUCC (10th Workshop on Building and Using Comparable Corpora), co-located with ACL 2017
-
arXiv:1704.01346
CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity
Abstract: We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity by a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised ways. Our best run ranked…
Submitted 5 April, 2017; originally announced April 2017.
-
Using Word Embedding for Cross-Language Plagiarism Detection
Abstract: This paper proposes to use distributed representations of words (word embeddings) in cross-language textual similarity detection. The main contributions of this paper are the following: (a) we introduce new cross-language similarity detection methods based on distributed representations of words; (b) we combine the different methods proposed to verify their complementarity and finally obtain an over…
Submitted 10 February, 2017; originally announced February 2017.
Comments: Accepted to EACL 2017 (short)