Search | arXiv e-print repository

From cryptomarkets to the surface web: Scouting eBay for counterfeits

Authors: Felix Soldner, Fabian Plum, Bennett Kleinberg, Shane D Johnson

Abstract: Detecting counterfeits on online marketplaces is challenging, and current methods struggle with the volume of sales on platforms like eBay, while cryptomarkets openly sell counterfeits. Leveraging information from 453 cryptomarket counterfeits, we automated a search for corresponding products on eBay, utilizing image and text similarity metrics. We collected data twice over a four-month period to analyze changes, with an average of 159 eBay products per cryptomarket item, totaling 134k products. We found identical products, which would warrant further investigation as to whether they are counterfeits. Results indicate increasing difficulty finding similar products over time, moderated by product type and origin. Future improved versions of the current system could be used to examine possible connections between cryptomarket and surface web listings more closely and could hold practical value in supporting the detection of counterfeits on the surface web.

Submitted 7 June, 2024; originally announced June 2024.

Comments: pre-print

arXiv:2306.14219

Total Error Sheets for Datasets (TES-D) -- A Critical Guide to Documenting Online Platform Datasets

Authors: Leon Fröhling, Indira Sen, Felix Soldner, Leonie Steinbrinker, Maria Zens, Katrin Weller

Abstract: This paper proposes a template for documenting datasets that have been collected from online platforms for research purposes. The template should help researchers critically reflect on data quality and increase transparency in research fields that make use of online platform data. The paper describes our motivation, outlines the procedure for developing a specific documentation template that we refer to as TES-D (Total Error Sheets for Datasets), and provides the current version of the template, guiding questions, and a manual as supplementary material. The TES-D approach builds upon prior work in designing error frameworks for data from online platforms, namely the Total Error Framework for digital traces of human behavior on online platforms (TED-On, https://doi.org/10.1093/poq/nfab018).

Submitted 25 June, 2023; originally announced June 2023.

Comments: 21 pages, 2 figures

arXiv:2212.02945

doi 10.1186/s40163-023-00195-2

Counterfeits on Darknet Markets: A measurement between Jan-2014 and Sep-2015

Authors: Felix Soldner, Bennett Kleinberg, Shane D Johnson

Abstract: Counterfeits harm consumers, governments, and intellectual property holders. They accounted for 3.3% of world trade in 2016, with an estimated value of $509 billion in the same year. While estimations are mostly based on border seizures, we examined openly labeled counterfeits on darknet markets, which allowed us to gather and analyze information from a different perspective. Here, we analyzed data from 11 darknet markets for the period from January 2014 to September 2015. The findings suggest that darknet markets harbor counterfeit product types similar to those found in seizures, but that on darknet markets the share of watches is higher and the shares of electronics, clothes, shoes, and tobacco are lower. Also, darknet market counterfeits seem to have shipping origins similar to those of seized goods, with some exceptions, such as a relatively high share (5%) of darknet market counterfeits originating from the US. Lastly, counterfeits on darknet markets tend to have a relatively low price and sales volume. However, based on preliminary estimations, the original products on the surface web seem to be worth a multiple of the prices of their counterfeit counterparts on darknet markets. Gathering insights about counterfeits from darknet markets can be valuable for businesses and authorities and can be cost-effective compared to border seizures. Thus, monitoring darknet markets can help us better understand the counterfeit landscape.

Submitted 24 October, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: This paper is a pre-print

arXiv:2210.03080

Explainable Verbal Deception Detection using Transformers

Authors: Loukas Ilias, Felix Soldner, Bennett Kleinberg

Abstract: People are regularly confronted with potentially deceptive statements (e.g., fake news, misleading product reviews, or lies about activities). Only a few works on automated text-based deception detection have exploited the potential of deep learning approaches. A common critique of deep-learning methods is their lack of interpretability, which prevents us from understanding the underlying (linguistic) mechanisms involved in deception. However, recent advancements have made it possible to explain some aspects of such models. This paper proposes and evaluates six deep-learning models, including combinations of BERT (and RoBERTa), MultiHead Attention, co-attentions, and transformers. To understand how the models reach their decisions, we then examine the models' predictions with LIME. Next, we zoom in on vocabulary uniqueness and the correlation of LIWC categories with the outcome class (truthful vs. deceptive). The findings suggest that our transformer-based models can enhance automated deception detection performance (+2.11% in accuracy) and show significant differences pertinent to the usage of LIWC features in truthful and deceptive statements.

Submitted 6 October, 2022; originally announced October 2022.

arXiv:2110.15130

doi 10.1371/journal.pone.0277869

Confounds and Overestimations in Fake Review Detection: Experimentally Controlling for Product-Ownership and Data-Origin

Authors: Felix Soldner, Bennett Kleinberg, Shane Johnson

Abstract: The popularity of online shopping is steadily increasing. At the same time, fake product reviews are published widely and have the potential to affect consumer purchasing behavior. In response, previous work has developed automated methods for the detection of deceptive product reviews. However, studies vary considerably in terms of classification performance, and many use data that contain potential confounds, which makes it difficult to determine their validity. Two possible confounds are data-origin (i.e., the dataset is composed of more than one source) and product-ownership (i.e., reviews written by individuals who own or do not own the reviewed product). In the present study, we investigate the effect of both confounds on fake review detection. Using an experimental design, we manipulate data-origin, product-ownership, review polarity, and veracity. Supervised learning analysis suggests that review veracity (60.26-69.87%) is somewhat detectable, but reviews additionally confounded with product-ownership (66.19-74.17%) or with data-origin (84.44-86.94%) are easier to classify. Review veracity is most easily classified if confounded with product-ownership and data-origin combined (87.78-88.12%), suggesting overestimations of the true performance in other work. These findings are moderated by review polarity.

Submitted 8 December, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

Showing 1–5 of 5 results for author: Soldner, F