-
From cryptomarkets to the surface web: Scouting eBay for counterfeits
Authors:
Felix Soldner,
Fabian Plum,
Bennett Kleinberg,
Shane D Johnson
Abstract:
Detecting counterfeits on online marketplaces is challenging, and current methods struggle with the volume of sales on platforms like eBay, while cryptomarkets openly sell counterfeits. Leveraging information from 453 cryptomarket counterfeits, we automated a search for corresponding products on eBay, utilizing image and text similarity metrics. We collected data twice over 4 months to analyze changes, with an average of 159 eBay products per cryptomarket item, totaling 134k products. We found identical products, which would warrant further investigation as to whether they are counterfeits. Results indicate increasing difficulty finding similar products over time, moderated by product type and origin. Future improved versions of the current system could be used to examine possible connections between cryptomarket and surface web listings more closely and could hold practical value in supporting the detection of counterfeits on the surface web.
Submitted 7 June, 2024;
originally announced June 2024.
-
Total Error Sheets for Datasets (TES-D) -- A Critical Guide to Documenting Online Platform Datasets
Authors:
Leon Fröhling,
Indira Sen,
Felix Soldner,
Leonie Steinbrinker,
Maria Zens,
Katrin Weller
Abstract:
This paper proposes a template for documenting datasets that have been collected from online platforms for research purposes. The template should help to critically reflect on data quality and increase transparency in research fields that make use of online platform data. The paper describes our motivation, outlines the procedure for developing a specific documentation template that we refer to as TES-D (Total Error Sheets for Datasets), and includes the current version of the template, guiding questions, and a manual as supplementary material. The TES-D approach builds upon prior work in designing error frameworks for data from online platforms, namely the Total Error Framework for digital traces of human behavior on online platforms (TED-On, https://doi.org/10.1093/poq/nfab018).
Submitted 25 June, 2023;
originally announced June 2023.
-
Counterfeits on Darknet Markets: A measurement between Jan-2014 and Sep-2015
Authors:
Felix Soldner,
Bennett Kleinberg,
Shane D Johnson
Abstract:
Counterfeits harm consumers, governments, and intellectual property holders. They accounted for 3.3% of worldwide trades in 2016, having an estimated value of $509 billion in the same year. While estimations are mostly based on border seizures, we examined openly labeled counterfeits on darknet markets, which allowed us to gather and analyze information from a different perspective. Here, we analyzed data from 11 darknet markets for the period between Jan-2014 and Sep-2015. The findings suggest that darknet markets harbor similar counterfeit product types as found in seizures, but that on darknet markets the share of watches is higher and the share of electronics, clothes, shoes, and tobacco is lower. Also, darknet market counterfeits seem to have similar shipping origins as seized goods, with some exceptions, such as a relatively high share (5%) of darknet market counterfeits originating from the US. Lastly, counterfeits on darknet markets tend to have a relatively low price and sales volume. However, based on preliminary estimations, the original products on the surface web seem to be worth a multiple of the prices of their counterfeit counterparts on darknet markets. Gathering insights about counterfeits from darknet markets can be valuable for businesses and authorities and be cost-effective compared to border seizures. Thus, monitoring darknet markets can help us understand the counterfeit landscape better.
Submitted 24 October, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Explainable Verbal Deception Detection using Transformers
Authors:
Loukas Ilias,
Felix Soldner,
Bennett Kleinberg
Abstract:
People are regularly confronted with potentially deceptive statements (e.g., fake news, misleading product reviews, or lies about activities). Only a few works on automated text-based deception detection have exploited the potential of deep learning approaches. A critique of deep-learning methods is their lack of interpretability, preventing us from understanding the underlying (linguistic) mechanisms involved in deception. However, recent advancements have made it possible to explain some aspects of such models. This paper proposes and evaluates six deep-learning models, including combinations of BERT (and RoBERTa), MultiHead Attention, co-attentions, and transformers. To understand how the models reach their decisions, we then examine the models' predictions with LIME. We then zoom in on vocabulary uniqueness and the correlation of LIWC categories with the outcome class (truthful vs deceptive). The findings suggest that our transformer-based models can enhance automated deception detection performance (+2.11% in accuracy) and show significant differences pertinent to the usage of LIWC features in truthful and deceptive statements.
Submitted 6 October, 2022;
originally announced October 2022.
-
Confounds and Overestimations in Fake Review Detection: Experimentally Controlling for Product-Ownership and Data-Origin
Authors:
Felix Soldner,
Bennett Kleinberg,
Shane Johnson
Abstract:
The popularity of online shopping is steadily increasing. At the same time, fake product reviews are published widely and have the potential to affect consumer purchasing behavior. In response, previous work has developed automated methods for the detection of deceptive product reviews. However, studies vary considerably in terms of classification performance, and many use data that contain potential confounds, which makes it difficult to determine their validity. Two possible confounds are data-origin (i.e., the dataset is composed of more than one source) and product ownership (i.e., reviews written by individuals who own or do not own the reviewed product). In the present study, we investigate the effect of both confounds for fake review detection. Using an experimental design, we manipulate data-origin, product ownership, review polarity, and veracity. Supervised learning analysis suggests that review veracity (60.26 - 69.87%) is somewhat detectable but reviews additionally confounded with product-ownership (66.19 - 74.17%), or with data-origin (84.44 - 86.94%) are easier to classify. Review veracity is most easily classified if confounded with product-ownership and data-origin combined (87.78 - 88.12%), suggesting overestimations of the true performance in other work. These findings are moderated by review polarity.
Submitted 8 December, 2022; v1 submitted 28 October, 2021;
originally announced October 2021.