Search | arXiv e-print repository

arXiv:2006.03239 [pdf, other]

Think out of the package: Recommending package types for e-commerce shipments

Authors: Karthik S. Gurumoorthy, Subhajit Sanyal, Vineet Chaoji

Abstract: Multiple product attributes like dimensions, weight, fragility, liquid content, etc. determine the package type used by e-commerce companies to ship products. Sub-optimal package types lead to damaged shipments, incurring huge damage-related costs and adversely impacting the company's reputation for safe delivery. Items can be shipped in more protective packages to reduce damage costs; however, this increases the shipment costs due to expensive packaging and higher transportation costs. In this work, we propose a multi-stage approach that trades off shipment and damage costs for each product, and accurately assigns the optimal package type using a scalable, computationally efficient linear-time algorithm. A simple binary search algorithm is presented to find the hyper-parameter that balances the shipment and damage costs. Our approach, when applied to choosing package types for Amazon shipments, leads to significant cost savings of tens of millions of dollars in emerging marketplaces, by decreasing both the overall shipment cost and the number of in-transit damages. Our algorithm is live and deployed in the production system, where package types for more than 130,000 products have been modified based on the model's recommendation, realizing a reduction in damage rate of 24%.

Submitted 5 June, 2020; originally announced June 2020.

Comments: Accepted in ECML-PKDD 2020
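
Note: the abstract above mentions a binary search for the hyper-parameter that balances shipment and damage costs, but does not state the objective. The sketch below is one plausible reading, not the authors' method. It assumes each product has a small list of (shipment cost, expected damage cost) options, that a product picks the option minimizing shipment + lam * damage, and that lam is searched so that total shipment cost stays within a budget. The names `assign`, `total_shipment`, `find_lambda`, and the `budget` parameter are all hypothetical.

```python
from typing import List, Tuple

# One product = list of package options, each (shipment_cost, expected_damage_cost).
Options = List[Tuple[float, float]]


def assign(products: List[Options], lam: float) -> List[int]:
    """Per product, pick the option index minimizing shipment + lam * damage.

    This is a single pass over all options, so it is linear in their count.
    """
    return [
        min(range(len(opts)), key=lambda k: opts[k][0] + lam * opts[k][1])
        for opts in products
    ]


def total_shipment(products: List[Options], choice: List[int]) -> float:
    """Total shipment cost of a given assignment."""
    return sum(opts[k][0] for opts, k in zip(products, choice))


def find_lambda(products: List[Options], budget: float,
                lo: float = 0.0, hi: float = 100.0, iters: int = 50) -> float:
    """Binary search for the largest lam whose assignment fits the budget.

    Assumes (hypothetically) that lam = lo is feasible and that shipment
    cost does not decrease as lam grows, which holds for this objective
    apart from tie-breaking.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total_shipment(products, assign(products, mid)) <= budget:
            lo = mid  # feasible: try weighting damage more heavily
        else:
            hi = mid  # too expensive: back off
    return lo
```

A larger lam pushes products toward more protective (and costlier) packages, so the search stops at the heaviest damage weighting that the assumed shipment budget can afford.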

arXiv:1905.06246 [pdf, other]

doi: 10.1145/3292500.3330678

Detection of Review Abuse via Semi-Supervised Binary Multi-Target Tensor Decomposition

Authors: Anil R. Yelundur, Vineet Chaoji, Bamdev Mishra

Abstract: Product reviews and ratings on e-commerce websites provide customers with detailed insights about various aspects of the product such as quality, usefulness, etc. Since they influence customers' buying decisions, product reviews have become a fertile ground for abuse by sellers (colluding with reviewers) to promote their own products or to tarnish the reputation of competitors' products. In this paper, our focus is on detecting such abusive entities (both sellers and reviewers) by applying tensor decomposition on the product reviews data. While tensor decomposition is mostly unsupervised, we formulate our problem as a semi-supervised binary multi-target tensor decomposition, to take advantage of currently known abusive entities. We empirically show that our multi-target semi-supervised model achieves higher precision and recall in detecting abusive entities as compared to unsupervised techniques. Finally, we show that our proposed stochastic partial natural gradient inference for our model empirically achieves faster convergence than stochastic gradient and Online-EM with sufficient statistics.

Submitted 23 May, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

Comments: Accepted to the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2019. Contains supplementary material. arXiv admin note: text overlap with arXiv:1804.03836

arXiv:1709.08761 [pdf]

Image similarity using Deep CNN and Curriculum Learning

Authors: Srikar Appalaraju, Vineet Chaoji

Abstract: Image similarity involves fetching similar-looking images given a reference image. Our solution, called SimNet, is a deep siamese network which is trained on pairs of positive and negative images using a novel online pair mining strategy inspired by curriculum learning. We also created a multi-scale CNN, where the final image embedding is a joint representation of top as well as lower layer embeddings. We go on to show that this multi-scale siamese network is better at capturing fine-grained image similarities than traditional CNNs.

Submitted 13 July, 2018; v1 submitted 25 September, 2017; originally announced September 2017.

Comments: 9 pages, 6 figures, GHCI 17 conference

arXiv:1301.0977 [pdf, ps, other]

DAGGER: A Scalable Index for Reachability Queries in Large Dynamic Graphs

Authors: Hilmi Yildirim, Vineet Chaoji, Mohammed J. Zaki

Abstract: With the ubiquity of large-scale graph data in a variety of application domains, querying them effectively is a challenge. In particular, reachability queries are becoming increasingly important, especially for containment, subsumption, and connectivity checks. Whereas many methods have been proposed for static graph reachability, many real-world graphs are constantly evolving, which calls for dynamic indexing. In this paper, we present a fully dynamic reachability index over dynamic graphs. Our method, called DAGGER, is a light-weight index based on interval labeling that scales to million-node graphs and beyond. Our extensive experimental evaluation on real-world and synthetic graphs confirms its effectiveness over baseline methods.

Submitted 6 January, 2013; originally announced January 2013.

Comments: 11 pages, 7 figures, 2 tables

ACM Class: H.3.3
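
Note: the abstract above describes an index based on interval labeling. The sketch below illustrates the general idea on a static DAG only and is not DAGGER's dynamic index. It assumes a min-post style labeling: each node gets the interval (lowest post-order rank among itself and its descendants, its own post-order rank). Interval containment is then a necessary condition for reachability, so a failed containment check answers "no" immediately, and a passed check falls back to a pruned search. The function names are hypothetical, and the input is assumed to be acyclic.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]          # node -> list of successors (must be a DAG)
Labels = Dict[str, Tuple[int, int]]   # node -> (lowest descendant rank, own rank)


def label(graph: Graph) -> Labels:
    """Assign each node an interval from a post-order DFS traversal."""
    labels: Labels = {}
    counter = 0

    def visit(u: str) -> None:
        nonlocal counter
        low = None
        for v in graph.get(u, []):
            if v not in labels:
                visit(v)
            low = labels[v][0] if low is None else min(low, labels[v][0])
        counter += 1  # post-order rank of u
        labels[u] = (counter if low is None else low, counter)

    for u in graph:
        if u not in labels:
            visit(u)
    return labels


def contains(outer: Tuple[int, int], inner: Tuple[int, int]) -> bool:
    """True if the interval `inner` lies within the interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def reachable(graph: Graph, labels: Labels, src: str, dst: str) -> bool:
    """Answer a reachability query, pruning with the interval labels."""
    if src == dst:
        return True
    if not contains(labels[src], labels[dst]):
        return False  # definite negative: no search needed
    # Containment can be a false positive on a DAG, so confirm by searching.
    return any(reachable(graph, labels, v, dst) for v in graph.get(src, []))
```

The labels are cheap to store (two integers per node), which is what makes this family of indexes light-weight. Keeping them valid under edge and node updates is the harder problem the paper addresses and is not attempted here.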

arXiv:1203.2886 [pdf, ps, other]

BitPath -- Label Order Constrained Reachability Queries over Large Graphs

Authors: Medha Atre, Vineet Chaoji, Mohammed J. Zaki

Abstract: In this paper we focus on the following constrained reachability problem over edge-labeled graphs like RDF -- given source node x, destination node y, and a sequence of edge labels (a, b, c, d), is there a path between the two nodes such that the edge labels on the path satisfy the regular expression "*a.*b.*c.*d.*"? A "*" before "a" allows any other edge label to appear on the path before edge "a". "a.*" forces at least one edge with label "a". ".*" after "a" allows zero or more edge labels after "a" and before "b". Our query processing algorithm uses simple divide-and-conquer and greedy pruning procedures to limit the search space. However, our graph indexing technique -- based on "compressed bit-vectors" -- allows indexing large graphs which otherwise would have been infeasible. We have evaluated our approach on graphs with more than 22 million edges and 6 million nodes -- much larger than the datasets used in the contemporary work on path queries.

Submitted 13 March, 2012; originally announced March 2012.

Report number: RPI-CS 12-02

ACM Class: H.2.4; E.1; E.2
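
Note: the query semantics described in the abstract above can be illustrated with a plain breadth-first search over (node, labels matched so far) states. This is a naive baseline sketch, not BitPath's algorithm: it uses no compressed bit-vector index, divide-and-conquer, or greedy pruning. The adjacency format and the name `constrained_reachable` are assumptions made for the example.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

# node -> list of (edge_label, neighbor) pairs
Edges = Dict[str, List[Tuple[str, str]]]


def constrained_reachable(edges: Edges, src: str, dst: str,
                          seq: Sequence[str]) -> bool:
    """Is there a path src -> dst whose labels contain seq as an ordered subsequence?

    Any labels may appear before, between, and after the required ones.
    """
    start = (src, 0)  # (current node, number of labels from seq matched)
    seen = {start}
    queue = deque([start])
    while queue:
        node, matched = queue.popleft()
        if node == dst and matched == len(seq):
            return True
        for edge_label, nxt in edges.get(node, []):
            # Any edge may be taken as "filler" without advancing the match.
            states = [(nxt, matched)]
            # An edge with the next required label may also advance the match.
            if matched < len(seq) and edge_label == seq[matched]:
                states.append((nxt, matched + 1))
            for state in states:
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return False
```

For example, on the path x -a-> m -q-> n -b-> y, the sequence (a, b) is satisfied from x to y, while (b, a) is not. The state space grows with the graph size times the sequence length, which is the kind of cost an index is meant to avoid on graphs with millions of edges.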

Showing 1–5 of 5 results for author: Chaoji, V