Misleading Metadata Detection on YouTube
Authors:
Priyank Palod,
Ayush Patwari,
Sudhanshu Bahety,
Saurabh Bagchi,
Pawan Goyal
Abstract:
YouTube is the leading social media platform for sharing videos. As a result, it is plagued with misleading content, including staged videos presented as real footage of an incident, videos with misrepresented context, and videos whose audio/video content has been morphed. We tackle the problem of detecting such misleading videos as a supervised classification task. We develop UCNet, a deep network to detect fake videos, and perform our experiments on two datasets: VAVD, created by us, and the publicly available FVC [8]. We achieve a macro-averaged F-score of 0.82 when training and testing on a 70:30 split of FVC, whereas the baseline model scores 0.36. We find that the proposed model generalizes well when trained on one dataset and tested on the other.
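The results above are reported as a macro-averaged F-score. The following is a minimal sketch of how that metric is generally computed, not UCNet's actual evaluation code. It computes F1 separately for each class and then takes the unweighted mean, so a minority class counts as much as a majority class. The labels "fake" and "real" are hypothetical placeholders because the abstract does not state the label set.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight."""
    # Every label seen in either the gold or the predicted sequence is a class.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Treat undefined ratios (0/0) as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["fake", "fake", "real", "real"]
pred = ["fake", "real", "real", "real"]
print(round(macro_f1(gold, pred), 4))  # fake F1 = 2/3, real F1 = 4/5 -> 0.7333
```

Macro averaging penalizes degenerate classifiers. A model that always predicts "real" on the toy data above scores F1 = 0 on "fake", which drags the macro average down to 1/3.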
Submitted 25 January, 2019;
originally announced January 2019.
OCR++: A Robust Framework For Information Extraction from Scholarly Articles
Authors:
Mayank Singh,
Barnopriyo Barua,
Priyank Palod,
Manvi Garg,
Sidhartha Satapathy,
Samuel Bushi,
Kumar Ayush,
Krishna Sai Rohith,
Tulasi Gamidi,
Pawan Goyal,
Animesh Mukherjee
Abstract:
This paper proposes OCR++, an open-source framework designed for a variety of information extraction tasks from scholarly articles, including metadata (title, author names, affiliation and e-mail), structure (section headings and body text, table and figure headings, URLs and footnotes) and bibliography (citation instances and references). We analyze a diverse set of scientific articles written in English to understand generic writing patterns and formulate rules to develop this hybrid framework. Extensive evaluations show that the proposed framework outperforms existing state-of-the-art tools by a large margin in structural information extraction. It also improves performance in metadata and bibliography extraction tasks, both in accuracy (around 50% improvement) and in processing time (around 52% improvement). A user experience study conducted with 30 researchers reveals that they found the system very helpful. As an additional objective, we discuss two novel use cases, including automatically extracting links to public datasets from the proceedings, which would further accelerate the advancement of digital libraries. The framework's output can be exported in its entirety as structured TEI-encoded documents. Our framework is accessible online at http://cnergres.iitkgp.ac.in/OCR++/home/.
Submitted 23 September, 2016; v1 submitted 21 September, 2016;
originally announced September 2016.