arXiv:2311.02083 [pdf, other]

MaRU: A Manga Retrieval and Understanding System Connecting Vision and Language

Authors: Conghao Tom Shen, Violet Yao, Yixin Liu

Abstract: Manga, a widely celebrated Japanese comic art form, is renowned for its diverse narratives and distinct artistic styles. However, the inherently visual and intricate structure of Manga, which comprises images housing multiple panels, poses significant challenges for content retrieval. To address this, we present MaRU (Manga Retrieval and Understanding), a multi-staged system that connects vision and language to facilitate efficient search of both dialogues and scenes within Manga frames. The architecture of MaRU integrates an object detection model for identifying text and frame bounding boxes, a Vision Encoder-Decoder model for text recognition, a text encoder for embedding text, and a vision-text encoder that merges textual and visual information into a unified embedding space for scene retrieval. Rigorous evaluations reveal that MaRU excels in end-to-end dialogue retrieval and exhibits promising results for scene retrieval.

Submitted 22 October, 2023; originally announced November 2023.

arXiv:2309.01812 [pdf, other]

Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts

Authors: Ruth Dannenfelser, Jeffrey Zhong, Ran Zhang, Vicky Yao

Abstract: Many of the most commonly explored natural language processing (NLP) information extraction tasks can be thought of as evaluations of declarative knowledge, or fact-based information extraction. Procedural knowledge extraction, i.e., breaking down a described process into a series of steps, has received much less attention, perhaps in part due to the lack of structured datasets that capture the knowledge extraction process from end-to-end. To address this unmet need, we present FlaMBé (Flow annotations for Multiverse Biological entities), a collection of expert-curated datasets across a series of complementary tasks that capture procedural knowledge in biomedical texts. This dataset is inspired by the observation that one ubiquitous source of procedural knowledge that is described as unstructured text is within academic papers describing their methodology. The workflows annotated in FlaMBé are from texts in the burgeoning field of single cell research, a research area that has become notorious for the number of software tools and complexity of workflows used. Additionally, FlaMBé provides, to our knowledge, the largest manually curated named entity recognition (NER) and disambiguation (NED) datasets for tissue/cell type, a fundamental biological entity that is critical for knowledge extraction in the biomedical research domain. Beyond providing a valuable dataset to enable further development of NLP models for procedural knowledge extraction, automating the process of workflow mining also has important implications for advancing reproducibility in biomedical research.

Submitted 4 September, 2023; originally announced September 2023.

Comments: Submitted to NeurIPS 2023 Datasets and Benchmarks Track

arXiv:2306.06284 [pdf, other]

doi 10.1145/3587819.3592542

Everybody Compose: Deep Beats To Music

Authors: Conghao Shen, Violet Z. Yao, Yixin Liu

Abstract: This project presents a deep learning approach to generate monophonic melodies based on input beats, allowing even amateurs to create their own music compositions. Three effective methods - LSTM with Full Attention, LSTM with Local Attention, and Transformer with Relative Position Representation - are proposed for this novel task, providing great variation, harmony, and structure in the generated music. This project allows anyone to compose their own music by tapping their keyboards or "recoloring" beat sequences from existing works.

Submitted 9 June, 2023; originally announced June 2023.

Comments: Accepted MMSys '23

Journal ref: Proceedings of the 14th Conference on ACM Multimedia Systems (2023)

arXiv:2305.14292 [pdf, other]

doi 10.18653/v1/2023.findings-emnlp.157

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

Authors: Sina J. Semnani, Violet Z. Yao, Heidi C. Zhang, Monica S. Lam

Abstract: This paper presents the first few-shot LLM-based chatbot that almost never hallucinates and has high conversationality and low latency. WikiChat is grounded on the English Wikipedia, the largest curated free-text corpus. WikiChat generates a response from an LLM, retains only the grounded facts, and combines them with additional information it retrieves from the corpus to form factual and engaging responses. We distill WikiChat based on GPT-4 into a 7B-parameter LLaMA model with minimal loss of quality, to significantly improve its latency, cost and privacy, and facilitate research and deployment. Using a novel hybrid human-and-LLM evaluation methodology, we show that our best system achieves 97.3% factual accuracy in simulated conversations. It significantly outperforms all retrieval-based and LLM-based baselines, and by 3.9%, 38.6% and 51.0% on head, tail and recent knowledge compared to GPT-4. Compared to previous state-of-the-art retrieval-based chatbots, WikiChat is also significantly more informative and engaging, just like an LLM. WikiChat achieves 97.9% factual accuracy in conversations with human users about recent topics, 55.0% better than GPT-4, while receiving significantly higher user ratings and more favorable comments.

Submitted 27 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: Findings of EMNLP 2023

arXiv:2010.02164 [pdf, other]

doi 10.18653/v1/2020.emnlp-main.366

A Streaming Approach For Efficient Batched Beam Search

Authors: Kevin Yang, Violet Yao, John DeNero, Dan Klein

Abstract: We propose an efficient batching strategy for variable-length decoding on GPU architectures. During decoding, when candidates terminate or are pruned according to heuristics, our streaming approach periodically "refills" the batch before proceeding with a selected subset of candidates. We apply our method to variable-width beam search on a state-of-the-art machine translation model. Our method decreases runtime by up to 71% compared to a fixed-width beam search baseline and 17% compared to a variable-width baseline, while matching baselines' BLEU. Finally, experiments show that our method can speed up decoding in other domains, such as semantic and syntactic parsing.

Submitted 15 August, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: EMNLP 2020

arXiv:1905.05707 [pdf]

Limited Resource Optimal Distribution Algorithm Based on Game Iteration Method

Authors: Vilisov V. Ya

Abstract: The article provides a solution algorithm for the linear programming problem (LPP) with the latter being presented as an antagonistic matrix game so the game's further solution is based on the iterative method. The algorithm is presented as a computer program. Having applied necessary accuracy, the author has researched the solution assessment convergence rate in relation to the actual value. Program implementation demonstrates high rate of the LPP solution receipt, with the acceptable accuracy being fractions or unities. It allows using the algorithm in the integrated systems for the purpose of their optimal control.

Submitted 11 May, 2019; originally announced May 2019.

Comments: 9 pages, 6 pictures

Showing 1–6 of 6 results for author: Yao, V