Computer Science > Computation and Language
[Submitted on 23 Oct 2023 (v1), last revised 23 Dec 2023 (this version, v2)]
Title: DeTiME: Diffusion-Enhanced Topic Modeling using Encoder-decoder based LLM
Abstract: In the burgeoning field of natural language processing (NLP), Neural Topic Models (NTMs), Large Language Models (LLMs), and diffusion models have emerged as areas of significant research interest. Despite this, NTMs primarily utilize contextual embeddings from LLMs, which are neither optimal for clustering nor capable of topic-based text generation. NTMs have never been combined with diffusion models for text generation. Our study addresses these gaps by introducing a novel framework named Diffusion-Enhanced Topic Modeling using Encoder-Decoder-based LLMs (DeTiME). DeTiME leverages encoder-decoder-based LLMs to produce highly clusterable embeddings, which yield topics that exhibit both superior clusterability and enhanced semantic coherence compared to existing methods. Additionally, by exploiting the power of diffusion models, our framework also provides the capability for topic-based text generation. This dual functionality allows users to efficiently produce highly clustered topics and topic-based text simultaneously. DeTiME's potential extends to generating clustered embeddings as well. Notably, our proposed framework (both the encoder-decoder-based LLM and the diffusion model) proves to be efficient to train and exhibits high adaptability to other LLMs and diffusion models, demonstrating its potential for a wide array of applications.
Submission history
From: Weijie Xu
[v1] Mon, 23 Oct 2023 19:03:04 UTC (1,086 KB)
[v2] Sat, 23 Dec 2023 07:05:20 UTC (1,086 KB)