ClusTop: a framework that unifies text clustering and topic extraction.
The ClusTop framework integrates text clustering and topic extraction so that it can produce high-quality clusters and extract topics at the same time. It has four components: enhanced language model training, dimensionality reduction, clustering, and topic extraction. The enhanced language model, which is trained without supervision, serves both tasks: it produces text embeddings with a strong cluster structure, which supports effective clustering, and it focuses on topic-related words, which supports topic extraction. Experiments on two datasets show that the framework is effective and provide benchmarks for different model combinations.
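
To make the flow between the stages concrete, here is a minimal, standard-library-only sketch of an embed, reduce, cluster, and extract pipeline. It is an illustration under loud assumptions, not the ClusTop implementation: TF-IDF vectors stand in for the enhanced language model's embeddings, dropping rare dimensions stands in for dimensionality reduction, a small deterministic k-means stands in for the clustering step, and summed TF-IDF weights stand in for topic-word selection. The function names (embed, reduce_dims, kmeans, extract_topics), the stopword list, and the toy corpus are all hypothetical. The stages also run one after another here, so the sketch does not reproduce the integration of the two tasks that the framework describes.

```python
import math
from collections import Counter

# Hypothetical stopword list, just large enough for the toy corpus below.
STOPWORDS = {"the", "a", "in", "with", "to", "after", "of", "and"}

def tokenize(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]

def embed(docs):
    """Assumption: TF-IDF vectors stand in for language-model embeddings."""
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    df = Counter(w for doc in tokens for w in set(doc))
    n = len(docs)
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for doc in tokens:
        tf = Counter(doc)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vocab, vectors

def reduce_dims(vocab, vectors, min_docs=2):
    """Assumption: a crude reduction that keeps only the dimensions
    that are non-zero in at least min_docs documents."""
    keep = [j for j in range(len(vocab))
            if sum(1 for v in vectors if v[j] > 0) >= min_docs]
    return [[v[j] for j in keep] for v in vectors]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def extract_topics(vocab, vectors, labels, k, top_n=3):
    """Rank each cluster's words by summed TF-IDF weight (ties alphabetical)."""
    topics = []
    for c in range(k):
        scores = [0.0] * len(vocab)
        for v, l in zip(vectors, labels):
            if l == c:
                for j, x in enumerate(v):
                    scores[j] += x
        ranked = sorted(range(len(vocab)), key=lambda j: (-scores[j], vocab[j]))
        topics.append([vocab[j] for j in ranked[:top_n]])
    return topics

# Toy corpus (made up for illustration): three sports and three finance sentences.
docs = [
    "the striker scored a goal in the football match",
    "the football team won the match with a late goal",
    "the goalkeeper saved a penalty in the football match",
    "the central bank raised interest rates to curb inflation",
    "inflation fell after the bank raised interest rates",
    "investors expect the bank to cut interest rates",
]

vocab, vectors = embed(docs)
reduced = reduce_dims(vocab, vectors)
labels = kmeans(reduced, k=2)
topics = extract_topics(vocab, vectors, labels, k=2)
print(labels)
print(topics)
```

On this toy corpus the sports and finance sentences fall into separate clusters, and each cluster's top words come from its own theme. Any of the four functions could be swapped for a stronger component, which is the kind of model combination the reported benchmarks compare.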