New text clustering algorithm outperforms competitors on Chinese and English datasets.
The researchers developed a text clustering algorithm that improves on traditional methods by selecting initial cluster centroids from similarity information between different sets of data, which reduces the algorithm's sensitivity to the initial choice of centroids. The algorithm also computes its threshold values dynamically during clustering rather than relying on preset values. In experiments on Chinese and English datasets, it outperformed existing clustering methods in both accuracy and efficiency.
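The summary does not give the exact procedure, so the following is only a minimal sketch of the two ideas under stated assumptions, not the researchers' algorithm. It assumes documents are already represented as numeric vectors, uses cosine similarity, picks seeds with a simple "most central first, then least similar to existing seeds" rule, and takes the threshold to be the mean similarity of each document to its nearest centroid. The function names `pick_seeds` and `dynamic_threshold` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def pick_seeds(docs, k):
    """Choose k seed indices from pairwise similarity instead of at random.

    Assumed rule (not from the source): start with the most central document,
    then repeatedly add the document least similar to the seeds chosen so far.
    """
    n = len(docs)
    sim = [[cosine(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    # First seed: highest total similarity to all documents.
    seeds = [max(range(n), key=lambda i: sum(sim[i]))]
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        # Next seed: smallest "closest-seed" similarity, i.e. farthest from all seeds.
        seeds.append(min(rest, key=lambda i: max(sim[i][s] for s in seeds)))
    return seeds

def dynamic_threshold(docs, centroids):
    """Data-derived cutoff (assumed definition): mean nearest-centroid similarity."""
    best = [max(cosine(d, c) for c in centroids) for d in docs]
    return sum(best) / len(best)

# Toy vectors: documents 0-1 share one topic, documents 2-3 another.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
seeds = pick_seeds(docs, k=2)
centroids = [docs[i] for i in seeds]
threshold = dynamic_threshold(docs, centroids)
print(seeds, round(threshold, 3))
```

On the toy data the two seeds land in different topic groups, which is the point of similarity-based seeding: the starting centroids no longer depend on a random draw. A full clustering loop could call `dynamic_threshold` again after each centroid update, so the cutoff follows the data rather than a preset constant.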