Revolutionizing Persian Language Models: GPT-2 Outperforms BERT in Perplexity
The article describes how attention-based models, namely BERT and a Persian GPT-2, were fine-tuned on Persian text to improve language modeling. Evaluated on a large Persian corpus, both models achieved better perplexity than earlier approaches, with GPT-2 performing slightly better. A new measure, bi-perplexity, is introduced to compare language models trained in this way, and a new sampling strategy is devised so that BERT can be used as a language model.
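As a rough illustration of the evaluation setup described above, the sketch below shows how a standard perplexity score is computed for an autoregressive model such as GPT-2, and how a pseudo-perplexity (mask each token in turn and score the original token) can be computed for a masked model such as BERT. This is a minimal sketch using the Hugging Face Transformers API, not the authors' code: the model names are placeholders for whichever Persian checkpoints were used, and the article's own bi-perplexity measure and BERT sampling strategy are not reproduced here.

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForMaskedLM


def causal_perplexity(text, model_name="gpt2"):
    """Standard perplexity under an autoregressive LM (e.g. a Persian GPT-2).

    Passing the input ids as labels makes the model return the mean
    cross-entropy over predicted tokens; exponentiating gives perplexity.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())


def masked_pseudo_perplexity(text, model_name="bert-base-multilingual-cased"):
    """Pseudo-perplexity for a masked LM: mask each position in turn and
    average the negative log-likelihood of the original token there.
    (This is the conventional pseudo-likelihood score, not the article's
    bi-perplexity measure.)
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name).eval()
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    nlls = []
    with torch.no_grad():
        for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            nlls.append(-log_probs[ids[i]].item())
    return math.exp(sum(nlls) / len(nlls))


if __name__ == "__main__":
    sample = "این یک جملهٔ نمونه است."  # "This is a sample sentence."
    print("causal perplexity:", causal_perplexity(sample))
    print("masked pseudo-perplexity:", masked_pseudo_perplexity(sample))
```

Lower scores indicate a better fit to the text; comparisons like the one reported in the article only make sense when both models are scored over the same tokenization and test corpus, which is part of what a measure like bi-perplexity is meant to address.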