New self-supervised language model ALBERT outperforms BERT with fewer parameters.
The article introduces ALBERT, a more parameter-efficient variant of BERT for learning language representations. By reducing parameters and modeling inter-sentence coherence, ALBERT lowers memory consumption and speeds up training, which leads to better performance on language benchmarks such as GLUE and SQuAD, surpassing BERT-large while using fewer parameters. The code and pretrained models are publicly available.
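As a rough illustration of the parameter-reduction idea, the sketch below factorizes the vocabulary embedding into a small lookup table followed by a projection into the hidden size, so the embedding cost grows with V·E + E·H instead of V·H. It is a minimal sketch, not the released implementation; the class name and the dimensions (vocab_size, embed_dim, hidden_dim) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Factorized token embedding: a V x E lookup followed by an E x H projection,
    instead of a single V x H embedding matrix (illustrative sketch)."""
    def __init__(self, vocab_size: int = 30000, embed_dim: int = 128, hidden_dim: int = 768):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)        # V x E parameters
        self.project = nn.Linear(embed_dim, hidden_dim, bias=False)  # E x H parameters

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Look up the small embedding, then project it up to the hidden size.
        return self.project(self.word_embed(token_ids))

if __name__ == "__main__":
    V, E, H = 30000, 128, 768                  # assumed sizes for illustration
    factorized = V * E + E * H                 # ~3.9M embedding parameters
    unfactorized = V * H                       # ~23.0M embedding parameters
    print(f"factorized: {factorized:,} vs unfactorized: {unfactorized:,}")

    emb = FactorizedEmbedding(V, E, H)
    out = emb(torch.randint(0, V, (2, 16)))    # batch of 2 sequences, 16 tokens each
    print(out.shape)                           # torch.Size([2, 16, 768])
```

With these assumed sizes the factorization cuts the embedding parameters by roughly a factor of six, which is one of the ways ALBERT fits a large model in less memory.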