A revolutionary attention-based neural network outperforms conventional models on translation tasks while being simpler and faster to train.
A new network architecture based entirely on attention mechanisms was developed for machine translation. It outperformed existing models in translation quality while being more parallelizable and requiring less training time. With 165 million parameters, the model achieved a BLEU score of 27.5 on English-to-German translation, surpassing the best previous ensemble result by over 1 BLEU. On English-to-French translation it reached a BLEU score of 41.1, outperforming the previous state of the art by 0.7 BLEU.
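The summary does not spell out how the attention mechanism works, so the following is a minimal sketch of scaled dot-product attention, the standard formulation used in attention-based translation models. The function name and the NumPy-based setup are illustrative assumptions, not details from the article: each output row is a weighted average of the value rows, with weights given by softmax-normalized query-key similarities.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Hypothetical sketch: weight each value row by the softmax of
    scaled query-key dot products (standard attention; the article
    does not specify the exact formulation)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # convex mix of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one attended vector per query
```

Because every query attends to every key with plain matrix products, the whole computation parallelizes across positions, which is the property the summary credits for the shorter training time compared with sequential recurrent models.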