New Transformer Architecture Cuts Language Modeling Perplexity by 12 Points
The researchers improved Transformer-based language models by adding LSTM layers, which capture sequential dependencies that self-attention handles less directly. Their Coordinate Architecture Search (CAS) method, which greedily modifies one architectural component at a time, identified an effective hybrid model. Experimental results showed a significant perplexity improvement on language modeling benchmarks compared to state-of-the-art LSTMs.
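The article does not give the paper's exact layer configuration, so the following is a minimal sketch, assuming a PyTorch setup: a stack of Transformer encoder blocks with an LSTM layer placed on top to add recurrence before the output head. The class name `LSTMAugmentedTransformer` and all hyperparameters are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LSTMAugmentedTransformer(nn.Module):
    """Hypothetical sketch: Transformer encoder stack followed by an LSTM layer."""

    def __init__(self, vocab_size=10000, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # The added recurrent layer: runs over the Transformer's outputs
        # to model token order sequentially.
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)   # (batch, seq, d_model)
        x = self.transformer(x)  # self-attention blocks
        x, _ = self.lstm(x)      # sequential recurrence on top
        return self.head(x)      # next-token logits

# Usage: score a random batch of token IDs.
model = LSTMAugmentedTransformer()
logits = model(torch.randint(0, 10000, (2, 32)))  # shape (2, 32, vocab_size)
```

A coordinate search over such architectures would, in this sketch, correspond to toggling individual choices (e.g. where the LSTM sits, which Transformer layers stay frozen) one at a time and keeping each change only if validation perplexity improves.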