On June 12, 2017, researchers at Google published "Attention Is All You Need," introducing the transformer architecture that has since become the foundation of nearly every major language model. The paper dispensed with recurrent neural networks entirely, relying instead on a self-attention mechanism that lets a model process all positions in a sequence in parallel, dramatically improving training speed and scalability. Within a few years the architecture underpinned GPT, BERT, Claude, Gemini, and most other large language models, making it arguably the most consequential AI paper since the backpropagation work of the 1980s.
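To make the mechanism concrete, here is a minimal NumPy sketch of the scaled dot-product attention the paper defines, softmax(QK^T / sqrt(d_k))V, applied as self-attention; the toy sequence length, dimensions, and random inputs are illustrative assumptions, not values from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V,
    computed for every position in the sequence at once."""
    d_k = Q.shape[-1]
    # Attention scores: each query position is compared against every key position.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key dimension turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values: the whole sequence is handled in parallel,
    # with no step-by-step recurrence over time.
    return weights @ V

# Toy example: a sequence of 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (4, 8)
```

Because every position attends to every other position in a single matrix multiplication, the computation parallelizes across the sequence, which is the property that removed the sequential bottleneck of recurrent networks.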