Revolutionizing Language: Transformers vs. RNNs — Unleashing the Power of Attention
In the ever-evolving landscape of artificial intelligence, there’s a seismic shift that has reshaped the way we understand and generate language. In the not-so-distant past, Recurrent Neural Networks (RNNs) held the torch, attempting to navigate the complex terrain of language with mixed success. But then came a game-changer: Transformers. Brace yourselves for a journey through the evolution of language models, where attention becomes the linchpin to unparalleled progress.
The RNN Era: A Glimpse into the Past:
Once upon a time, RNNs were the architects of text prediction. They were formidable in their own right, but as the demands of language complexity grew, their limitations became apparent. Core tasks such as Natural Language Generation (NLG) often pushed RNNs to their computational limits: the more text they had to process, the more compute and memory they demanded, and even extensive scaling efforts yielded only mediocre results.
At each time step t, a vanilla RNN updates its hidden state from the current input and the previous hidden state, then reads an output off that state:

h_t = tanh(U·x_t + W·h_{t−1})
o_t = V·h_t

where x, h, and o are the input sequence, hidden state, and output sequence, respectively, and U, V, and W are the learned weight matrices.
RNNs are particularly effective in capturing short-term dependencies in sequences. However, they suffer from the vanishing gradient problem, where the influence of earlier inputs diminishes exponentially as the sequence progresses, making it difficult to capture long-term dependencies.
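To see the vanishing gradient concretely, here is a minimal NumPy sketch of a vanilla RNN forward pass. The sizes and the deliberately small-scaled recurrent matrix are illustrative; the loop multiplies the step-to-step Jacobians dh_t/dh_{t−1} together, which is exactly the factor that scales the gradient flowing back from the last step to the first input:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # hidden size (toy value)
U = rng.standard_normal((d, d))         # input weights
W = 0.1 * rng.standard_normal((d, d))   # recurrent weights, scaled small

h = np.zeros(d)
grad_factor = np.eye(d)                 # running product of Jacobians dh_t/dh_0
for t in range(50):
    x = rng.standard_normal(d)
    h = np.tanh(U @ x + W @ h)
    # Jacobian of this step with respect to the previous hidden state:
    J = np.diag(1 - h**2) @ W
    grad_factor = J @ grad_factor

# The norm of this product tells us how much the first input still
# influences the hidden state 50 steps later: typically vanishingly small.
print(np.linalg.norm(grad_factor))
```

Each step multiplies the gradient by a matrix whose norm is below 1 here, so after 50 steps the signal from the first input has decayed essentially to zero, which is precisely why plain RNNs struggle with long-range dependencies.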
For example, while RNNs give good results on shorter sequences, in longer ones (e.g., “Every week, I repetitively wash my clothes, and I do grocery shopping every day. Starting next week, I will …”) the distance between the relevant words can be too great for the prediction to be acceptable.
Long Short-Term Memory (LSTM) is a specific type of RNN architecture designed to address the vanishing gradient problem. LSTMs use memory cells and gates to selectively store and retrieve information over long sequences, making them effective at capturing long-term dependencies.
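The gating mechanism can be sketched in a few lines of NumPy. This is a single LSTM step under the standard formulation; the parameter names (W, U, b keyed by gate) and toy sizes are illustrative, not taken from any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step: gates decide what to forget, store, and emit.

    params holds input weights W, recurrent weights U, and biases b for
    the forget (f), input (i), output (o), and candidate (g) paths.
    """
    W, U, b = params["W"], params["U"], params["b"]
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # forget gate
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # input gate
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # output gate
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # candidate values
    c = f * c_prev + i * g   # cell state: the additive path lets gradients flow
    h = o * np.tanh(c)       # hidden state passed to the next step/layer
    return h, c

# Toy usage with random parameters (hypothetical sizes):
rng = np.random.default_rng(1)
d = 4
params = {
    "W": {k: rng.standard_normal((d, d)) for k in "fiog"},
    "U": {k: rng.standard_normal((d, d)) for k in "fiog"},
    "b": {k: np.zeros(d) for k in "fiog"},
}
h, c = lstm_step(rng.standard_normal(d), np.zeros(d), np.zeros(d), params)
```

The key design choice is the cell state c: because it is updated additively (f * c_prev + i * g) rather than squashed through a nonlinearity at every step, gradients can flow backward through many time steps without vanishing as quickly as in a plain RNN.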