ML Club Video: The Transformer

Posted Feb 20, 2024 Updated Jul 11, 2025

By Karthik S. Vedula

1 min read

No, not that kind of transformer (sorry Optimus)!

The transformer is an neural network architecture proposed at Google for dethroning the LSTM. Rather than having a “sliding window” like LSTMs and RNNs, the transformer looks considers every word, every value, all together for prediction. This concept of attention, where every word is factored in when determining internal representations and predictions is shown to be so powerful, that researchers found that attention itself can provide great performance. In other words, Attention is All You Need. (In fact that is exactly what the paper is titled!)

The transformer architecture is what is behind the language models of today – from Google Translate to ChatGPT. Want to learn how these work? Watch the video!

ML Club

This post is licensed under CC BY 4.0 by the author.

Trending Tags