ML Club Video: The Transformer
No, not that kind of transformer (sorry Optimus)!
The transformer is an neural network architecture proposed at Google for dethroning the LSTM. Rather than having a “sliding window” like LSTMs and RNNs, the transformer looks considers every word, every value, all together for prediction. This concept of attention, where every word is factored in when determining internal representations and predictions is shown to be so powerful, that researchers found that attention itself can provide great performance. In other words, Attention is All You Need. (In fact that is exactly what the paper is titled!)
The transformer architecture is what is behind the language models of today – from Google Translate to ChatGPT. Want to learn how these work? Watch the video!