
ML Club Video: Vision Transformers

Transformers have proven to be the kings of Natural Language Processing, but can they be kings of Computer Vision too? A team of researchers at Google set out to answer this question, and they came up with the Vision Transformer (ViT)! But how does a transformer, which normally takes a sequence of words as input, take in an image? And do Vision Transformers perform better than Convolutional Neural Networks (CNNs)? Watch the video to find out!

This post is licensed under CC BY 4.0 by the author.