Vision Transformer (ViT): An Image is Worth 16x16 Words
