Researchers at Meta recently presented ‘An Introduction to Vision-Language Modeling’, to help people better understand the mechanics behind mapping vision to language. The paper includes everything from how VLMs work, how to train them, and approache