Transformer, the ML model that powers ChatGPT

Context

The Transformer, the machine learning model that powers ChatGPT, has gained significant attention and is making headlines.

About

What is Machine Learning (ML)?

Machine learning (ML), a subset of artificial intelligence, trains computers to perform tasks using structured data, language, audio, or images by presenting examples of inputs and their corresponding desired outputs.
- Unlike traditional computer programming that relies on explicit instructions, ML models learn to generate desired outputs by adjusting numerous parameters, often in the millions.
This enables the model to generalize its knowledge and make predictions or generate responses based on new inputs.
ML's ability to learn from data and adapt its behavior makes it a powerful tool for solving complex problems and handling diverse types of information.

What is ‘attention’?

Attention is a fundamental concept in machine learning that enables a model to determine the importance of different inputs.
- For example, in translation tasks, attention allows the model to select and weigh words from its memory bank, aiding in the decision of the next word to generate. Similarly, when describing an image, attention helps the model focus on relevant parts of the image while generating subsequent words.

A similar observation applies to image captioning.
For an image of a “bird flying above water”, the model is never told which region of the image corresponds to “bird” and which “water”.
Instead, by training on several image-caption pairs with the word “bird”, it discovers common patterns in the image to associate the flying thing with “bird”.

One captivating aspect of attention-based models is their ability to discover meaningful patterns and relationships through extensive data analysis. By parsing large volumes of data, these models uncover valuable insights and learn intricate dependencies.
Transformers are attention models on steroids. They employ multiple attention layers within both the encoder and decoder components.
- This architecture enables transformers to establish significant contextual understanding across input sentences or images in the encoder, and facilitate effective communication from the decoder to the encoder during tasks such as generating translated sentences or describing images.
- Transformers take attention to new heights, allowing for enhanced performance and comprehensive learning in a wide range of machine learning applications.