10th October 2023 (10 Topics)

Multimodal artificial intelligence

Context

A report by “The Information” revealed that Google’s yet-to-be-released multimodal large language model, ‘Gemini’, was already being tested at a number of companies.

OpenAI is also reportedly working on a new project called ‘Gobi’, which is expected to be built as a multimodal AI system from scratch, unlike the GPT models.

About

About multimodal AI:

  • Multimodal AI combines different types of information like text, images, and audio to perform various tasks, such as detecting hateful memes or predicting dialogue lines in videos.
  • Models like OpenAI's DALL·E use this approach to generate images from text prompts, by learning patterns that connect visual data with image descriptions.
  • In the case of audio, OpenAI's Whisper, a speech recognition model that can also translate speech, enables the system to recognize speech in audio and convert it into plain text.
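
The idea of connecting visual data with text descriptions can be illustrated with a small sketch. It is a toy example, not how DALL·E, Whisper, or any named model works internally. It assumes that images and captions have already been turned into number lists (embeddings) in one shared space. The file names, captions, and numbers below are all made up for illustration, whereas a real system would learn such embeddings from large datasets.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written embeddings (assumption: both modalities share
# the same 3-dimensional space). Real models learn these values from data.
IMAGE_EMBEDDINGS = {
    "photo_of_dog.jpg": [0.9, 0.1, 0.0],
    "photo_of_car.jpg": [0.0, 0.2, 0.9],
}
TEXT_EMBEDDINGS = {
    "a dog playing in a park": [0.8, 0.2, 0.1],
    "a car on a highway": [0.1, 0.1, 0.95],
}


def best_caption(image_name):
    """Pick the caption whose embedding lies closest to the image embedding."""
    image_vec = IMAGE_EMBEDDINGS[image_name]
    return max(
        TEXT_EMBEDDINGS,
        key=lambda caption: cosine_similarity(image_vec, TEXT_EMBEDDINGS[caption]),
    )


if __name__ == "__main__":
    for name in IMAGE_EMBEDDINGS:
        print(name, "->", best_caption(name))
```

Because both kinds of input live in the same space, comparing an image with a sentence reduces to comparing two vectors. This is one common way of combining modalities.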

Applications of Multimodal AI:

  • Meta introduced a complex open-source AI system called ImageBind, which incorporates text, visual data, audio, temperature, and movement readings.
    • This system hints at the possibility of future AI including more sensory data like touch, smell, and brain signals.
  • Industries like medicine and autonomous driving benefit from multimodal AI.
    • It helps analyze complex datasets in areas like identifying rare genetic variations and processing CT scans.
    • Additionally, speech translation models like Google Translate use multiple modes for efficient translation across different languages.