AI Superstream: Multimodal Generative AI

by Susan Shu Chang, Rikin Gandhi, Suhas Pai, Nahid Alam, Anthony Susevski, Andrei Betlen, Shekhar Iyer, Jingying Gao, Antje Barth, Omar Aldughayem, Chris Fregly

Released September 2024

Publisher(s): O'Reilly Media, Inc.

ISBN: 0642572057312

Video description

Sponsored by Mobily

While large language models are groundbreaking tools for automating everyday text-based tasks such as summarization, translation, and generation, we've also seen the emergence of more complex generative AI models that can process and output different types of data, such as images, audio, and even video. Multimodal AI models, such as GPT-4, can work across different data formats, for example, to generate speech from text, text from images, or text from audio. By combining different modalities, multimodal AI can interact with humans in more natural, intuitive ways, mimicking how humans perceive and understand the world around them. By processing inputs more holistically and providing more intuitive outputs, these models are already nudging us closer to true artificial general intelligence.

What you’ll learn and how you can apply it

Design more natural, human-like interactions between AI systems and users by leveraging multimodal capabilities
Explore fundamental mathematical concepts like multimodal alignment and fusion, heterogeneous representation learning, and multistream temporal modeling
Review practical applications such as advanced voice assistants, smart home systems, and virtual shopping experiences

This live course is for you because...

You're a current or future AI product owner or AI/machine learning practitioner.
You want to learn about the state of the art in artificial intelligence and how large language models can be leveraged to build new applications and solve your organizational challenges.

Recommended follow-up:

Read Hands-On Large Language Models (book)
Watch How Can We Build a Multimodal LLM like GPT-4o? (Shortcut video)
Take Generative AI for Developers: Creating Apps with the ChatGPT API (on-demand course)

Product information

Title: AI Superstream: Multimodal Generative AI
Author(s): Susan Shu Chang, Rikin Gandhi, Suhas Pai, Nahid Alam, Anthony Susevski, Andrei Betlen, Shekhar Iyer, Jingying Gao, Antje Barth, Omar Aldughayem, Chris Fregly
Release date: September 2024
Publisher(s): O'Reilly Media, Inc.
ISBN: 0642572057312

You might also like

video

AI Superstream Series: Scaling AI

by Antje Barth, Venkatesh Ramanathan, Adi Polak, Victor Dibia, Spence Green, Rebecca Bilbro, Robert Nishihara, Geoff Horrell

Sponsored by Intel and LSEG LABS. Scaling AI is a notoriously difficult challenge. But it’s easier …

video

AI Superstream: Data-Centric AI

by Fabiana Clemente, Andrew Ng, Vijay Janapa Reddi, Emeli Dral, Atindriyo Sanyal, Bernease Herman, Kevin McNamara, Curtis Northcutt, Eric Landau

Over the past decade, the field of AI has achieved incredible results by focusing on building …

video

The Complete Obsolete Guide to Generative AI, Video Edition

by David Clinton

The last book on AI you’ll ever need. We swear! AI technology moves so fast that …

video

AI Superstream: Building with Open Source Generative AI Models and Frameworks

by Susan Shu Chang, James Spiteri, Denys Linkov, Avin Regmi, Mandy Gu, Nicole Königstein, Leandro von Werra

The landscape of open source technologies for building AI applications has expanded rapidly since the advent …
