Video description
Sponsored by Mobily
While large language models are groundbreaking tools for automating everyday text-based tasks such as text summarization, translation, and generation, we've also seen the emergence of more complex generative AI models that can process and output different types of data, such as images, audio, and even video. Multimodal AI models, such as GPT-4, are capable of working across different data formats, for example, to generate speech from text, text from images, or text from audio. By combining different modalities, multimodal AI can interact with humans in more natural, intuitive ways, mimicking how humans perceive and understand the world around them. Processing inputs more holistically and providing more intuitive outputs is already nudging us closer to true artificial general intelligence.
What you’ll learn and how you can apply it
- Design more natural, human-like interactions between AI systems and users by leveraging multimodal capabilities
- Explore fundamental mathematical concepts like multimodal alignment and fusion, heterogeneous representation learning, and multistream temporal modeling
- Review practical applications such as advanced voice assistants, smart home systems, and virtual shopping experiences
This live course is for you because...
- You're a current or future AI product owner or AI/machine learning practitioner.
- You want to learn about the state of the art in artificial intelligence and how large language models can be leveraged to build new applications and solve your organizational challenges.
Recommended follow-up:
- Read Hands-On Large Language Models (book)
- Watch How Can We Build a Multimodal LLM like GPT-4o? (Shortcut video)
- Take Generative AI for Developers: Creating Apps with the ChatGPT API (on-demand course)
Table of contents
- Antje Barth: Keynote: Recent Breakthroughs in Multimodal Generative AI
- Nahid Alam: Unveiling the Edge of Generative AI—Resource, Cost, and Performance Trade-Offs for Multimodal Foundational Models
- Suhas Pai: Evaluation of Multimodal Systems
- Omar Aldughayem: Enhancing Telecom Customer Service with Multimodal AI-Powered Chatbots (Sponsored by Mobily)
- Rikin Gandhi: How We Built Farmer.Chat, a Multimodal GenAI Assistant
- Anthony Susevski and Andrei Betlen: Quickly POCing Multimodal LLMs, Even on a ThinkPad
- Shekhar Iyer: Risky Business—How to Protect GenAI Applications from Security and Safety Risks
- Chris Fregly: Beyond LLMs—Mastering Multimodal RAG for Engaging Generative AI Applications
- Jingying Gao: Teaching AI to Solve Complex Logical Reasoning Using Multimodal Models
Product information
- Title: AI Superstream: Multimodal Generative AI
- Author(s):
- Release date: September 2024
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 0642572057312