AI Superstream: Deploying and Managing LLMs in Production
Published by O'Reilly Media, Inc.
Best practices and practical tips for real-world applications
Join industry experts and top thought leaders from the field to delve into the innovative applications of large language models in various production environments. Whether you're interested in enhancing natural language understanding in chatbots, automating content generation, or optimizing data analysis, you'll get valuable insights, best practices, and practical tips for deploying and managing LLMs effectively in production, drawn from real-world examples and case studies.
About the AI Superstream Series: This three-part series of half-day online events is packed with insights from some of the brightest minds in AI. You’ll get a deeper understanding of the latest tools and technologies that can help keep your organization competitive and learn to leverage AI to drive real business results.
What you’ll learn and how you can apply it
- Frontline experiences and challenges of managing LLMs in production environments
- Tools, techniques, and tips to make working with LLMs in production easier
This live event is for you because...
- You’re an AI practitioner who’s currently leveraging large language models in production.
- You’re a business or organizational leader who’s exploring the potential uses of LLMs for your organization.
Prerequisites
- Come with your questions
- Have a pen and paper handy to capture notes, insights, and inspiration
Recommended follow-up:
- Follow AI Superstream: Deploying and Managing LLMs in Production (expert playlist)
- Read AI Engineering (early release book)
- Read Hands-On Large Language Models (early release book)
- Read Designing Large Language Model Applications (early release book)
- Take Large Language Models in Production (live course with Skanda Vivek)
- Take Getting Started with LangChain (live course with Lucas Soares)
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Fabiana Clemente: Introduction (5 minutes) - 8:00am PT | 11:00am ET | 3:00pm UTC/GMT
- Fabiana Clemente welcomes you to the AI Superstream.
Chip Huyen–Keynote: The New AI Stack with Foundation Models (15 minutes) - 8:05am PT | 11:05am ET | 3:05pm UTC/GMT
- How has the ML engineering stack changed with generative AI? While the landscape is still rapidly evolving, some patterns have emerged. Chip Huyen shares these patterns, identified in her survey of more than 900 open source AI repos as well as discussions with many ML platform teams both big and small. Spoiler: the principles of deploying ML models into production remain the same, but there are many new challenges and approaches.
- Chip Huyen works to accelerate data analytics on GPUs at Voltron Data. She’s also the author of the book Designing Machine Learning Systems (O’Reilly). Previously, she was with Snorkel AI and NVIDIA, exited an AI infrastructure startup, and taught the course Machine Learning Systems Design at Stanford.
Yujian Tang: Getting Your RAG App into Production (30 minutes) - 8:20am PT | 11:20am ET | 3:20pm UTC/GMT
- Yujian Tang takes you through the architecture of a RAG app, including the LLM, vector database, prompts, and embedding model, and shows you how to get data into a vector database and a basic RAG app into production.
- Yujian Tang is a developer advocate at Zilliz. He has a background as a software engineer working on AutoML at Amazon. Yujian studied computer science, statistics, and neuroscience and published research papers at conferences including IEEE BigData. He enjoys drinking bubble tea, spending time with family, and being near water.
- Break (5 minutes)
Matteo Dora: Secure LLM App Deployments—Strategies and Tactics (30 minutes) - 8:55am PT | 11:55am ET | 3:55pm UTC/GMT
- Deploying large language model applications in production can be a daunting endeavor due to the numerous potential failure modes inherent in these systems. From prompt injections and the generation of inappropriate content to discriminatory behavior and performance degradation, various factors can complicate an LLM application's deployment—or outright block it. Matteo Dora provides an overview of the risks associated with LLM applications and introduces methodologies and techniques designed to proactively detect and mitigate these threats. He also examines key concepts such as red teaming, vulnerability scanning, and automatic benchmarking and takes a hands-on approach to presenting them.
- Matteo Dora is a machine learning researcher at Giskard, where he leads the LLM safety team. His work focuses on the practical applications and implications of AI, emphasizing the intersection of ethics, safety, and security. With his current team, he has developed a unique expertise in conducting red team assessments of generative AI applications. Previously, he worked as an academic researcher in the field of neuroscience and applied mathematics.
Thomas Stadelmann: From Zero to Production—The Story of an LLM App (Sponsored by deepset) (30 minutes) - 9:25am PT | 12:25pm ET | 4:25pm UTC/GMT
- Introducing LLM apps into production is both exciting and demanding. Through a practical example, Thomas Stadelmann illustrates how to navigate the myriad possibilities encountered when constructing LLM applications. He discusses selecting the best-fitting components and employing the correct tools for monitoring and ongoing assessment of your solution, and he goes beyond the glamorous outcomes to delve into the inner workings, exploring the behind-the-scenes essentials, such as large-scale feedback collection, needed to smoothly operate an LLM application.
- Thomas Stadelmann is an AI engineer at deepset. He loves to bring innovation to deepset’s commercial LLM platform while also contributing to its open source LLM framework, Haystack. His passion for today's LLM-based solutions in information retrieval was engendered during a decade of working on professional search applications.
- This session will be followed by a 30-minute Q&A in a breakout room. Stop by if you have more questions for Thomas.
- Break (5 minutes)
Diego Oppenheimer: Product Problem Considerations When Building LLM-Based Applications (30 minutes) - 10:00am PT | 1:00pm ET | 5:00pm UTC/GMT
- Large language models are poised to revolutionize product development, but harnessing their power while maintaining reliability requires careful consideration. Diego Oppenheimer explores the key product challenges of building LLM-based applications: stability, accuracy, limited understanding of the performance of systems in production, and lack of developer control.
- Diego Oppenheimer is a partner at Factory, a venture fund specializing in AI investments, and a cofounder at Guardrails AI. A serial entrepreneur, product developer, and investor with an extensive background in all things data, he was an executive vice president at DataRobot and founder and CEO at Algorithmia (acquired by DataRobot), and he shipped some of Microsoft’s most used data analysis products including Excel, PowerBI, and SQL Server. Diego is a founding member and strategic advisor for the AI Infrastructure Alliance and MLOps Community and works with leaders to define AI industry standards and best practices.
David Talby: When LLMs Go Rogue—Lessons Learned Taking Generative AI to Production (30 minutes) - 10:30am PT | 1:30pm ET | 5:30pm UTC/GMT
- The past year has been filled with frameworks, tools, libraries, and services that aim to simplify and accelerate the development of generative AI applications. In practice, however, most of them don’t work on real use cases and datasets. David Talby surveys lessons learned from real-world projects with compelling POCs that only later revealed major gaps in what a production-grade system requires. You'll learn the relevance limits of turnkey vector databases and RAG LLM architectures and the role of preprocessing, splitting, enrichment, ranking, and reranking techniques. You’ll see where guardrails and prompt engineering fall short in addressing critical bias, sycophancy, and stereotype risks and understand the fragility and sensitivity of current LLMs to minor changes to both datasets and prompts and their impacts on accuracy.
- David Talby is the chief technology officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science. David is the creator of Spark NLP—the world’s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams, including in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK. David holds a PhD in computer science and master’s degrees in computer science and business administration. He was named USA CTO of the Year by the Global 100 Awards and Game Changers Awards in 2022.
- This session will be followed by a 30-minute Q&A in a breakout room. Stop by if you have more questions for David.
- Break (5 minutes)
Fábio Nonato: The Devil Is in the Scale—Engineering Generative AI for Production (30 minutes) - 11:05am PT | 2:05pm ET | 6:05pm UTC/GMT
- Join Fábio Nonato for a candid exploration into the art of deploying generative AI in production, with a focus on what really matters: data, system design, and the human touch. Challenging the notion that finding the perfect model is the hard part, he shares insights into how a well-considered data strategy and thoughtful system design can make all the difference. In this honest take on why the people behind the tech are just as important as the algorithms they deploy, you can expect practical tips, laughs, and a good look through a more human lens at how to build generative AI in production.
- Fábio Nonato leads the GenAI specialist solution architecture and applied science team at AWS, helping third-party model providers scale their foundation model offers. Previously, he was the head of AI engineering for Intel's Data Center and AI business, launching three new software products that leveraged generative AI to empower enterprise users with advanced knowledge discovery tools. Other previous roles at global firms like Shell and GE encompassed building and advising on large-scale enterprise AI/ML applications, trading renewable energy assets using AI, building the first reference solution for training GPT2 using HPC at cloud scale, and using AI to generate immersive 3D environments from text.
Abi Aryan: A Systematic Framework for LLM Evaluation (30 minutes) - 11:35am PT | 2:35pm ET | 6:35pm UTC/GMT
- Evaluating the model performance is the key for ensuring effectiveness and reliability of LLM models. Abi Aryan takes you on a deep dive into the intricate world of RAG evaluation metrics and frameworks, exploring the nuanced approaches to assessing model performance. She discusses key metrics such as relevance, diversity, coherence, and truthfulness and examines evaluation frameworks from traditional benchmarks to domain-specific assessments, highlighting their strengths, limitations, and potential implications for real-world applications.
- Abi Aryan is the founder of Abide AI and a machine learning engineer with over eight years of experience building and deploying machine learning models in production for recommender systems, computer vision, and natural language processing within the ecommerce, insurance, and media and entertainment industries. She’s also working on the book LLMOps: Managing Large Language Models in Production for O'Reilly. Previously, she was a visiting research scholar at the Cognitive Sciences Lab at UCLA where she worked on developing intelligent agents. Abi has also authored research papers in AutoML, multiagent systems, and LLM cost modeling and evaluations.
Fabiana Clemente: Closing Remarks (5 minutes) - 12:05pm PT | 3:05pm ET | 7:05pm UTC/GMT
- Fabiana Clemente closes out today’s event.
Upcoming AI Superstream events:
- Building with Open Source Generative AI Models and Frameworks - June 26, 2024
Your Host
Fabiana Clemente
Fabiana Clemente is cofounder and CDO of YData, combining data understanding, causality, and privacy as her main fields of work and research, with the mission of making data actionable for organizations. Passionate for data, Fabiana has vast experience leading data science teams in startups and multinational companies. She hosts the podcast When Machine Learning Meets Privacy and is a guest speaker on the Datacast and Privacy Please podcasts. She also speaks at conferences such as ODSC and PyData and was recently awarded “Founder of the Year” by the South Europe Startup Awards.