LLMs for Data Science
Published by Pearson
Automate your work by leveraging OpenAI, Hugging Face, PandasAI, & LangChain
- Integrate LLMs into your data science workflow
- Understand the pros and cons of different LLMs while exploring their frameworks and APIs
- Leverage LLMs to automate data processing tasks and build data processing applications
Large Language Models (LLMs) are powerful tools that put state-of-the-art AI capabilities at our fingertips. They can process large amounts of data, understand nuance and context, and perform complex tasks at our request. Over the past few years, LLMs have multiplied, as have the tools built specifically to leverage their capabilities.
In this course, you will learn how to use large language models to perform data science tasks such as summarization, translation, named entity recognition, audio generation, and data processing. We’ll explore the possibilities afforded by the tools and APIs developed by OpenAI, Hugging Face, LangChain, and PandasAI, and how best to apply them to our data science work.
What you’ll learn and how you can apply it
- Choose the right LLM to automate data processing tasks
- Extract information from large amounts of text
- Transcribe audio files and generate audio versions of written text
- Interact with datasets through a natural language interface
This live event is for you because...
- You’re a data scientist or software engineer who wants to develop generative AI pipelines
- You want to leverage the power of LLMs and generative AI in your data science work
- You’d like to automate complex tasks with state-of-the-art GenAI tools
Prerequisites
- Python Programming
- Basic Understanding of Generative AI and LLMs
- Fundamentals of NLP
Course Set-up
- A complete Python distribution (such as Anaconda)
- https://github.com/DataForScience/LLM4DS
Recommended Preparation
- Attend: Generative Artificial Intelligence with the OpenAI API for Developers by Bruno Gonçalves
- Attend: ChatGPT and Competing LLMs by Bruno Gonçalves
Recommended Follow-up
- Read: Natural Language Processing with Transformers, Revised Edition by Lewis Tunstall, Leandro von Werra, and Thomas Wolf
- Watch: GenAI Superstream: Possibilities and Pitfalls by Thomas Nield, Elan Head, Emmanuel Maggiori, Matt Perez, Gary N. Smith, Chris Fregly, Natalie Pistunovich, and Shelbee Eigenbrod
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Segment 1: Generative AI for Data Science (45 minutes)
- Generative AI
- Large Language Models
- OpenAI
- Hugging Face
- LangChain
Q&A (5 minutes)
Break (5 minutes)
Segment 2: Prompt Engineering for Data Science (25 minutes)
- Output formatting
- Prompt Techniques
- Zero-Shot and Few-Shot Prompting
- Chain of Thought
Q&A (5 minutes)
Segment 3: Natural Language Processing with Hugging Face (45 minutes)
- Named-Entity Recognition
- Part-of-Speech Tagging
- Summarization
- Question Answering
Q&A (5 minutes)
Break (10 minutes)
Segment 4: Text to Speech with OpenAI (30 minutes)
- The Whisper model
- Generating audio from text
- Audio transcription
- Automatic Translation
Q&A (5 minutes)
Segment 5: PandasAI (50 minutes)
- PandasAI
- Natural language querying
- Data cleansing
- Data visualization
Q&A (5 minutes)
Course wrap-up and next steps (5 minutes)
Your Instructor
Bruno Gonçalves
Bruno Gonçalves is currently a Head of Data Science working at the intersection of AI, Blockchain Technologies, and Finance. Previously, he was a Data Science Fellow at NYU's Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Université. Since completing his PhD in the Physics of Complex Systems in 2008, he has applied Data Science and Machine Learning to the large-scale study of human behavior. In 2015, he was awarded the Complex Systems Society's Junior Scientific Award for "outstanding contributions in Complex Systems Science," and in 2018 he was named a Science Fellow of the Institute for Scientific Interchange in Turin, Italy.