Data Lake Basics
Published by Pearson
Learn the Essentials of Effective Data Management with Data Lakes
- Best practices for data ingestion, storage, and processing
- Tips to maximize your efficiency when using data lakes
- Live demonstrations using Python and Jupyter Notebook
This course is designed for data science newcomers and data managers who want an introduction to building and managing data lakes. Participants will learn the fundamentals of data lake architecture and best practices for data ingestion and processing, and will gain practical skills through live demonstrations. By the end of the course, attendees will have a foundational knowledge of data lakes.
What you’ll learn and how you can apply it
- Build and configure data lakes
- Implement best practices for data ingestion and storage
- Perform basic data transformations and queries in a data lake environment
- Apply learned techniques to real-world data management scenarios
This live event is for you because...
- You are looking for a quick introduction to data lakes
- You are new to data lakes and want to learn the basics
- You are a newcomer to data science looking to enhance your data management skills
- You are an executive or manager seeking to understand the value and implementation of data lakes
- You want to understand the best practices and tools for managing large datasets
Prerequisites
- This course is suitable for beginners. The demos will be conducted in Python and Jupyter Notebook. Some basic programming experience is useful but not required.
Course Setup:
- No prerequisites required.
- The course demos will be conducted in Python and Jupyter Notebook. For those who wish to follow along, installing the Anaconda Distribution (which includes Python and Jupyter Notebook) is recommended.
Recommended Preparation:
- Watch: Fundamentals of Data Analytics in Python (by Aron Ahmadia and Peter Wang)
- Watch: Data Science Fundamentals Part 1: Learning Basic Concepts, Data Wrangling, and Databases with Python (by Jonathan Dinu)
Recommended Follow-up:
- Watch: ODSC East 24 Open Data Science Conference (by ODSC)
- Watch: ODSC Generative AI Summit 2023 Open Data Science Conference (by ODSC Open Data Science Conference)
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
What is a data lake, and why is it important? (10 mins)
Overview of Data Lake Architecture (10 mins)
Environment Setup (10 mins)
- Introduction to Python and Jupyter Notebook
- Installation and configuration using Anaconda Distribution
- Q&A (5 mins)
- Break (5 mins)
Data Lake Architecture (45 mins)
- Core components and architecture of a data lake
- Data ingestion techniques
- Storage solutions and formats
- Demo: Creating and Configuring a Data Lake
- Q&A (5 mins)
- Break (5 mins)
Data Processing (45 mins)
- Transforming data
- Querying data
- Real-world examples and use cases
- Demo: Data Transformation and Querying with Python
- Q&A (5 mins)
- Break (5 mins)
Best Practices and Common Challenges (15 mins)
- Best practices for managing data lakes
- Common challenges
- Q&A (10 mins)
Course wrap-up and next steps (5 mins)
Your Instructor
Edson Pinheiro
Edson Pinheiro is an experienced data scientist, consultant, and educator with a passion for teaching advanced data topics. With extensive real-world experience in data projects, Edson brings practical insights and hands-on expertise to his training sessions. He has been building numerical models and doing data analysis since his work at the National Petroleum Agency (ANP) in 2007 and as a Data Scientist at Petrobras. Currently, he is Managing Partner of a data strategy products and services company. As a renowned expert in data and artificial intelligence, Edson has dedicated his career to solving some of humanity's biggest challenges with the help of data science and artificial intelligence. He has contributed to the training of a new generation of professionals who will use these tools to shape the future of work and society.