Video description
Data scraping is the technique of extracting data from the Internet. The course Data Scraping and Data Mining from Beginner to Professional covers the topics behind some of the most in-demand skills in the workplace and teaches the underlying concepts and methodologies in Python. It is easy to follow, descriptive, comprehensive, and practical, with live coding, quizzes with solutions, and up-to-date coverage of the field.
This course is designed for beginners. We’ll spend sufficient time on the fundamentals, then gradually go deeper with many practical implementations, explaining every step in detail.
Because this course is essentially a compilation of the basics, you will move ahead at a steady pace. The hands-on activities take you beyond what the lessons cover, and most of them are designed to get you up and running with implementations.
The four hands-on projects included are the most important part of this course. These projects allow you to experiment for yourself with trial and error. You will learn from your mistakes. Importantly, you will understand the potential gaps that may exist between theory and practice.
What You Will Learn
- Understand the difference between synchronous and asynchronous requests
- Apply BS4 for parsing the response data from the server
- Explore the different tools that are used for data scraping, namely Requests, BS4, Scrapy, and Selenium
- Understand BS4 parser functions for getting the data out of the HTML
- Learn to use Scrapy to write the spiders for crawling websites and extracting data
- Learn to use Selenium to understand the automation and control of web flows
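To give a sense of what "getting the data out of the HTML" means in the points above, here is a minimal sketch. It is not taken from the course: the HTML fragment, the class names `text` and `author`, and the helper `extract_quotes` are all hypothetical, and the sketch uses Python's standard-library `html.parser` as a dependency-free stand-in for BS4. With BS4 installed, the same lookup would typically be a selector call along the lines of `BeautifulSoup(html, "html.parser").select("span.text")`.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, invented purely for illustration.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">First sample quote</span>
  <small class="author">Author A</small>
</div>
<div class="quote">
  <span class="text">Second sample quote</span>
  <small class="author">Author B</small>
</div>
"""


class QuoteParser(HTMLParser):
    """Collects (quote, author) pairs from tags with class 'text' / 'author'."""

    def __init__(self):
        super().__init__()
        self._field = None   # field that the next text node belongs to
        self._current = {}   # partially built record
        self.quotes = []     # completed (quote, author) pairs

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "text" in classes:
            self._field = "text"
        elif "author" in classes:
            self._field = "author"

    def handle_data(self, data):
        # Only keep text that sits inside a tag we are interested in.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        # Any closing tag ends the current field; emit a record once both
        # the quote text and the author have been seen.
        self._field = None
        if "text" in self._current and "author" in self._current:
            self.quotes.append((self._current["text"], self._current["author"]))
            self._current = {}


def extract_quotes(html):
    """Return a list of (quote, author) tuples found in the given HTML."""
    parser = QuoteParser()
    parser.feed(html)
    parser.close()
    return parser.quotes


if __name__ == "__main__":
    for quote, author in extract_quotes(SAMPLE_HTML):
        print(f"{author}: {quote}")
```

In a real scraper the HTML would come from an HTTP response rather than an inline string, and a dedicated parser such as BS4 would handle malformed markup far more gracefully than this hand-rolled class.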
Audience
This course is for beginners who are new to data scraping and for anyone who wants to build smart solutions and learn data scraping with real data using Python.
It is also useful for data scientists, machine learning experts, and drop shippers who are interested in learning data scraping along with its implementation in realistic projects.
About The Author
AI Sciences: AI Sciences is a team of experts, PhDs, and artificial intelligence practitioners with backgrounds in computer science, machine learning, and statistics. Some work in big companies such as Amazon, Google, Facebook, Microsoft, KPMG, BCG, and IBM.
AI Sciences produces a series of courses dedicated to beginners and newcomers on techniques and methods of machine learning, statistics, artificial intelligence, and data science. They aim to help those who wish to understand techniques more easily and start with less theory and less extended reading. Today, they publish more comprehensive courses on specific topics for wider audiences.
Their courses have successfully helped more than 100,000 students master AI and data science.
Table of contents
- Chapter 1 : Introduction
- Chapter 2 : Requests
- Introduction to Python Requests
- Hands-On with Requests
- Extracting Quotes Manually
- Quiz (Extracting Authors)
- Solution (Extracting Authors)
- Pagination
- Quiz (Extracting Author and Quotes)
- Solution 01 (Extracting Author and Quotes)
- Solution 02 (Extracting Author and Quotes)
- Ajax Requests
- Ajax Requests for Cricket Information
- Ajax Requests Pagination
- Quiz (Extracting Top Stats from Cricket info)
- Solution 01 (Extracting Top Stats from Cricket Information)
- Solution 02 (Extracting Top Stats from Cricket Information)
- Chapter 3 : Beautiful Soup 4 (BS4)
- Introduction to BS4
- Quiz (Difference Between Requests and BS4)
- Solution (Difference Between Requests and BS4)
- Hands-On with BS4
- Extracting Data from Tree
- Extracting Quotes from the Website
- Quiz (Extracting Author Names)
- Solution (Extracting Author Names)
- Attributes of Tags in BS4
- Multi-Valued Attributes of Tags in BS4
- Scraping Movie Names from IMDB
- Quiz (Getting the Ratings, Year, Name of the Movie)
- Solution 01 (Getting the Ratings, Year, Name of the Movie)
- Solution 02 (Getting the Ratings, Year, Name of the Movie)
- Scraping Time, Genre, and Release Date from IMDB 01
- Scraping Time, Genre, and Release Date from IMDB 02
- Combining Two Requests Data for IMDB
- Movies Recommender System (Creating Movie URL)
- Movies Recommender System (Creating Director URL)
- Movies Recommender System using BS4 (Getting Top 4 Movies)
- Movies Recommender System using BS4 (Merge All Requests Together)
- Chapter 4 : CSS Selectors
- Introduction to CSS Selectors
- CSS Selectors Hands-On (Tags)
- Quiz (Tags)
- Solution (Tags)
- CSS Selectors Hands-On (Descendants, ID, Class)
- Quiz (Descendants)
- Solution (Descendants)
- Quiz (ID)
- Solution (ID)
- Quiz (Class)
- Solution (Class)
- CSS Selectors Hands-On (Nested Tags, ID Tags, Class Tags)
- Quiz (Class with Tag)
- Solution (Class with Tag)
- CSS Selectors Hands-On (Comma Separator, Universal Selectors)
- Quiz (Combining Two Selectors)
- Solution (Combining Two Selectors)
- CSS Selectors Hands-On (Sibling Notations and Direct Child)
- Quiz (Adjacent Sibling)
- Solution (Adjacent Sibling)
- Quiz (General Sibling)
- Solution (General Sibling)
- CSS Selectors Hands-On (Child Selectors)
- Quiz (First Child)
- Solution (First Child)
- Quiz (Only Child)
- Solution (Only Child)
- Quiz (Last Child)
- Solution (Last Child)
- CSS Selectors Hands-On (Negations, Attributes)
- Quiz (Negation)
- Solution (Negation)
- CSS Selectors Hands-On (Attributes, Attribute Values)
- Quiz (Attribute Values)
- Solution (Attribute Values)
- CSS Selectors Hands-On (Attributes Wild Cards Values)
- Quiz (Attributes Wild Card)
- Solution (Attributes Wild Card)
- Chapter 5 : Scrapy
- Introduction to Scrapy
- Comparison of Scrapy and Requests
- Scrapy at a Glance Documentation
- Getting Started with Scrapy
- Running Documentation Spider 1
- Running Documentation Spider 2
- Writing a Spider from Scratch
- Understanding the Response (URL, Status)
- Understanding the Response (Headers)
- Understanding the Response (Values in Headers)
- Understanding the Response (Body)
- Understanding the Response (Request)
- Understanding the Response (Meta)
- Understanding the Response (Flags, Certificate, ip_address, Copy)
- Understanding the Response (replace, urljoin, follow, follow_all)
- Response CSS and Scrapy Shell
- Extracting Quotes
- Understanding Nested Selectors
- Extracting the Author and Quotes
- Checking for Next Page
- Checking for Next Page in Spider
- Checking for Next Page URL
- Scraping Quotes from Next Pages
- Exporting Extracted Data
- Quiz (Get the Tags)
- Solution (Get the Tags)
- Next Website
- CSS Selectors for Movie Names and URLs
- Combined CSS Selectors for Movie Names and URLs
- Send Request to the Film Information Page
- Merge Data from Two Callbacks
- Extracting Movie Duration and Genres
- Exporting the Extracted Data
- Quiz (Extracting the Year)
- Solution (Extracting the Year)
- Getting Director Name and URL
- Getting Top Four Movies of Directors
- Extracting Data
- Extracting Data Anomaly (CSS Selector)
- Extracting Data Anomaly (dont_filter Flag)
- Chapter 6 : Scrapy Project
- Hugoboss Website for Scraping
- Understanding Site Structure
- Writing CSS Selectors for Listings
- Listings in Scrapy Shell
- Sending Request to Listings URLs
- Writing CSS for Getting the Product from the Listings
- Extracting Products URL from the Listings
- Sending Requests to Products of the Listings
- Writing CSS for Getting the Product Information
- Getting the Bigger Images of the Product
- Adding Pagination to Spider and Running It
- Output of the Spider
- Chapter 7 : Selenium
- Introduction to Selenium
- Getting Started with Selenium
- Configuring the Webdriver
- Extracting Quotes
- Extracting Quotes and Author Names
- Quiz (Extracting Quotes)
- Solution (Extracting Quotes)
- Clicking on Button
- Pagination and Extracting Data
- Exception Handling for Unavailable Elements
- Navigating the Website for Login
- Quiz (Log In and Extract Quote)
- Solution (Log In and Extract Quote)
- Chapter 8 : Project Selenium
Product information
- Title: Data Scraping and Data Mining from Beginner to Pro with Python
- Author(s): AI Sciences
- Release date: September 2021
- Publisher(s): Packt Publishing
- ISBN: 9781801818483