Chapter 2
Exploring Big Data
IN THIS CHAPTER
Using NumPy for data science
Using pandas for fast data analysis
Learning from your first data science project
Visualizing with Matplotlib in Python
In this chapter, you discover some of the tools and processes that data scientists use to format, process, and query data.
A number of Python-based tools and libraries, as well as languages such as R, are available for working with big data, but we decided to use NumPy for three reasons. First, it is one of the two most popular tools for data science in Python (the second most popular is Keras). Second, many AI-oriented projects use NumPy. And third, the highly useful Python data science package, pandas, is built on NumPy.
pandas is turning out to be an important package in data science. It encapsulates data in a more abstract way, making it easier to manipulate, document, and understand the transformations you make to the base datasets.
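To make that relationship concrete, here is a minimal sketch of a NumPy array wrapped in a pandas DataFrame. The sensor readings and the column names (sensor_a, sensor_b, difference) are made up for illustration and are not taken from the chapter's projects.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three readings from two made-up sensors.
raw = np.array([[20.5, 30.1],
                [21.0, 29.8],
                [19.5, 31.2]])

# Wrap the raw NumPy array in a DataFrame so each column has a name.
df = pd.DataFrame(raw, columns=["sensor_a", "sensor_b"])

# A transformation written against column names is easier to read
# than the equivalent index arithmetic, raw[:, 1] - raw[:, 0].
df["difference"] = df["sensor_b"] - df["sensor_a"]

# The values can still be pulled back out as a plain NumPy array.
values = df.to_numpy()

print(df)
print(type(values))
```

The named columns are what make the transformation self-documenting: anyone reading the code can see which quantities are being subtracted without knowing how the array is laid out.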
Matplotlib is a good Python-centric package for visualizing the results of big data analysis, but it has a steep learning curve. However, ...