Book description
Over 85 recipes to help you complete real-world data science projects in R and Python
About This Book
- Tackle every step in the data science pipeline and use it to acquire, clean, analyze, and visualize your data
- Get beyond the theory and implement real-world projects in data science using R and Python
- Easy-to-follow recipes will help you understand and implement numerical computing concepts
Who This Book Is For
If you are an aspiring data scientist who wants to learn data science and numerical programming concepts through hands-on, real-world project examples, this is the book for you. Whether you are brand new to data science or you are a seasoned expert, you will benefit from learning about the structure of real-world data science projects and the programming examples in R and Python.
What You Will Learn
- Learn and understand the installation procedure and environment required for R and Python on various platforms
- Prepare data for analysis by implementing various data science concepts such as acquisition, cleaning, and munging through R and Python
- Build a predictive model and an exploratory model
- Analyze the results of your model and create reports on the acquired data
- Build various tree-based models, including random forests
In Detail
As increasing amounts of data are generated each year, the need to analyze that data and create value from it is more important than ever. Companies that know what to do with their data and how to do it well will have a competitive advantage over companies that don’t. Because of this, there will be an increasing demand for people who possess both the analytical and technical abilities to extract valuable insights from data and build solutions that put those insights to use.
Starting with the basics, this book covers how to set up your numerical programming environment, introduces you to the data science pipeline, and guides you through several data projects in a step-by-step format. By sequentially working through the steps in each chapter, you will quickly familiarize yourself with the process and learn how to apply it to a variety of situations with examples using the two most popular programming languages for data analysis—R and Python.
Style and approach
This step-by-step guide to data science is full of hands-on examples of real-world data science tasks. Each recipe focuses on a particular task involved in the data science pipeline, ranging from readying the dataset to analytics and visualization.
Table of contents
- Preface
- Preparing Your Data Science Environment
- Understanding the data science pipeline
- Installing R on Windows, Mac OS X, and Linux
- Installing libraries in R and RStudio
- Installing Python on Linux and Mac OS X
- Installing Python on Windows
- Installing the Python data stack on Mac OS X and Linux
- Installing extra Python packages
- Installing and using virtualenv
- Driving Visual Analysis with Automobile Data with R
- Creating Application-Oriented Analyses Using Tax Data and Python
- Modeling Stock Market Data
- Visually Exploring Employment Data
- Introduction
- Preparing for analysis
- Importing employment data into R
- Exploring the employment data
- Obtaining and merging additional data
- Adding geographical information
- Extracting state- and county-level wage and employment information
- Visualizing geographical distributions of pay
- Exploring where the jobs are, by industry
- Animating maps for a geospatial time series
- Benchmarking performance for some common tasks
- Driving Visual Analyses with Automobile Data
- Working with Social Graphs
- Introduction
- Preparing to work with social networks in Python
- Importing networks
- Exploring subgraphs within a heroic network
- Finding strong ties
- Finding key players
- Exploring the characteristics of entire networks
- Clustering and community detection in social networks
- Visualizing graphs
- Social networks in R
- Recommending Movies at Scale (Python)
- Introduction
- Modeling preference expressions
- Understanding the data
- Ingesting the movie review data
- Finding the highest-scoring movies
- Improving the movie-rating system
- Measuring the distance between users in the preference space
- Computing the correlation between users
- Finding the best critic for a user
- Predicting movie ratings for users
- Collaboratively filtering item by item
- Building a non-negative matrix factorization model
- Loading the entire dataset into the memory
- Dumping the SVD-based model to the disk
- Training the SVD-based model
- Testing the SVD-based model
- Harvesting and Geolocating Twitter Data (Python)
- Introduction
- Creating a Twitter application
- Understanding the Twitter API v1.1
- Determining your Twitter followers and friends
- Pulling Twitter user profiles
- Making requests without running afoul of Twitter's rate limits
- Storing JSON data to disk
- Setting up MongoDB for storing Twitter data
- Storing user profiles in MongoDB using PyMongo
- Exploring the geographic information available in profiles
- Plotting geospatial data in Python
- Forecasting New Zealand Overseas Visitors
- German Credit Data Analysis
Product information
- Title: Practical Data Science Cookbook - Second Edition
- Author(s):
- Release date: June 2017
- Publisher(s): Packt Publishing
- ISBN: 9781787129627