Chapter 2. End-to-End Machine Learning Project
In this chapter you will work through an example project end to end, pretending to be a recently hired data scientist at a real estate company. Here are the main steps you will go through:
- Look at the big picture.
- Get the data.
- Discover and visualize the data to gain insights.
- Prepare the data for Machine Learning algorithms.
- Select a model and train it.
- Fine-tune your model.
- Present your solution.
- Launch, monitor, and maintain your system.
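To make the middle steps concrete, here is a minimal sketch of what "get the data", "prepare the data", and "select a model and train it" can look like in code. It is only an illustration under stated assumptions: the data is a small synthetic stand-in generated on the spot (not the housing dataset used in this chapter), the helper names (make_toy_data, split_train_test, fit_line, rmse) are hypothetical, and the model is a plain one-feature least-squares line written with the Python standard library only.

```python
import math
import random


def make_toy_data(n=200, seed=42):
    """Get the data: hypothetical synthetic rows of (income, house_value).

    This is NOT real housing data; it is a known line plus Gaussian noise,
    used only so the sketch runs without downloading anything.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.uniform(1.0, 10.0)
        value = 50_000 + 40_000 * income + rng.gauss(0, 10_000)
        rows.append((income, value))
    return rows


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Prepare the data: shuffle a copy and hold out a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Select a model and train it: one-feature ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(rows, slope, intercept):
    """Evaluate: root mean squared error of the fitted line on the given rows."""
    squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return math.sqrt(sum(squared_errors) / len(rows))


rows = make_toy_data()
train_set, test_set = split_train_test(rows)
slope, intercept = fit_line(train_set)
test_rmse = rmse(test_set, slope, intercept)
print(f"slope={slope:.0f}, intercept={intercept:.0f}, test RMSE={test_rmse:.0f}")
```

The test set is split off before the model is fitted and is used only for the final error estimate, so the reported RMSE reflects data the model has not seen. With a real dataset the same outline applies, but each step involves considerably more work than this sketch suggests.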
Working with Real Data
When you are learning about Machine Learning, it is best to experiment with real-world data, not artificial datasets. Fortunately, there are thousands of open datasets to choose from, ranging across all sorts of domains. Here are a few places you can look to get data:
- Popular open data repositories
- Meta portals (they list open data repositories)
- Other pages listing many popular open data repositories
In this chapter we’ll use the California Housing Prices dataset from the StatLib repository (see Figure 2-1). This dataset is based on data from the 1990 California census. It is not exactly recent (a nice house in the Bay Area was still affordable at the time), but it has many qualities for learning, so we will pretend it is recent data. ...