Chapter 2. End-to-End Machine Learning Project
In this chapter, you will go through an example project end to end, pretending to be a recently hired data scientist in a real estate company. Here are the main steps you will go through:
- Look at the big picture.
- Get the data.
- Discover and visualize the data to gain insights.
- Prepare the data for Machine Learning algorithms.
- Select a model and train it.
- Fine-tune your model.
- Present your solution.
- Launch, monitor, and maintain your system.
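To make the coding steps in this list concrete, here is a minimal sketch in Scikit-Learn. It is not the chapter's own code: the data is synthetic (two made-up features and a noisy linear target), the helper name `run_toy_project` is hypothetical, and Ridge regression with a small grid search is just one possible choice of model and tuning method. Steps that involve judgment rather than code (looking at the big picture, exploring the data, presenting, launching) are left out.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def run_toy_project(n_samples=500, seed=42):
    """Walk through the coding steps on synthetic data (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Get the data: synthetic stand-in, NOT the California housing dataset.
    X = rng.normal(size=(n_samples, 2))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

    # Set a test set aside before tuning anything.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )

    # Prepare the data and select a model: feature scaling, then Ridge regression.
    pipeline = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

    # Fine-tune the model: cross-validated search over the regularization strength.
    search = GridSearchCV(
        pipeline,
        param_grid={"model__alpha": [0.01, 0.1, 1.0, 10.0]},
        cv=5,
        scoring="neg_mean_squared_error",
    )
    search.fit(X_train, y_train)

    # Evaluate the tuned model once on the held-out test set.
    predictions = search.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
    return search.best_params_, rmse


if __name__ == "__main__":
    best_params, rmse = run_toy_project()
    print("Best hyperparameters:", best_params)
    print("Test RMSE:", rmse)
```

Wrapping the preprocessing and the model in a single `Pipeline` means the scaler is refit on each cross-validation fold, so the validation data never influences the scaling statistics during the search.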
Working with Real Data
When you are learning about Machine Learning, it is best to experiment with real-world data, not just artificial datasets. Fortunately, there are thousands of open datasets to choose from, covering all sorts of domains. Here are a few places you can look to get data:
- Popular open data repositories
- Meta portals (they list open data repositories)
- Other pages listing many popular open data repositories
In this chapter we chose the California Housing Prices dataset from the StatLib repository (see Figure 2-1). This dataset was based on data from the 1990 California census. It is not exactly recent (you could still afford a nice house in the Bay Area at the time), but it has many qualities ...