Chapter 3. Data Intended for Human Consumption, Not Machine Consumption
This chapter describes issues that can arise when a dataset has been provided in a format that is designed mainly for consumption by human eyeballs.
Data is typically provided this way in order to allow a human to extract a particular message from the data.
The problem is that we inevitably end up wanting to do more with the data. That means working with the data using software, which means explaining the format of the data to the software, and so we end up wishing that the data had been formatted for consumption by a computer rather than by human eyeballs.
The Data
The main high school qualification in New Zealand is called NCEA (National Certificate of Educational Achievement). A typical student will attempt to gain NCEA Level 1 in Year 11 (their eleventh year of formal education), NCEA Level 2 in Year 12, and Level 3 in Year 13. However, it is also possible for students to attempt NCEA levels in earlier years or to gain an NCEA level in a later year if they fail at the first attempt.
This leads to statistics on the number (or percentage) of students who have attained each level of NCEA by the end of each year of formal education (see Example 3-1).
Example 3-1. Number of students gaining NCEA in 2010 by level and year
                 Year 11  Year 12  Year 13
NCEA (Level 1)     41072    46629    40088
NCEA (Level 2)      1050    37513    38209
NCEA (Level 3)        91      451    24688
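As a rough illustration of what "explaining the format of the data to the software" can involve, the following Python sketch turns the layout of Example 3-1 into one record per NCEA level and school year. The table is retyped here as a string. The function name parse_ncea_table, the field names, and the assumption that columns are separated by runs of whitespace are illustrative choices and are not taken from the original data source.

```python
import re

# Example 3-1 retyped as plain text. The exact spacing is an assumption;
# the parser only relies on whitespace between columns.
TABLE = """\
                 Year 11  Year 12  Year 13
NCEA (Level 1)     41072    46629    40088
NCEA (Level 2)      1050    37513    38209
NCEA (Level 3)        91      451    24688
"""


def parse_ncea_table(text):
    """Convert the human-oriented table into a list of flat records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # The header row carries the school years as labels such as "Year 11".
    years = [int(y) for y in re.findall(r"Year (\d+)", lines[0])]

    records = []
    for ln in lines[1:]:
        # Each data row starts with a label such as "NCEA (Level 1)",
        # followed by one count per school year.
        m = re.match(r"\s*NCEA \(Level (\d+)\)\s+(.*)", ln)
        if m is None:
            raise ValueError(f"unrecognized row: {ln!r}")
        level = int(m.group(1))
        counts = [int(c) for c in m.group(2).split()]
        if len(counts) != len(years):
            raise ValueError(f"expected {len(years)} counts in row: {ln!r}")
        for year, count in zip(years, counts):
            records.append({"level": level, "year": year, "students": count})
    return records


records = parse_ncea_table(TABLE)
for rec in records:
    print(rec["level"], rec["year"], rec["students"])
```

Even for a table this small, the code has to encode knowledge that a human reader picks up at a glance: which row is the header, where the row label ends, and which count belongs to which year.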
The Problem: Data Formatted for Human Consumption
Tables of NCEA ...