Chapter 2. Data Preprocessing

Before data can be analyzed, it is usually processed into some standardized form. This chapter describes those processes.

Data types

Data is categorized into types. A data type identifies not only the form of the data but also what kind of operations can be performed upon it. For example, arithmetic operations can be performed on numerical data, but not on text data.
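As a minimal sketch of that distinction, the following Java class contrasts the `+` operator on numeric and text values. The class and method names are hypothetical, chosen only for illustration. On numbers, `+` adds; on strings, it concatenates. Text that happens to look numeric has to be parsed before any arithmetic is possible.

```java
public class DataTypeOperations {

    // Numeric data: + performs arithmetic addition.
    static double addNumbers(double a, double b) {
        return a + b;
    }

    // Text data: + joins the two strings end to end; no arithmetic happens.
    static String joinText(String a, String b) {
        return a + b;
    }

    // Text that represents numbers must be converted to a numeric type first.
    static double addParsed(String a, String b) {
        return Double.parseDouble(a) + Double.parseDouble(b);
    }

    public static void main(String[] args) {
        System.out.println(addNumbers(1.5, 2.0));     // 3.5
        System.out.println(joinText("1.5", "2.0"));   // 1.52.0
        System.out.println(addParsed("1.5", "2.0"));  // 3.5
        // An expression such as "1.5" * "2.0" is rejected by the compiler,
        // because multiplication is not defined for String operands.
    }
}
```

The compiler enforces the rule here: the data type of each operand decides which operations are allowed before the program ever runs.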

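A data type can also determine how much computer storage space an item requires. For example, a decimal value like 3.14 stored as a Java float occupies a 32-bit (four-byte) slot, or 64 bits if stored as a double, the default type for decimal literals in Java. Text varies with its length and encoding: a web address such as https://google.com has 18 characters, so it occupies 144 bits at one byte per character and more under a wider encoding.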
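The storage figures can be checked directly in Java. The sketch below uses the standard constants `Float.SIZE` and `Double.SIZE` and measures the encoded length of a string. The class and method names are hypothetical, and the counts cover only the encoded characters, not any per-object overhead the JVM adds.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StorageSizes {

    // Number of bits needed to encode the text in the given character set.
    static int bitsOfText(String text, Charset charset) {
        return text.getBytes(charset).length * 8;
    }

    public static void main(String[] args) {
        String url = "https://google.com";

        System.out.println(Float.SIZE);    // 32 bits for a float
        System.out.println(Double.SIZE);   // 64 bits for a double
        System.out.println(url.length());  // 18 characters

        // One byte per character in ASCII.
        System.out.println(bitsOfText(url, StandardCharsets.US_ASCII));  // 144

        // Two bytes per character in UTF-16 (big-endian, no byte-order mark).
        System.out.println(bitsOfText(url, StandardCharsets.UTF_16BE));  // 288
    }
}
```

A fixed-size numeric type always takes the same space, whereas the space for text depends on both the number of characters and the encoding chosen.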

Here is a categorization of the main data types that we will be working with in this book. The corresponding Java ...
